Tuesday, October 1, 2019

Psychometry: Science, pseudoscience and make-believe

Let's take a quick glance at psychometry. Personality tests abound, and they've even invaded organizations' HR departments as a means of determining who "fits" and who doesn't. I claim we shouldn't use them in agile organizations - because these models are dangerous.
tl;dr:
Be careful what you get yourself into with psychometry. Chances are you're falling for something that could cause a lot of damage. Educate yourself before getting started!
Appealing, yet scientifically dangerous: The "Four Color Personality Types"

A brief history of Psychometry

I will take a look at the models which have survived history and are still in use today.

MBTI

Building on typological research that Katharine Cook Briggs began around 1917, she and her daughter Isabel Briggs Myers developed the model we now know as the "MBTI", the Myers-Briggs Type Indicator - four dichotomies with two preferences each, resulting in 16 personality types.

DISC

In 1928, William Marston was tasked by the US Military to figure out why people with the same training still showed different behaviour. The model identifies four key characteristics - D, I, S and C. Oddly enough, while the original model had "Dominance, Inducement, Submission and Compliance", today people can't even seem to agree on what the acronym actually stands for.
Today, we see terms like Influence, Steadfastness and Conscientiousness as alternate labels - which means that depending on which meaning you assign to a letter, your scores would have a totally different meaning!

The Big Five (OCEAN)

In the early 1980s, psychologists took a renewed interest in psychometry, and Goldberg et al. proposed the "Big Five" factors: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism.
OCEAN spawned a few models of its own:

Occupational Personality Questionnaire (OPQ)

Saville and Holdsworth launched this model in 1984, and it's still in use today. This model is specifically focused on selection, development, team building, succession planning and organizational change. It has seen updates and refinements since its inception.

NEO PI-R

Since 1978, Costa and McCrae have been developing the "(Revised) NEO Personality Inventory", which breaks each of the Big Five factors down into six facets. One of the key criticisms of this model is that it only measures a subset of known personality traits and doesn't account for the social desirability of traits.

HEXACO

As the Big Five caught global attention, researchers realized that different cultures pay attention to different personality aspects, and the Big Five were revisited, specifically based on feedback from Asia. Factors like Honesty-Humility ("H") and Emotionality ("E") have a much higher impact on the social perception of an individual in some cultures than in others, and therefore on how a person sees themselves as well.

HEXACO led to the interesting insight that there is no universal standard of measuring personality, as the measure depends on the social environment of the measured individual.
Likewise, HEXACO studies revealed that social acceptability determined desirability of traits, and that even the formulation of questions could yield different results depending on social context.


Scientific perspective

Companies have a keen desire to use a scientific approach in determining the "best fit" for a new team member, in order to maximize the odds of placing a successful candidate.
As ongoing research in the field of psychometry reveals, there is no comprehensive personality model, and therefore no comprehensive personality test.
A comprehensive personality model would need to cover both a broad spectrum of personality traits and the individual's social background.

Model Correctness

For the time being, the only factors that have been found to be universally accepted across cultures are extraversion, agreeableness and conscientiousness. Everything else is up for debate. Seen from the other side of the coin, this means that any model lacking these three dimensions cannot be adequate.

Even the validity of the universally accepted factors is disputed. For example, Dan Pink stated, "People are ambiverts, neither overly extrovert nor introvert", or in other terms: our environment and current mood determine the expression of our "Extraversion" dimension much more than our internal wiring does.

It's also unclear at this time how many factors actually exist, so every model we have focuses on a limited subset and therefore reflects a current bias.


Valid Modeling

Scientists create, refine and discard models all the time. The goal is to have the best possible model, that is, the simplest valid statement with the highest level of explanatory power. The more widely accepted a model is, the more fame is credited to the first person to disprove it - that is, the bigger the crowd of scientists interested in finding flaws.

Counter-evidence

The first question when creating a model would be: Is our model valid? The scientific approach is to look for evidence that the model is not valid, and to assume the model is valid as long as no such evidence can be produced. Note that this means neither that our model is good nor that it will remain valid when further information becomes available.

Models which have counter-evidence should not be used.

Explanatory Power

The second question to ponder is: How much does our model explain? There are two common mistakes regarding explanatory power of a model:
The first is the category error, that is, using the model to explain things it isn't intended to explain, such as using a model designed to explain individual behaviours in an attempt to explain social interactions.
The second mistake is using the model beyond its precision. For example, a model that already fails to address the cultural differences between Asia and Europe would be inadequate to compare the behaviours of a person from Asia with those of a European.

Preference goes to the simplest model with the highest level of explanatory power required to address a subject.

Reliable Measurement

To be considered "reliable", a scientifically valid measurement would need to be:

  • Accurate, that is, it should generate outcomes that align with reality.
  • Repeatable, that is, a test under the same preconditions should generate the same outcome.
  • Reproducible, that is, testing the same target in different environments should generate the same outcome.

The lower any of these three attributes is, the less reliable a measurement would be. Reliability of a measurement system = Accuracy * Repeatability * Reproducibility, i.e. the predictive capability of data diminishes rapidly as these factors dwindle.
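
To make the compounding effect tangible, here is a minimal sketch in Python. The scores are made-up, purely illustrative values on a 0-1 scale, not taken from any real test:

```python
# Toy illustration: overall reliability as the product of three factors,
# each expressed as a value between 0.0 (useless) and 1.0 (perfect).
# The example scores below are invented for demonstration only.

def reliability(accuracy: float, repeatability: float, reproducibility: float) -> float:
    """Reliability = Accuracy * Repeatability * Reproducibility."""
    return accuracy * repeatability * reproducibility

# Even individually "decent" factors compound into a weak measurement:
print(reliability(0.9, 0.9, 0.9))  # 0.729 - more than a quarter of the reliability is gone
print(reliability(0.8, 0.7, 0.6))  # 0.336 - mediocre factors leave little predictive value
```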

Measurement systems ("tests") with low reliability should be avoided or improved.



Pseudoscience

Models which lack supporting evidence, which have already been debunked, which have low explanatory power or which are based on unreliable metrics are generally considered "pseudoscience".
Statements based on such models would be considered doubtful in the scientific community.

The reason why older models, first and foremost MBTI and DISC, would be considered pseudoscience despite their high (and often re-trending) popularity is that they lack explanatory power and reliable measurement.

While some models claim high repeatability, many people have expressed doubts about whether personality tests are sufficiently accurate.
Some assessments even claim that "you have a family profile and a job profile", essentially surrendering reproducibility and, therefore, scientific validity.

As mentioned before, even the very refined HEXACO model suffers from a lack of explanatory power, and depending on how a test is configured for a specific environment, this specific configuration might have little supporting evidence or even generate counter-evidence.

Therefore, it remains debatable how useful psychometry can be for making statements about a person's workplace behaviour.




Make-Believe

The key criticism of most psychometry tests is that a personality report from these models is a kind of Barnum statement - people who read their report are subject to the Forer effect: reports generated from random data may be perceived as just as accurate as reports assembled by conscious choice. People latch onto the attributes that seem to describe them "fairly well" and overlook the passages that don't fit.
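
As a toy illustration of this effect, here is a minimal sketch - the statement list and helper function are entirely hypothetical, loosely paraphrasing classic Forer-style phrases rather than quoting any real assessment - showing how a "report" stitched together at random still reads as if it were written about someone specific:

```python
import random

# Hypothetical Barnum-style statements - vague enough to fit almost anyone.
BARNUM_STATEMENTS = [
    "You have a great need for other people to like and admire you.",
    "You have a tendency to be critical of yourself.",
    "At times you are extroverted and sociable, at other times you are reserved.",
    "You prefer a certain amount of change and dislike being hemmed in by restrictions.",
    "You have considerable unused capacity which you have not turned to your advantage.",
]

def random_report(name: str, n: int = 3) -> str:
    """Assemble a 'personality report' from randomly chosen statements,
    using no actual information about the person at all."""
    picks = random.sample(BARNUM_STATEMENTS, n)
    return f"Personality report for {name}:\n- " + "\n- ".join(picks)

print(random_report("Alex"))  # Reads plausibly 'accurate', despite using no data about Alex.
```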

Tests based on MBTI and DISC profiling suffer particularly strongly from this - either their statements are so vague that they could describe practically anybody, or people feel that whatever outcome is attributed to them is not universally applicable, or doesn't suit them at all.

The "explanation" for this vagueness tends to be that factors are fluent and exist in different levels of manifestation, which basically makes a binary classification meaningless.

The effect on people

In a statement on one website, the claim was "The outcome of the test can affect your life", which is indeed true, especially when the test is being used for job selection and you didn't get hired because you didn't show up as what the hiring person was looking for.

Using the models

The only point I give to the models is that test results can be a decent conversation starter with your team, friends or family - although I'll hold that point in abeyance, because so could any relevant subject matter, or even the weather.


Harmful application

This is where I get into the realm of "coaching". Some coaches peddle certain models as "strongly supported by science" when they indeed aren't - and people who lack a scientific background will use these models as if they were.

Especially "The Four Colors", which are pomoted worldwide in management seminars and which are now also finding their way (in one form or another) into Agile Coaching pave the way for dangerous dynamics.

The worst application of the model I have seen is "helper cards" used by people to categorize the other people in the room during a conversation.

Promoting ignorance

There is no simple way to classify a person's behaviour within a sociotechnical system. Every model that claims to have an easy answer while utterly ignoring the environment is dangerous - because it focuses on the consequence while ignoring the trigger. Without educating people on the impact of environment on behaviour, psychometry becomes a distraction rather than a means of understanding!

Thinking inside the box

People are complex, very complex indeed. As a proverb from Cologne states, "Jede Jeck is anders", roughly translating to "every human being is different" - you just can't put people into boxes.
There's also a high probability that the behaviours you observe, and how you judge those behaviours, are tainted by your personal bias. As long as you think of people in such boxes, you're very prone to missing important nuances.

Manipulation tactics

When I was taught DISC a decade ago, I learned that people with a strong "D" dimension respond positively to terms like "Fast" or "Effective", whereas they are put off by details. The same goes for the other dimensions. As such, I learned to use the DISC model as a way of using language to manipulate people into agreeing with me.
As helpful as such knowledge can be in making decisions, it can be equally deceptive - because it sets people up for manipulation and exploitation. Is this where you want to go in coaching?

Missing the Big Picture

Psychometric models focus on individuals, ignoring their role in their environment. Strangely enough, my first question when sitting in a DISC training was, "There's this person who's strong in all four dimensions. What's that?" During the training, I just swallowed the answer; I didn't understand its consequences until years later: "This person is an adaptor. They display the strengths that the current situation requires."
Later, it hit me like a concrete block: People adapt to their environment. Their social role determines which strengths they will exhibit. And as their role changes, their visible profile changes as well.

As such, we can't really measure a person at all - we just get a glimpse of where that person currently stands in society. Change that role, and their psychometry changes. And that role changes as circumstances change.

You can change a person's social environment to turn an inspiring leader into a tyrant.
You can change a person's belief system to turn a braggart into a humble person.
You can affect a person's incentives and turn a couch potato into a sportsman.

How much do you then think that a few dozen questions will tell you about what a person could be?

Building the wrong team

Some organizations try to build teams with a "suitable" mix of personalities and ignore that their psychometric data is a poor representation of the people involved.
Psychometry can be flawed from three angles:
  1. The test itself wasn't an accurate representation of the person's beliefs and behaviours.
  2. The test outcomes didn't accurately describe the person's beliefs and behaviours.
  3. The test ignored the current social dynamics leading to a person's behaviours.
People's behaviours and dynamics depend on context. Hence, planning based on psychometry makes unsupported assertions about the future state of the team.

How ridiculous would it be to ensure that each team is built with one Red, two Green, two Blue and a Yellow - only later to discover that a Green adapted to that role and is otherwise Red, and that the Yellow was only Yellow back when they were hired?

Making concessions

In some cases, profiling other people based on observations can be used inappropriately to "excuse" negative behaviours and unhealthy group dynamics. For example, bullying might be considered the consequence of "expressing strong dominance", and the behaviour itself or its systemic enablers might continue unquestioned.
Likewise, people with "strong agreeableness" might accept immoral behaviours, when they should be encouraged to take a stand and fight for change.



Summary

This article explains why many approaches to psychometry are scientifically invalid, why psychometric data should be treated with caution, and why coaches should be extremely careful when meddling with psychometry in their work.

If you use or plan on using psychometry in coaching, be careful of the problems you are inviting.
