Some observations on penis-size and the Intelligence and Personality testing Industry

Why are psychometric tests (including aptitude questionnaires and personality profiling) still in such wide use in recruitment?

Are they really as reliable and valid as their statistical underpinning implies? Or do psychometric tests represent a lazy and convenient way for recruiters to justify the huge sums of money spent on this form of ‘scientific’ assessment?

Psychometric testing is now used by over 80% of the Fortune 500 companies in the USA and by over 75% of the Times Top 100 companies in the UK. Information technology companies, financial institutions, management consultancies, local authorities, the civil service, police forces, fire services and the armed forces all currently make extensive use of psychometric testing^[1].

The core users of psychometric testing are still to be found within the business-heartlands of white, male-dominated, Anglo-Saxon culture, namely the US and UK. However, as business becomes more globalized, this part of organisational sub-culture is spreading. Businesses outside the ‘heartland’ want to join the ‘in-group’, as recently exemplified by Chinese banks and Indian call-centres. Professional bodies such as CIPD claim that psychometric tests can reduce recruitment costs by up to 40% and offer bottom-line value to businesses because those recruited ‘fit-in’ to the company culture more readily and take-on brand-values more sincerely, leading to a deeper engagement.

Psychometric testing is BIG business. The worldwide spend on testing in 2006 was estimated at between £1bn and £1.5bn. Many test publishers, consultants and highly paid HR practitioners have come to rely on what has now become a mechanistic process of “objective testing, normed scores and the non-controversial feedback of results to candidates”. This closely controlled and scripted process is regarded as ‘scientific’ and is therefore perceived as requiring specialist expertise (and training) to administer the different tests, score the responses correctly and interpret the results[2].

The process between the tester and the candidate(s) has now also acquired an aura of the typical academic examination-ritual; hushed tones, no phones and “You may begin!” instructions. These standardised procedures are supposed to add to the ‘fairness’ and ‘objectivity’ of the tests.

The individual’s results are eventually compared with a distribution of results previously obtained from other similar test-takers so that feedback can take into account the ‘normed’ score drawn from a similar population.
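The comparison against a ‘normed’ distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percentile_rank` and the ten-person norm group are hypothetical, and real test publishers use much larger norm samples and their own scoring conventions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    ties = bisect_right(ordered, score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores from a made-up norm group of earlier test-takers.
norm_group = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]

# A candidate's raw score only acquires meaning relative to this group.
print(percentile_rank(21, norm_group))
```

The same raw score would produce a different percentile against a different norm group, which is why the choice of comparison population matters so much.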

The psychometric-tests themselves have usually been developed using a rigorous process. Each item (question) on the test will have been checked for face validity, facility, discrimination value etc. The individual scales within the test-instrument will also have been checked for internal reliability, and the whole questionnaire will need to show reliability over time.
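To make two of these checks concrete, here is a minimal Python sketch. Item ‘facility’ is taken to mean the proportion of test-takers answering an item correctly. For internal reliability the text does not name a statistic, so Cronbach’s alpha is used here purely as an assumption, being one commonly reported index. The function names and the response data are hypothetical.

```python
from statistics import variance


def item_facility(item_responses):
    """Proportion of test-takers who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`: one row per test-taker, one column per item."""
    k = len(scores[0])                                   # number of items in the scale
    item_columns = list(zip(*scores))                    # regroup the scores by item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])   # variance of the scale totals
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: five test-takers rating four items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(item_facility([1, 1, 0, 1]))
print(cronbach_alpha(responses))
```

A high alpha shows only that the items move together; it says nothing about whether they measure what the test claims to measure.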

Add to this, the gradual accumulation of data (with each use of the particular test) that supports the validity of the questionnaire (i.e. it is measuring what it is supposed to be measuring) and we have strong arguments for supporting the continued use of psychometric testing in the selection and development areas of organisations.

This hegemony enjoyed by psychometric testing and its enthusiastic support from professional bodies such as the CIPD and BPS[3] therefore seems justified. However, here I will be arguing that the current use of psychometric tests is in the main a complete waste of time and money. I believe that the tests themselves are unreliable; that their ‘psychometric properties’ are restricted to a narrow statistical indulgence with a questionable theoretical grounding; that their ‘validity’ is based upon erroneous assumptions about what they are purporting to be measuring; and that the ‘objective’ testing process is for the most part inappropriate, insulting, biased and unethical.

There are two general categories of psychometric tests. One category concerns cognitive ability or thinking power; these are now more commonly presented as “Aptitude”, “Ability” or “Reasoning” tests. However, they have their roots in the historic 11+ examination and are modern versions of the Intelligence-Test or IQ-Test.

The second category includes Personality or Occupational Questionnaires. These explore the way in which individuals do things, how they behave in certain circumstances, and their preferences and attitudes.

Aptitude and Reasoning tests

These tests involve verbal reasoning, numerical skills, verbal comprehension and grammar, spatial reasoning and manipulation, information processing, and problem solving. Anyone old enough to have sat an 11+ examination in their early schooldays, will recognise this cluster of questionnaires as strongly related to the traditional IQ test. Of course IQ tests have attracted much criticism in terms of their validity.[4]

Apart from the pseudo-IQ tests, there are over 5000 published questionnaires for measuring different aspects of aptitude and ability. These ‘off-the-shelf’ tests often have a weak internal reliability and are not specifically tailored to particular jobs. It’s like going to a supermarket and asking for a specific aptitude test in ‘clerical administration’ and being given the ‘one size fits all’ supervisors test. You’ll get some sort of test result but it won’t measure what you want. Therefore, the link between test results and the competency-profile being assessed for a particular job is quite tenuous.

In many cases, the tests are administered and marked by the recruiters themselves. Often, these HR and CIPD professionals have little or no training in the use of psychological tests and the interpretation of the results. This process is akin to sitting a university examination and having the Faculty Secretary grade your paper, rather than a qualified lecturer.

And, by the way, unlike a university examination, candidates are not allowed to keep a copy of the test paper afterwards. Is this the sign of an objective, transparent and accountable process?

Use of psychometric tests to measure ‘cultural fit’

The current HR buzz words in organisations are concerned with ‘organisational culture’ and subsequent employee ‘engagement’. Results from psychometric tests are used to explore the values/personality of candidates with a view to hiring ‘the right sort of people’; those who will stick to the company line and suit the ‘brand-image’ of the business.

This wished-for ‘cultural fit’ tends to homogenise a business’s work-force into one where everyone gets along with each other, is pleasantly in agreement about how problems are identified and defined, and can present a uniform and implacable face to clients and customers. Such organisations have ‘policies’ in place for dealing with every situation, often accompanied by a standard script for dealing with external enquiries or complaints. Customer experience (CRM) is dealt with at a ‘BIG DATA’ level, with executive decisions based upon soulless statistical feedback via such insightful instruments as ‘Dashboards’ and average NPS-scores.

Such businesses are populated with employees who will not ask awkward questions because they have been recruited via a sifting-process that includes a battery of psychometrics to ‘fit-in’ and not ‘rock-the-boat’. Such organisations have similar characteristics to the tests themselves; they are bland, predictable, and lack any spark of curiosity or creativity.

Unfortunately, at present, nobody is allowed to question this psychometrically-imposed orthodoxy. Even the suggestion that psychometric testing may be less than effective is regarded as a blasphemy against the current ‘scientific’ dogma of professional bodies such as the CIPD and BPS; questioning any part of the psychometric-testing edifice is regarded as a grave heresy!

What if you do not ‘fit-in’?

When considering ‘cultural-fit’, there is a more serious problem to be addressed by HR departments of organisations. This problem involves the paradox of wanting a “strong (homogenous) organisational culture”, where everyone has similar values, attitudes and behaviours and simultaneously needing to observe the legal requirements and company policy regarding work-place diversity.

Not only is there a strong “like-me-like-you” bias in the recruitment and selection process itself[5], but also let us not lose sight of the fact that over 95% of current psychometric tests are developed by white Anglo-Saxons, and usually of the male, middle-aged variety.

Compare these items, which appear in various IQ tests[6]:

Q1. Why does the state require people to get a license in order to get married?

Q2. Three of the following items may be classified with salt-water crocodile. Which are they? Marine turtle, brolga, frilled lizard, black snake (circle your answers)

Q3. Suppose your brother in his mid-forties dies unexpectedly. Would you attribute his death to (circle your answer):
a. God b. Fate c. Germs d. No-one e. Someone f. Your brother himself

Q4. Many people say that “Juneteenth” (June 19) should be made a legal holiday because this was the day when: (a) the slaves were freed in the USA, (b) the slaves were freed in Texas, (c) the slaves were freed in Jamaica, (d) the slaves were freed in California, (e) Martin Luther King was born, (f) Booker T. Washington died.

Q5. A “handkerchief head” is: (a) a cool cat, (b) a porter, (c) an Uncle Tom, (d) a hoddi, (e) a preacher.

Q6. Which of the following can be arranged into a 5-letter English word?
1. H R G S T
2. R I L S A
3. T O O M T
4. W Q R G S

Q1 & Q2 are taken from an Anglo-Australian IQ test; Q3 is from a test designed to be used with individuals from the indigenous Australian population; Q4 is from an IQ test with the acronym B.I.T.C.H., the “Black Intelligence Test of Cultural Homogeneity”[7], created by Robert Williams in the 1970s. The focus is also on Black Americans in Q5, which is drawn from the “Chitling Test”[8], which attempts to balance the vocabulary test items by including references to Black ‘street-culture’. Q6 is taken from an Anglo-Saxon version of an IQ-test.

Which of the items are culturally fair?

The answer is that they are all biased towards a particular sub-culture and are ‘fair’ only if being used to test individuals drawn from that sub-culture and compared to norms based upon that sub-culture. It follows that since the majority of published psychometric tests have their origins rooted in Anglo-(White) American cultures, individual IQ test scores are only usefully comparable within that particular cultural bubble. Individuals who do not come from this particular culture are at a distinct disadvantage when taking the extant versions of so-called ‘culturally-fair’ intelligence or aptitude tests.

At fairly regular intervals, reviews are published which summarise research conducted into the test scores achieved from different populations. When such analyses are ‘popularised’ in style, their impact can be dramatic, controversial and widespread. One such publication was “The Bell Curve”[9], which used data from the US National Longitudinal Survey of Youth from over 11,000 young Americans. The authors’ findings showed the following average IQ scores for the different racial sub-groups:

	African-Americans (Black)	Latino-Americans	Caucasian-Americans (White)	Asian-Americans	Jewish-Americans
Mean IQ-score	85	89	103	106	113

(From Herrnstein & Murray, 1996, pp. 273–278)

The 15-point difference between the mean IQ-scores of Black and White-Americans is a consistent finding across many studies, and equates to about one standard deviation on a scale normed with a 15-point standard deviation. For example, a meta-analysis by Roth et al. (2001) showed that this Black-White contrast could be found in data from the Scholastic Aptitude Test (2.4 million SAT-applications for college and university entry), as well as tests completed for entrance into US military service (0.4 million), and the analysis of over 0.5 million job applications in the context of US corporate recruitment.

These test scores are the best predictor of economic success in Western society (Schmidt & Hunter, 1998) and these group differences have important societal outcomes (R. A. Gordon, 1997; Gottfredson, 1997). They support the contention by some[10] that Black Americans generally do less well in socio-economic life than White Americans because of an inherent ‘lack of intelligence’, which is genetically (racially)-based. The difference in socio-economic attainment between Black and White Americans is therefore explained and ‘justified’ in terms of immutable genetics, rather than it being the result of an inherent racist society where Afro-Americans are discriminated against (either overtly or implicitly) because of ‘looking-different’.

An important point to make here is that in everyone there exists an automatic, powerful evolutionary impulse to “protect and nourish their own”. This motivation is seen most strongly at family level, but is equally recognisable as an important driver of behaviour at the group, community and country levels. There is a strong tendency in all group-behaviour to develop an “Us and Them” attitude. Individuals who look alike, think alike, have similar attitudes and follow implicit group norms of behaviour will belong to the ‘IN’ group. Those who are somehow ‘different’ and do not conform are labelled non-conformists or heretics, and definitely belong to the ‘OUT’ group.

This division between ‘us and them’ is seen in the earliest writings of the ancient Greeks[11]. Their Manichaean view was that of a polarized society. The ‘Us’ part were the adult male citizens who wrote all the texts (tests?[12]); the ‘Them’ or ‘Others’ consisted of the non-citizens, slaves, women and barbarians who didn’t. The present dogma surrounding psychometric testing processes and procedures, as set out by professional bodies such as CIPD and the BPS, still implicitly reflects this elitist attitude. The ‘US-group’ designs the tests, administers and scores them, and then decides whether candidates from the ‘Them-group’ have achieved the ‘In-Group’ criteria.

The not-yet anointed, ignorant barbarians, women, and those of different skin-colour take the tests in the hope of achieving recognition that they have the potential of becoming a member of the in-group and being selected for an induction or re-training programme:

Oh, oobee doo

I wanna be like you

I wanna walk like you

Talk like you, too

You’ll see it’s true

An ape like me

Can learn to be ‘in-group’ too

(From Walt Disney’s 1967 film The Jungle Book)

In terms of organisations and businesses, Occupational Psychologists and HR professionals glibly talk of developing a ‘strong’ Organisational Culture and recruiting and developing individuals who will ‘fit-in’. This is stating explicitly that they are looking to select individuals who will be able to seamlessly join the existing ‘in-group’, with behaviour that follows existing group-norms and supports the existing attitudes and values of ‘in-group’ members.

It follows that the recruitment and selection procedures of IBM are looking for candidates who can readily become ‘IBM-people’, Apple recruiters are looking for potential ‘Apple-people’, Marks & Spencer are looking to select future ‘M & S’ people, and so on.

The cultural bias that exists in psychometric tests implicitly reinforces this process of homogenising the ‘In-group’ and rejecting candidates who are ‘different’ and are therefore consigned to the ‘out-group’. The middle-aged, male, Anglo-Saxon psychometric tests are used to filter recruits for organisations that are dominated by middle-aged, male Anglo-Saxons. Small wonder then, that most Western businesses have problems justifying the skew in the numbers of females and non-white employees. These so-called ‘minorities’ are generally less successful in terms of position in the management hierarchy and in the salary being earned. It is not an accident that this happens, but it is the result of a systematic bias in the selection and recruitment procedures being used. Organisations such as the BPS and CIPD are knowingly supporting this discrimination by refusing to question the extant psychometric-edifice and insisting that the tests used are “objective”, “scientific”, “reliable” and “valid”.

The majority of ‘official’, validated psychometric tests represent a ‘traditional’ but limited and narrow notion of ‘intelligence’ and ‘personality’. As argued in previous sections, the large majority of existing off-the-peg questionnaires are based on the values, attitudes and motivations of a singular ‘in-group’ (e.g. white, Anglo-Saxon, educated, male & middle-aged). This group has a particular weltanschauung and moral compass.

These test batteries will sieve applicants and allow the earmarking of candidates with similar psychographic characteristics to those of the test designers and administrators. These anointed applicants are then offered positions in the white, Anglo-Saxon-dominated hierarchical systems of our current organisations. The process is tautological, unremarkable, uninspiring and implicitly discriminatory.

New recruits are de facto ‘replacement’ rather than ‘refreshment’. And to be sure that no spark of a challenge to the status quo exists within the new recruits, there is always cultural-reinforcement available through induction programmes and mentoring schemes.

Just what are psychometric tests measuring?

Perhaps we can use an analogy here, drawn from the world of Art, to illustrate part of the pointlessness of psychometric tests. Imagine two paintings side-by-side, a Picasso and a Turner and we are asked to determine which is the more beautiful. How would we set about such a task?

Could we determine this by measuring the length of brush-stroke, the thickness of paint, strength of refracted light etc. and then somehow compare the two measurements?

It is madness even to contemplate this because we know that ‘beauty’ is a subjective judgement; after all “beauty is in-the-eye-of-the-beholder”. ‘Beauty’ is not a physical entity that can be measured with instruments designed to calibrate physical units. ‘Beauty’ is a psychological concept that exists ONLY in the minds of others (or if you are a narcissist, in your own mind).

In a similar way, concepts such as ‘Intelligence’ and ‘Personality’ exist only in the mind as psychological constructs; they are not physical entities or properties that can be measured and calibrated by scales and statistics, instruments more suitable for measuring phenomena in the physical world.

‘Beauty’, ‘Personality’ and ‘Intelligence’ also share the same property of ‘emergence’[13]. They are concepts which are difficult to define objectively and cannot be observed as being contained in their individual constituent parts either of a painting or a human being e.g. the ‘beauty’ of a painting is not contained in the chromatic value of the pigments used by the artist. Similarly, ‘intelligence’ or ‘personality’ is not contained in the physiological arrangement of neurons and synapses in an individual’s brain.

An ‘emergent’ property is both original and radical and shows some property of ‘wholeness’ not observable in the individual elements of the system. The dynamic interaction of the parts of the system produces a ‘holistic’ property that has a coherence over time and is perceived as ‘Beauty’, ‘Intelligence’, ‘Personality’ etc. The worlds of Art and Psychology are only understandable through these ‘emergent’ properties.

‘Intelligence’ and ‘Personality’ and other psychological emergent properties are not entities that can be measured on a linear scale, even if the scales are psychometrically robust[14]. They are also examples of STRONG EMERGENCE, whereby their qualities cannot be simulated by computer programs. Weak emergence is a type of emergence in which the emergent property is amenable to computer simulation e.g. computer modelling of cognitive processes[15]. Human intelligence has so far proved resistant to computer modelling, because the brain is a parallel processor and computer simulations of thinking rely on very fast iterations of linear processing. Similarly, I have yet to meet a computer with a personality, although I do know some individuals with the personality of a computer-screen.

The emergent characteristics of psychological complex-systems are essentially subjective qualities (e.g. Crutchfield, 1994). It is the observer who determines what pattern of order is discernible in the non-linear complexity or random chaos of the system being observed in a particular environment. The identification and definition of the pattern subtly but critically depends on the observer’s a priori mental model of the emergent property e.g. ‘Intelligence’ or ‘Personality’. Observers will have implicit prototypes in mind of what ‘Intelligence’ or ‘Personality’ consist of, and there will be strong associations with the traditional psychometric methods for measuring these hypothetical constructs e.g.

Figure 1: An individual’s possible association-map for ‘Intelligence’

The association-map set-out above is hypothetical. It represents a possible arrangement of associations surrounding the concept ‘IQ’ that an individual might have. This particular pattern of associations would be idiosyncratic and would give the individual a working definition of ‘IQ’ and its meaning.

The pattern of associations functionally mimics the neural network within the brain and reflects how our associative or long-term memory works. It is also known as our semantic memory.

The green arrows are vectors and demonstrate the strength of association that each word or concept has with ‘IQ’. The closer the word/concept to the target ‘IQ’, the stronger the association, and the more easily it is recalled.

In this example, the individual would immediately ‘call-to-mind’ the words “Brainy” and “Well-educated” if asked to respond to the stimulus ‘IQ’. These would be quick responses; they might even be immediate and automatic, recalled without any mental effort or thought. This decision-making pathway is used when we are thinking fast, and has been labelled by Kahneman SYSTEM 1 thinking[16]. It is decision making made on a ‘gut-feeling’, using intuition and no conscious deliberation; it is a subconscious process actioned without conscious awareness.

The other concepts included in the map have weaker associations, and are recalled only after some time given over to more conscious thinking and evaluation. This decision-making pathway is used when we are thinking slowly. Kahneman labelled this route SYSTEM 2 thinking. This route is used when we are consciously thinking about a decision and considering different alternatives. It is a conscious process, which we are aware of.

System 2 decisions may take only 1-2 seconds, but this is relatively SLOW when compared with System 1 decisions, which can be automatic and immediate, implemented in less than 1 second. These are the FAST decisions.

Each individual will carry a unique association-map of ‘IQ’ and ‘Intelligence’ in their minds. It follows that evaluating someone’s intelligence can only be done sensibly by asking others, “How intelligent do you think X is?” This will give us an average subjective evaluation of an individual’s intelligence in whichever context/environment the research pertains to; a sort of subjective-consensus that for practical purposes becomes an ‘objective’ measure[17].

The current menu of psychometric tests has been designed and developed from the restricted perceptual-maps of mainly middle-aged, white, Anglo-Saxon males. Their consensus of what ‘intelligence’ and ‘IQ’ are is only valid in so far as the tests are being used to select individuals for organisational cultures, the majority of which are heavily influenced by the values, attitudes and behavioural norms of middle-aged, white, Anglo-Saxon males.

On a more general note, the validity of the majority of current psychometric tests can be called into question because of the mismatch between the technical capabilities of the test and the nature of the concepts they are purportedly measuring. Psychological constructs such as ‘Intelligence’ and ‘Personality’ are hypothetical concepts (existing exclusively in the psychological domain) which assimilate the properties of both complexity and emergence. Therefore, it is difficult to imagine that a self-completion, one-dimensional, linear-scaled psychometric test is an appropriate instrument to measure such hypothetical and multi-dimensional concepts.

We should legitimately question the validity of both IQ and Personality tests in their present format. The tests may have robust psychometric properties, but that robustness belongs to the enclosed and restricted world of statistics. The ‘objective’, mathematical scores they produce are not an ‘ecologically-valid’ measure of intelligence or personality. Test publishers, administrators and users have a misplaced confidence in the test results because they are in the form of unequivocal numbers, which can be consistently interpreted following the instructions in the test-handbook.

The problem is that the objective score is not a measure of the true nature of intelligence or personality, but instead refers to an artificially narrow statistical abstraction created as a by-product of factor analysis during the test development process. IQ tests measure IQ, not intelligence; IQ only represents a particular part (e.g. mathematical and verbal ability) of what human intelligence consists of. Similarly, personality inventories measure only part of what makes up the personality of an individual.

In the context of the HR functions of recruitment and selection, this does not matter, because organizations are only seeking a certain limited range of abilities and attributes from their potential recruits. Generally speaking, businesses want mathematical and verbal skills; it is unimportant whether candidates can cook, enjoy art or play a musical instrument. Organizations want employees who will work hard, be ‘engaged’, accept the status quo and fit-in with their brand image at all times.

The majority of current IQ-tests are based upon the idea that there is a core-factor of intelligence, usually labelled as ‘g’, first proposed by Charles Spearman[18] as a measure of general mental ability or general intelligence factor. The IQ test was originally designed by Alfred Binet in the early 1900s as a diagnostic tool to measure the scholastic performance of French schoolchildren. The test has since gone through various development cycles by researchers on both sides of the Atlantic.

The IQ (or Intelligence Quotient) is a score obtained by dividing a person’s mental age score, derived from administering an IQ-test, by the person’s chronological age, both expressed in terms of years and months. The resulting fraction is multiplied by 100 to obtain the IQ score.
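The quotient described above is simple enough to write out. The following Python sketch assumes ages are converted to months before dividing; the helper names `to_months` and `ratio_iq` and the example ages are hypothetical, chosen only to illustrate the arithmetic.

```python
def to_months(years, months=0):
    """Convert an age given in years and months into months."""
    return 12 * years + months


def ratio_iq(mental_age_months, chronological_age_months):
    """Classic ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical example: a child aged 8 years 0 months who performs
# like a typical child of 10 years 0 months.
print(ratio_iq(to_months(10), to_months(8)))

# A child whose mental age equals their chronological age scores 100.
print(ratio_iq(to_months(9, 6), to_months(9, 6)))
```

Note that the result is a ratio of two ages and nothing more; the number carries no units and no information about which abilities were sampled to arrive at the ‘mental age’.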

Scores are normed so that the median is 100, with a 15-point standard deviation. Statistically therefore, approximately 68% of population scores lie between an IQ-score of 85 and 115. Roughly 5% of the population scores 125 and above, and 5% scores 75 and below.
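These proportions can be checked directly, on the assumption that normed scores follow a normal distribution with mean 100 and standard deviation 15. The sketch below uses only Python’s standard library; the function name `normal_cdf` is an illustrative choice. Under that assumption the 85 to 115 band comes out at roughly 68%, and each tail beyond 125 or 75 at a little under 5%.

```python
import math


def normal_cdf(x, mean=100.0, sd=15.0):
    """Proportion of a normal population scoring at or below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


between_85_and_115 = normal_cdf(115) - normal_cdf(85)  # within one SD of the mean
above_125 = 1.0 - normal_cdf(125)                      # upper tail
below_75 = normal_cdf(75)                              # lower tail

print(f"85 to 115: {between_85_and_115:.1%}")
print(f"125 and above: {above_125:.1%}")
print(f"75 and below: {below_75:.1%}")
```

The symmetry of the two tails is a property of the assumed curve, not an empirical finding about people.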

Intelligence Levels

Score: 70+ “Normal”/130+ “Gifted”/Below 70: “Retarded”

51-70: “Morons”/26-50: “Imbeciles”/0-25: “Idiots”

This classification system was replaced in the 1970s by “Very Superior” (130+); “Superior” (120-129); “High Average” (110-119); “Average” (90-109); “Low Average” (80-89); “Borderline” (70-79) and “Extremely Low” (69 and below).

Image from: Bruinius, H., (2006, 2007), “Better for All the World: The Secret History of Forced Sterilization and America’s Quest for Racial Purity”.

IQ-scores correlate positively with performances at school and college (and also SAT scores and other aptitude tests) but their predictive power (lower correlations) reduces for performance at University, especially at post-graduate level. Generally, IQ scores are not a good predictor of success in work, although some researchers have attempted to come up with an ‘estimated’ relationship between scores and job-type e.g.

WAIS-R Mean IQ Range	Occupational Category
110-112	Professional and technical
103-104	Managers, clerical, sales
100-102	Skilled workers
92-94	Semiskilled workers
87-89	Unskilled workers

From: Kaufman, A. S. (1990), Assessing Adolescent and Adult Intelligence. Allyn and Bacon

There is a weak positive correlation between IQ-scores and income, but when wealth and IQ are compared, the relationship is purely random. This suggests that there is some meritocracy in the distribution of income[19], but not so much within the population who owns yachts, has off-shore investments, avoids paying tax or squirrels-away the company’s pension funds[20].

The higher correlations between IQ-scores and aptitude-tests such as SAT and achievement at school should not be surprising because all three are evaluating similar dimensions of cognitive ability, namely, a selection from: basic mathematics ability; verbal ability (English grammar & vocabulary, synonyms & antonyms); reasoning ability (analogies); spatial reasoning and image rotation. These individual dimensions load about 50-60% onto the hypothetical ‘g’.

Stephen Gould (1981)[21] has criticised the whole edifice of IQ testing and particularly disputes the very existence of ‘g’, arguing that it is a statistical artefact of the factor-analysis procedure used in the development and construction of IQ-tests. He feels that supporters of ‘g’ and its supposed genetic origins are guilty of “scientific racism”:

“ ….. the abstraction of intelligence as a single entity, its location within the brain, its quantification as one number for each individual, and the use of these numbers to rank people in a single series of worthiness, invariably to find that oppressed and disadvantaged groups—races, classes, or sexes—are innately inferior and deserve their status”. Stephen Gould, (1996).

Gould’s criticism was aimed at the work of psychologists such as Arthur Jensen and Philippe Rushton, who strongly argued for a racially-based difference in intelligence (based on IQ scores). Both Jensen and Rushton were supporters of the findings published by Richard Herrnstein and Charles Murray in “The Bell Curve” (1994).

This book used the results of previous empirical studies using IQ-measurements to support the argument that genetic influences were mainly responsible for differences in test scores between White (Caucasoid), Asian (Mongoloid) and Black African (Negroid) Americans.

The authors also made the connection between low test scores and anti-social behaviour. Indeed Rushton has gone further with comparisons between the races, claiming genetic influence on a wide range of physiological and sociological differences[22]; see table below.

Worldwide Average Differences among Blacks, Whites, and East Asians

Trait	Blacks	Whites	East Asians
IQ Test Scores	85	102	106
Decision Time	Slow	Medium	Fast
Cultural Achievements	Low	High	High
Brain-size	1267 cm³	1347 cm³	1364 cm³
No. of neurons	13,185 m	13,665 m	13,767 m
Age of first intercourse	Early	Intermediate	Late
Age of first pregnancy	Early	Intermediate	Late
Aggressiveness	High	Medium	Low
Cautiousness	Low	Medium	High
Impulsivity	High	Medium	Low
Size of genitalia[23] [24]	Large	Medium	Small
Intercourse Frequency	High	Medium	Low
Probability of permissive attitudes	High	Medium	Low
Incidence of sexually transmitted diseases	High	Medium	Low
Marital Stability[25]	Low	Intermediate	High
Law abidingness	Low	Intermediate	High
Mental Health	Low	Medium	High

From: Rushton, J. P. and Jensen, A. R. (2005), Thirty Years of Research on Race Differences in Cognitive Ability[26], Psychology, Public Policy, and Law, Vol. 11, No. 2, pp. 235-294

Modern thinking about human intelligence rejects the idea that it can be characterised as a one-dimensional ‘catch-all’ general cognitive problem-solving capacity. Current IQ tests combine the scores of various sub-scales to produce a single figure. This approach is fundamentally flawed, as it ignores the work done by researchers such as Gardner (1983)^[27] and Sternberg (1985)^[28], who along with others developed the theory of multiple intelligences. This theory strongly criticises the traditional notion of intelligence, ‘g’, which IQ tests purport to measure, as being too restricted. For instance, Van Gemert describes IQ-scores as being no more than an indication of how well Westerners might do in Western schools, while Richard Nisbett, Professor of Psychology at the University of Michigan, points out some of the omissions in current IQ-tests:

“IQ scores do not tell you anything about practical intelligence, your ability to fathom how things work. It doesn’t measure your curiosity. It doesn’t measure your creativity.”

The best-known formal theory of multiple intelligences was developed in 1983 by Dr. Howard Gardner, professor of education at Harvard University. The theory, as subsequently extended, proposes eight different intelligences to account for a broader range of human potential in children and adults.

These intelligences are:

Linguistic intelligence (“word-smart”)
Logical-mathematical intelligence (“number/reasoning-smart”)
Spatial intelligence (“picture-smart”)
Bodily-Kinesthetic intelligence (“body-smart”)
Musical intelligence (“music-smart”)
Interpersonal intelligence (“people-smart”)
Intrapersonal intelligence (“self-smart”)
Naturalist intelligence (“nature-smart”)

“We are all able to know the world through language, logical-mathematical analysis, spatial representation, musical thinking, and the use of the body to solve problems. We also make things. We also require an understanding of other individuals and an understanding of ourselves as well as how we fit in with the rest of nature. Where individuals differ is in the strength of these intelligences, but we all possess this profile of intelligences, which we use in combinations to complete different tasks, solve diverse problems and be successful in various environments.” Dr. Howard Gardner (1983).

Surely, even the strongest supporters of traditional psychometric tests will admit that this ‘profile of intelligences’ is a more valid description of human potential than whatever the ‘dry and colourless’ IQ-test is measuring. Gardner and his research colleagues are readily demonstrating the limitations of the content of current IQ-tests, and hence of their validity.

Research into multiple intelligences is clearly showing the traditional concept of IQ to be narrow and selective. Claims to be measuring ‘intelligence’ from such a partial and biased foundation are therefore unrealistic, and also damaging for the individual participants and the professionals involved.

Other researchers have examined the importance of Interpersonal & Intrapersonal Intelligence, particularly in workplace settings. It is often assumed that there must be a strong relationship between IQ and leadership in organisations. However, research shows that the correlation, although positive, is not all that strong. In the workplace the narrow ‘academic’ IQ measure does not predict success in a practical sense. Leadership requires the ability to deal with people successfully, and researchers have looked elsewhere for different forms of intelligence that might be equally or more important in organisational environments.

Goleman (1998)[29] was the first to popularize the concept of Emotional Intelligence, which is a measure of interpersonal and intrapersonal intelligence. The ‘intelligence’ is based upon the individual’s ability to recognize their own emotions and those being experienced by others, deal with them, display appropriate behaviour, and build sound relationships. Researchers in this area have shown that it is a lack of emotional intelligence that often derails leaders, who alienate workers and colleagues with dysfunctional behaviour, e.g. bullying, temper tantrums, micro-managing, and unwanted flirting[30].

Psychometric Personality Testing

Again, despite the much-trumpeted robust qualities of the majority of published psychometric tests, the numbers representing an ‘objective’ measure of an individual’s personality just do not add up. At best, responses to the results of personality questionnaires are contaminated by the Barnum effect[31], and the tests have been shown to have the predictive validity[32] of a “Mystic Meg” fortune-telling.
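To put the validity coefficients quoted in note 32 into perspective, recall that a correlation of r accounts for r² of the variance in the outcome. The following is a minimal Python sketch of that arithmetic; the coefficients are those given in note 32, while the dictionary and function names are purely illustrative.

```python
# Predictive validity coefficients as quoted in note 32 (Ghiselli, 1973).
VALIDITY = {
    "Managers": 0.22,
    "Clerical": 0.22,
    "Sales": 0.32,
    "Manual": 0.25,
    "Service industry": 0.16,
}


def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a correlation r (r squared)."""
    return r ** 2


def unexplained_share(r: float) -> float:
    """Share of outcome variance left to factors other than the test score."""
    return 1.0 - variance_explained(r)


if __name__ == "__main__":
    for job, r in VALIDITY.items():
        print(
            f"{job:<17} r = {r:.2f}  "
            f"explained = {variance_explained(r):6.1%}  "
            f"unexplained = {unexplained_share(r):6.1%}"
        )
```

On these figures the test-scores account for somewhere between about 3% and 10% of the variance in job ‘success’, depending on the job category.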

Here is an example of ‘Barnum’-inspired personality feedback, designed by Forer[33]:

You have a great need for other people to like and admire you.
You have a tendency to be critical of yourself.
You have a great deal of unused capacity which you have not turned to your advantage.
While you have some personality weaknesses, you are generally able to compensate for them.
Disciplined and self-controlled outside, sometimes you tend to worry on the inside.
At times you have serious doubts as to whether you have made the right decision or done the right thing.
You prefer a certain amount of change and variety and become dissatisfied when hemmed in by restrictions and limitations.
You pride yourself as an independent thinker and do not accept others’ statements without satisfactory proof.
You have found it unwise to be too frank in revealing yourself to others.
At times you are extroverted, affable, and sociable, while at other times you are introverted, wary, reserved.
Some of your aspirations tend to be pretty unrealistic.
Security is one of your major goals in life.

Every participant was given this same feedback, which purported to have been derived from his or her own answers to a personality questionnaire. On average, participants rated its accuracy at 4.26 on a 0-5 scale, where 5 was extremely accurate.

Personality tests tend to have items that have more than just a tinge of a ‘self-fulfilling prophecy’ about them, and the whole exercise seems somewhat tautological. The quest for internal reliability means that multiple items are used to ask the same question in a different form of words, and the resulting feedback mimics this repetition of interchangeable meanings, e.g.

I like being the centre of attention
I enjoy being in others’ company
I am something of a party-animal
I like talking a lot

If you agreed (strongly) with these 4 statements, the score and feedback from the test would be that you were an extravert or had “a strong tendency towards extraversion”. This amounts to a form of pseudo-science, whereby 4 separate scores are being averaged ‘scientifically’ to produce an evaluation that could be more parsimoniously achieved by asking the participant a direct question e.g. “On a scale 0-5, where 0 = not at all and 5 = 100%, how much of an extravert do you consider yourself to be?”
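The circularity can be made concrete with a short sketch. The Python below averages the four items into a scale score and computes Cronbach's alpha, one commonly used index of internal reliability (chosen here as an assumption; no particular test publisher's scoring method is implied). The responses are invented purely for illustration.

```python
from statistics import mean, pvariance

# Hypothetical 0-5 agreement ratings from five imaginary participants on the
# four near-synonymous items above. These numbers are invented, not real data.
RESPONSES = [
    [5, 5, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
    [4, 5, 5, 4],
    [0, 1, 0, 1],
]


def scale_score(items):
    """The 'extraversion' score: simply the mean of the item ratings."""
    return mean(items)


def cronbach_alpha(rows):
    """Cronbach's alpha for a participants-by-items table of ratings."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    for row in RESPONSES:
        print(row, "->", scale_score(row))
    print("alpha =", round(cronbach_alpha(RESPONSES), 3))
```

With items this similar, alpha on the invented data comes out well above 0.9: the scale looks impressively ‘reliable’ precisely because the four items are asking the same thing.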

There are some aspects of personality that cannot be asked directly because of a likely social desirability bias e.g. few people would admit to being ‘neurotic’. However, many of the items contained in personality tests look ‘obvious’.

This leads on to another point about personality testing: how honest are the responses being elicited? Although there are attempts at including lie-scales and other checks and balances, it is difficult to spot whether someone is ‘second-guessing’ the answers in order to disguise their personality and provide a profile which they feel will ‘fit in’ with what the prospective employer is looking for.

Organizations also exist which offer coaching sessions to improve candidates’ psychometric test-results. Prospective candidates are coached to respond to aptitude and personality tests in particular ways. Improvement in scores can be achieved just by repeating the test-experience.

If individuals can be trained to achieve enhanced scores, how valid and reliable can the tests be, even with robust statistical properties?

More recently, there has been a spate of business scandals claiming the headlines in the popular media, e.g. the Volkswagen emissions cover-up or the banks’ rigging of the Libor rate. This has led to calls from large investment companies and politicians for changes in the make-up of boardrooms and tighter control of the governance of companies.

This has quickly led to the test-industry producing questionnaires purporting to identify levels of integrity, honesty, impulsivity, stress tolerance and conscientiousness in employees. These new personality inventories are designed to evaluate the likelihood of individual managers or employees engaging in unethical and/or illegal behaviour at work, and thereby to reduce the incidence of future organizational misbehaviour and fraud.

But hang on a minute! In the introduction, I outlined the size of the current psychometric-testing industry; it is BIG! We also know that psychometric testing in its various guises has been used in earnest for selection and recruitment since the 1950s. If the tests are so accurate at evaluating prospective candidates for intelligence, aptitude, personality and now honesty, then surely, after nearly 70 years of measuring and selecting, we should expect organizations to be near perfect. They should be full of intelligent, personable people, who work well together ‘as a team’, engaged with a strong corporate culture and sharing in the benefits of a successful and profitable business. What has gone wrong?

Perhaps the psychometric-testing industry should revert to measuring penises?

[1] http://www.psychometric-success.com/psychometric-tests/

[2] In the UK, The British Psychological Society provides a framework for training would-be psychometric test practitioners, known as “Level A and Level B” programs.

[3] CIPD = Chartered Institute of Personnel and Development; BPS = British Psychological Society.

[4] The ethics and validity of IQ testing are discussed separately in later sections.

[5] Anderson and Shackleton, (1990)

[6] See for example: The Redden-Simons “Rap-Test” developed by Redden, P. M. & Simons, J. A. (1986) for individuals from the Chicano culture. This 50-item vocabulary test, part of an IQ battery, represents the ‘street language’ of local youth in Des Moines, Iowa; white, middle-class college students averaged only 2 correct answers against a ‘street-normed score’ of 8.

[7] There is also a version with the acronym W.I.T.C.H., “Women’s Intelligence Test of Cultural Homogeneity” designed to compensate for male bias in the usual IQ test batteries.

[8] Dove, A. (1971), The “Chitling” Test, in Aiken, L. R., Jr. (1971), Psychological and Educational Testing, Boston: Allyn and Bacon.

[9] Herrnstein, R. J. & Murray, C., (1994). The Bell Curve, Free Press: New York.

[10] For example: Rushton, J. P. & Jensen, A. R. (2005), Thirty Years of Research on Race Differences in Cognitive Ability, Psychology, Public Policy and Law, Vol. 11, No. 2, pp. 235-294.

[11] Davidson, J. (1997), Courtesans & Fishcakes: The Consuming Passions of Classical Athens, Fontana Press, Introduction pp. xxv.

[12] Author’s note

[13] Emergence is a characteristic of complex or chaotic systems, whereby the interaction of smaller, simpler elements of the system gives rise to a ‘supra-property’ or characteristic that is not identifiable in any of the individual elements. An emergent property is qualitatively different from the smaller, simpler elements and can be summed up in the phrase: “the whole is greater than the sum of its parts”. (See Corning, P. A. (2002), The Re-Emergence of Emergence: A Venerable Concept in Search of a Theory, Complexity, 7 (6), pp. 18-30.)

[14] Imagine trying to compare the amount of ‘beauty’ in a Picasso painting with that of one by Constable using a highly reliable and validated meter-ruler!

[15] Computer models provide a necessary discipline in the interpretation and understanding of theoretical processes, and they can function to make psychological theories more falsifiable and more vulnerable to experimental test. However, it is necessary to distinguish between theory-relevant and theory-irrelevant routines in models, and to state clearly where psychophysical evidence has been used to support assumptions about organization of the system; where at some level of neurophysiology, processes similar to those modelled in neural networks should be identifiable. A further problem for computer simulation is the tendency to ignore the influence of important factors such as attention, arousal and motivation. (See: Frijda, N. H. (1967). Problems of computer simulation, Behavioural Science, 12, 59-67 & Dreyfus, H. L. (1972), What computers can’t do: A critique of artificial reason, New York: Harper & Row).

[16] Kahneman, D. (2012), Thinking, Fast and Slow, Penguin https://www.penguin.co.uk/books/56314/thinking-fast-and-slow/#xBcjD8gqwdHAOAv7.99

[17] Of course, like the US immigration authorities, we can always ask the individual directly about his/her IQ, i.e. “Are you, or have you ever been intelligent?” (I wonder how the answer would correlate with the individual’s IQ-test score.)

[18] Spearman, C.E, (1904), “‘General intelligence’; Objectively Determined and Measured”, American Journal of Psychology, 15: 201–293

[19] Zagorsky, J (2007), Do you have to be smart to be rich? The impact of IQ on wealth, income and financial distress, Intelligence 35: 489-501.

[20] See 18. ‘Sir’ Phillip Green’s Legacy – a Testament or Trash!, “Fix & Fiddle”, Issue 1, Spring 2016 (April-September 2016), Entry 18, p. 9

[21] Gould, S. J. (1981), The Mismeasure of Man, New York: W. W. Norton & Company. See also: Gould, S. J. (1985), “The Median Isn’t the Message”, Discover 6 (June): 40–42; and Gould, S. J. (1996), The Mismeasure of Man: Revised and Expanded Edition, New York: W. W. Norton & Co., p. 36

[22] There seems to be a significant negative correlation between IQ-test score and penis-size; see the table, and note 23 below.

[23] According to Rushton, average erect penis-sizes are: Orientals 4-5.5”, Caucasians 5.5-6”, Blacks 6.2-8”. See Rushton, J. P. (1995), Race, Evolution, and Behavior: A Life History Perspective (2nd special abridged edition), Port Huron, MI: Charles Darwin Research Institute.

[24] “IQ is like dick size – if you have to measure it, you’re way too concerned with it. And both are gauche to discuss in polite company.” Adrian Lamo; http://www.goodreads.com/author/show/14917061.Adri_n_Lamo

[25] Genes usually code for bio-physiological characteristics and not behaviour.

[26] Interestingly “Size of Genitalia” is included under “Cognitive Ability”!

[27] Gardner, H. (1983). Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books.

[28] Sternberg, R. J. (1985). Beyond IQ: A Triarchic Theory of Intelligence. Cambridge: Cambridge University Press.

[29] Goleman, D. (1998), Working With Emotional Intelligence. New York: Bantam.

[30] Riggio, R. E., Murphy, S. E., & Pirozzolo, F. J. (Eds.) (2002), Multiple Intelligences and Leadership, New York: Taylor & Francis.

[31] The Barnum effect (also known as the Forer effect) is the common psychological phenomenon of accepting mundane and vague personality descriptions as having been generated specifically for the individual. This tendency is reinforced if the ‘Barnum’ is being fed back from a supposedly highly accurate personality test by HR professionals or psychologists.

[32] Ghiselli (1973), The Validity of Aptitude Tests in Personnel Selection, Personnel Psychology, 26, pp. 461-477 – a meta-analysis of predictive validity coefficients for psychometric test-scores and the ‘success’ of individuals in different jobs: Managers 0.22; Clerical 0.22; Sales 0.32; Manual 0.25; Service industry 0.16. Squaring these coefficients gives the share of the variance in job ‘success’ that test-scores account for: between about 3% and 10%. In other words, roughly 90% or more of a person’s success in a job is due to factors other than psychometric test-scores!

[33] Forer, B. R. (1949), The fallacy of personal validation: A classroom demonstration of gullibility, Journal of Abnormal and Social Psychology, 44 (1): 118–123
