Notes On the Flynn Effect

Jul 24, 2007 10:21

The Flynn Effect has to be the weirdest finding in psychometrics. If you take it at face value, it suggests that someone in the bottom quintile of intelligence today would be in the top quintile of intelligence at the turn of the 20th century. I have no trouble believing that people were dumber on average back then, but that much dumber? It just beggars belief. I've been trying to make sense of it, and the following is a long post summarizing what I've been able to sort out.
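To see where a claim like the quintile one comes from, here is a back-of-the-envelope sketch. It is not a calculation from any of the papers discussed. It assumes IQ is normally distributed with a standard deviation of 15 in both cohorts, and it assumes a gain of 3 points per decade sustained for ten decades, which is the low end of the 3-7 point range quoted below for the "purer" problem-solving measures. The function name and constants are mine.

```python
from statistics import NormalDist

IQ_SD = 15.0            # conventional IQ standard deviation
GAIN_PER_DECADE = 3.0   # ASSUMPTION: low end of the 3-7 points/decade range
DECADES = 10            # ASSUMPTION: gains sustained across a full century


def percentile_in_earlier_cohort(percentile_today,
                                 gain_per_decade=GAIN_PER_DECADE,
                                 decades=DECADES,
                                 sd=IQ_SD):
    """Where someone at `percentile_today` would rank against the earlier
    cohort, taking the raw score gains at face value."""
    std_normal = NormalDist()
    z_today = std_normal.inv_cdf(percentile_today)
    shift_in_sd = gain_per_decade * decades / sd  # total gain in SD units
    return std_normal.cdf(z_today + shift_in_sd)


# Someone right at the top edge of today's bottom quintile.
p = percentile_in_earlier_cohort(0.20)
print(f"Percentile rank in the earlier cohort: {p:.3f}")
```

Under these assumptions the 20th percentile today lands at roughly the 88th percentile of the earlier cohort, comfortably inside the old top quintile. A larger per-decade gain only makes the result more extreme.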

This, I Do Not Believe

Over the course of the 20th century, students have gotten used to being increasingly bombarded by tests where they had to solve a bunch of problems under a time limit. So this could cause increases in intelligence test scores simply due to decreased anxiety and greater test-savviness, including a greater willingness to go with your best guess quickly rather than take your time. Right? Well (says Flynn), no: The problem with this is that the increase in IQ scores began about three decades *before* time-limited testing started to become ubiquitous in schools. Plus Richard Lynn points to a 1989 study by Flieller, Jautz & Kop (which I have not read) that tried to test this by looking at 40-year test gains in France, and found that the poorer scores from the earlier cohorts are pretty much entirely explained by incorrect answers rather than by failure to attempt as many items.

What about better schooling? If schooling has increased people's knowledge of things like reading and math, this could cause performance on tests that measure these specific abilities to increase, thereby increasing full-scale IQ scores, right? Except the data are almost the opposite of what this idea would predict: Gains in scores that load more heavily on specific knowledge have been pretty small compared to gains in "purer" measures of problem-solving ability (about half an IQ point per decade versus 3-7 IQ points per decade, respectively). Also, this explanation has a hard time dealing with the high heritability of IQ (over 0.7): It implies that IQ scores should vary significantly with quality of schooling, but this doesn't seem to be the case based on behavioral-genetic studies of separated twins adopted by families in varying income brackets and neighborhoods.

Then there's the "school of life" hypothesis, which James Flynn himself seems to favor: That society has changed in a direction that increasingly encourages abstract thinking and complex mental stimulation, whereas previously the cognitive demands were not so great and a person could get by fine with mostly concrete thinking. This sounds plausible, except that it doesn't explain why there have been consistent testing gains shown even in very young children (as young as 4 years old). I can believe this sort of ubiquitous "g exercise" can increase intelligence to some small degree, but it simply can't be the whole story (or even most of it), as will hopefully become clear from the considerations that follow.

The Top Contenders

Having cleared the decks, I've only seen two possible explanations for the Flynn effect that make any sense. The first, and probably the one with the most going for it, is Richard Lynn's hypothesis that it's mostly due to improvements in nutrition. This is motivated mainly by three facts: 1) Nutrition in developed nations vastly improved over the first half of the 20th century. 2) Average heights increased concurrently with nutritive improvements, most dramatically during the first half of the century and then gradually tapering off to a halt by the early '90s. This tracks quite well with the timeline of IQ gains. 3) The scores that have shown the highest Flynn effect gains also tend to be the skills most heavily impaired by nutritive deficiencies. Lynn's thesis has gained some support from recent developments in Kenya, and from a study in Spain that found that levels of certain dietary micronutrients correlated with IQ.

The alternative explanation, which raises a few other annoying issues, is simply that tests of cognitive ability aren't measurement invariant across generations and the increasing scores overstate real gains in cognitive ability. The motivation for this argument is that it's hard to believe that the people who went to baseball games 80 years ago were mostly too stupid to understand the rules. Wicherts et al (2004) provide compelling evidence against cross-cohort measurement invariance using factor analysis on five different data sets for multiple cohorts, the gory details of which I'll spare you. The take-home point is that for reasons not entirely clear, the same battery of test items given to two different cohorts will not necessarily accurately measure between-cohort differences in the latent variables these tests accurately measure *within* cohorts. I have no idea why this would be and Wicherts et al don't offer much in the way of speculation, but the finding is hard to argue with. And it does have the merit of squaring with our suspicion that people couldn't have been *that* much less intelligent a century ago.

Real vs. Nominal

Taking the evidence for the Lynn & Wicherts hypotheses together, I think a full explanation of the Flynn effect probably requires both. The Wicherts finding doesn't necessarily imply that there've been no real gains in cognitive ability over the past century; it merely shows that raw IQ scores probably overestimate whatever gains there have been. On the other side, there's a lot of circumstantial evidence supporting Lynn's hypothesis: A number of studies have found that the Flynn effect gains are much more highly concentrated on the left half of the bell curve than the right, which would also be consistent with Lynn's hypothesis since duller people tend to be poorer than smarter people and thus reap a disproportionate benefit from improvements in nutrition (Colom et al [2005], Teasdale & Owen [1989], Lynn & Hampson [1986]).

Significantly, R.L. Jantz (2000, 2001) has published findings that cranial vault size has increased over the past century, concurrent with IQ scores and probably attributable to differences in developmental conditions. Given the modest correlation between IQ and head size (~0.2) it would be pretty remarkable if this increase *didn't* translate to some modest cognitive gains. The head size-IQ correlation is just an attenuated proxy for the brain volume-IQ correlation, which is ~0.4, and Miller & Corsellis (1977) directly recorded a secular increase in brain weight over roughly the same period.
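The arithmetic behind "attenuated proxy" and "modest cognitive gains" can be sketched as follows. This is my own illustration, not anything from Jantz or Miller & Corsellis, and it leans on two loud assumptions: that head size relates to IQ only through brain volume (so the correlations multiply along the chain), and that the cross-sectional brain volume-IQ correlation can be naively read as a slope for secular change. The size of the brain increase used below is a purely hypothetical placeholder, not a figure from those studies.

```python
R_HEAD_IQ = 0.2    # head size-IQ correlation quoted above
R_BRAIN_IQ = 0.4   # brain volume-IQ correlation quoted above
IQ_SD = 15.0


def implied_head_brain_correlation(r_head_iq=R_HEAD_IQ, r_brain_iq=R_BRAIN_IQ):
    """If head size predicts IQ only via brain volume, then
    r(head, IQ) = r(head, brain) * r(brain, IQ); solve for r(head, brain)."""
    return r_head_iq / r_brain_iq


def naive_iq_gain(brain_gain_in_sd, r_brain_iq=R_BRAIN_IQ, iq_sd=IQ_SD):
    """ASSUMPTION: treat the correlation as a standardized slope, so a brain
    volume gain of d SDs predicts an IQ gain of r * d SDs."""
    return r_brain_iq * brain_gain_in_sd * iq_sd


HYPOTHETICAL_BRAIN_GAIN_SD = 0.5  # placeholder only, NOT from the cited papers

print(f"Implied head-brain correlation: {implied_head_brain_correlation():.2f}")
print(f"Naive IQ gain for a {HYPOTHETICAL_BRAIN_GAIN_SD} SD brain increase: "
      f"{naive_iq_gain(HYPOTHETICAL_BRAIN_GAIN_SD):.1f} points")
```

With those placeholder numbers the implied head-brain correlation is 0.5 and the naive gain is about 3 IQ points, which is the sense of "modest" I have in mind. The real figure depends entirely on how large the secular increase actually was.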

There's one final piece of evidence in favor of the gains being real that fits nicely with all of the above, found by Kane & Oakland (2000). They start with an observation originally made by Charles Spearman that as you go further out along the right tail of the bell curve, the amount of test score variation ascribable to g decreases (logically implying that the share of variance ascribable to item-specific factors and noise increases). This is just a special case of the general phenomenon that as the variance of one variable within your sample decreases, the share of total variance explained by it necessarily decreases, other things being equal. Kane & Oakland extend this to the intergenerational level, pointing out that if population-wide variance in whatever variables influence g were decreasing, we'd expect to see a trend toward g explaining less of the variance on test scores. Mirabile dictu, this is exactly what they find in their analysis of the Wechsler family of tests.
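The "general phenomenon" about shrinking variance is easy to check with a toy simulation. This is only a sketch of the statistical point, not a reproduction of Kane & Oakland's analysis: it assumes a test score is simply g plus independent noise, with the noise held fixed while the spread of g is narrowed. The function name and parameter values are made up for illustration.

```python
import random


def share_of_variance_from_g(g_sd, noise_sd=1.0, n=20000, seed=0):
    """Squared correlation between a simulated test score and simulated g.

    ASSUMPTION: score = g + noise, with g ~ N(0, g_sd) and
    noise ~ N(0, noise_sd) drawn independently.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, g_sd) for _ in range(n)]
    score = [gi + rng.gauss(0.0, noise_sd) for gi in g]

    mean_g = sum(g) / n
    mean_s = sum(score) / n
    cov = sum((a - mean_g) * (b - mean_s) for a, b in zip(g, score))
    var_g = sum((a - mean_g) ** 2 for a in g)
    var_s = sum((b - mean_s) ** 2 for b in score)
    return cov * cov / (var_g * var_s)


wide = share_of_variance_from_g(g_sd=1.0)    # theory: 1 / (1 + 1) = 0.5
narrow = share_of_variance_from_g(g_sd=0.5)  # theory: 0.25 / (0.25 + 1) = 0.2
print(f"wide spread of g:   R^2 = {wide:.2f}")
print(f"narrow spread of g: R^2 = {narrow:.2f}")
```

Halving the spread of g while leaving the noise alone cuts g's share of score variance from about half to about a fifth. The same logic applies whether the narrowing comes from sampling only the right tail or from a whole population becoming more homogeneous in whatever feeds g.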

So while I have no trouble believing that Flynn effect gains are partly artifactual, I have an extremely hard time believing they're wholly so. My SWAG is that we have seen a real average increase of about a full standard deviation (i.e. about 15 IQ points), but as usual "more and better data" should be the next move.

Curveballs

I've found one piece of data that slightly undermines part of Lynn's thesis, however. Sundet et al (2004) found in Norway that while 1) average intelligence gains and average height gains did coincide chronologically, and 2) intelligence gains were mostly due to a rightward shift of the left side of the intelligence curve, 3) height gains were mostly due to a rightward shift in the *right* side of the height curve. This is contrary to what I'd have expected, since height and IQ correlate slightly (~0.2). I don't think this is fatal to the nutrition theory, but someone should be looking at this in other nations as well. If I wanted to explain this away I'd start with a story about assortative mating.

There's one last thing about the Flynn effect that contributes to its overall weirdness. Rushton (1999) ran a principal components analysis on the Wechsler data and concluded that the IQ gains were largely unrelated to g. So how do we reconcile this with the fact that the most massive gains have been shown on the most g-loaded tests, like the Raven's?

Well, for a start we need to be careful in understanding just what Rushton did here: His analysis shows that when you break down the items on the Wechsler test there's no correlation between their g-loadings and how much of an increase they've shown while the Flynn effect was in play. But this is still compatible with g's biological fundamentals being on the rise, and if you look at the correlation structure, as Flynn (2000) did, you find that there's a moderate correlation (~0.5) between how much of a rise a Wechsler item has undergone and how strongly it correlates with scores on the Raven's test. The fact that it's not higher than this could be due to a combination of measurement error and the fact that the Wechsler is more heavily loaded on crystallized intelligence while the Raven's loads more heavily on fluid intelligence (see here for more on the distinction). These appear to be neurologically distinct faculties (see, for example, Prabhakaran et al [1997]), so it's entirely possible that one could be improving more than the other.

Here I simply have to cease speculating because I've hit the limits of my current state of knowledge. Helpful suggestions and alternative ideas welcome.

