John hits on yet another problem while riffing off
the latest paper looking at recent regional adaptation in humans:
Imagine if you had a sample of men and women, and you chose an arbitrary cutoff of stature to distinguish them. Say, everyone over 5 foot 7 is a man. Well, that will do better than chance, but you've included a lot of women in your sample of men, and vice versa. Now, suppose you thought that men were inherently rare compared to women. Say, 100 women for every man. A cutoff of 5 foot 7 inches is going to include many more false positives (i.e., tall women) than genuine men. So you choose a very conservative cutoff, one that is not likely to include very many women. Maybe 6 foot 5. The people you see who are over 6 foot 5 are extremely likely to be men -- not certainly, you still will catch some very tall women -- but quite likely men. But you've excluded 95 percent of the men to do this.
That's the situation we are in with respect to detecting selection. There is an enormous set of false negatives -- truly selected alleles that are indistinguishable by means of an arbitrary cutoff from neutral alleles. . . . Johansson and Gyllensten suppose that each ascertained variant (at s=0.01) represents almost 5 in the population. So far, few have made much of the point that a small number of selected alleles under a very stringent cutoff must correspond to a large number that don't make the cutoff. . . . The issue is not only ascertainment; it is the shape of the non-ascertained distribution.
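To put rough numbers on John's height analogy, here is a minimal Python sketch. The height distributions are my own assumptions and not from his post: both sexes are treated as normal, with men at a mean of 69.1 inches (SD 2.9) and women at 63.7 inches (SD 2.7). The 100-to-1 ratio of women to men is his. The function names are made up for illustration. The last helper only restates the arithmetic behind "each ascertained variant represents almost 5": if a cutoff catches a fraction f of the true cases, each one caught stands for about 1/f.

```python
import math

# Assumed height distributions in inches (illustrative, not from the quoted post).
MEN_MEAN, MEN_SD = 69.1, 2.9
WOMEN_MEAN, WOMEN_SD = 63.7, 2.7


def normal_tail(x, mean, sd):
    """Fraction of a normal(mean, sd) population that lies above x."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


def cutoff_stats(cutoff, women_per_man=100.0):
    """Return (sensitivity, precision) for the rule 'taller than cutoff => man'.

    sensitivity: share of all men who clear the cutoff.
    precision:   share of people above the cutoff who really are men.
    """
    men_above = normal_tail(cutoff, MEN_MEAN, MEN_SD)
    women_above = women_per_man * normal_tail(cutoff, WOMEN_MEAN, WOMEN_SD)
    return men_above, men_above / (men_above + women_above)


def implied_total(detected, sensitivity):
    """If only a fraction `sensitivity` of true cases is caught, estimate the total."""
    return detected / sensitivity


if __name__ == "__main__":
    for label, cutoff in (("5 ft 7", 67.0), ("6 ft 5", 77.0)):
        sensitivity, precision = cutoff_stats(cutoff)
        print(f"{label}: catches {sensitivity:.1%} of men, "
              f"{precision:.1%} of those caught are men")
```

Under these assumed distributions the stringent cutoff gives a sample that is almost entirely men while missing nearly all of the men who exist, which is the trade-off the quote describes. The exact percentages depend heavily on the means and SDs chosen, so treat the output as an illustration of the shape of the problem rather than as a check on the 95 percent figure.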