Gaussian anthropology

Feb 21, 2009 09:38

Either everything is deeply interrelated or there are ideas whose time has come or, perhaps, I see patterns where there aren't any. I say this because whenever I read two or three interesting things in a row, no matter how diverse, I see ways in which they are saying the same thing. Perhaps I have a gift for synthesis.

But I am betting on pareidolia.

Re: Frac'n fractals halfjack February 22 2009, 17:53:03 UTC
I'm glad I got the salient points out of the material, though we've been throwing the word "agency" around the game design/theory world for a while, and the intention is similar.

Re: Bimodal Gaussian distribution and salaries. These refinements of the normal distribution fit more of the data but don't fit the full range. Extreme values turn up far more often than the normal models predict, and so we tend to throw out the 6-sigma cases as "outliers" -- an example of the model eating its own tail and proving itself by discarding data that the model itself says is too unlikely.
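
To make the tail point concrete, here is a minimal Python sketch (not from the thread) comparing how likely a value 6 standard deviations above the mean is under a normal model versus under a heavy-tailed Pareto (power-law) distribution with the same mean and standard deviation. The Pareto parameters alpha = 3 and x_m = 1 are arbitrary illustrative assumptions; alpha must exceed 2 for the standard deviation to exist.

```python
import math


def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail_at_k_sigma(alpha: float, x_m: float, k: float) -> float:
    """P(X > mean + k*sd) for a Pareto(alpha, x_m) variable X.

    Assumes alpha > 2 so that the variance (and hence "sigma") is finite.
    """
    if alpha <= 2:
        raise ValueError("variance is infinite for alpha <= 2")
    mean = alpha * x_m / (alpha - 1)
    var = x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))
    threshold = mean + k * math.sqrt(var)
    # Pareto survival function: P(X > x) = (x_m / x) ** alpha for x >= x_m
    return (x_m / threshold) ** alpha


if __name__ == "__main__":
    k = 6
    p_norm = normal_tail(k)
    p_pareto = pareto_tail_at_k_sigma(alpha=3.0, x_m=1.0, k=k)
    print(f"normal: {p_norm:.3e}")
    print(f"pareto: {p_pareto:.3e}")
    print(f"ratio:  {p_pareto / p_norm:.1e}")
```

With these assumptions the Pareto tail puts a few million times more probability beyond 6 sigma than the normal curve does, which is why a normal fit makes such points look like errors.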

I don't know what you actually do to model it, though. Certainly some variation of a power law is more correct, but the unpredictability of many things is too extreme to know whether you've got it pinned down -- the past doesn't give you enough information about the future when you're in the data set. The fact that the sun comes up every day leads you, reasonably, to believe that it will tomorrow and every day after, but when you put yourself in the data set, you would be tempted to use the same information to conclude that you are immortal.
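
The sunrise example is the textbook setting for Laplace's rule of succession, which the thread doesn't name. A minimal sketch, assuming a uniform prior on the unknown daily success rate and independent days: after s successes in n trials, the estimated probability of success on the next trial is (s + 1)/(n + 2). The forty-year figure is a hypothetical input.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate that the next trial succeeds, given `successes`
    out of `trials` so far and a uniform prior on the unknown success rate."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


if __name__ == "__main__":
    # Hypothetical: forty years of unbroken sunrises (or of your own survival).
    days_observed = 40 * 365
    p = rule_of_succession(days_observed, days_observed)
    print(p, float(p))
```

Fed an unbroken record of your own survival, the formula returns the same near-certainty it gives the sun, because the record only contains days on which you were there to observe it, which is the trap described above.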

As for books, I only think coherently for a couple of thousand words, tops, and then I'm done. Maybe a collection of my wacky and wholly uncited essays.

Re: Frac'n fractals koala_bob February 22 2009, 18:22:52 UTC
'Outliers'. OK, now you've touched a nerve. There is a lot of misunderstanding about why outliers are discarded from an analysis. If a value falls far from the expected range of values, it is typically assumed to be due to one of two things: either it is an error, or it is something very different from the things that were being studied. In the latter case, discarding the outlier is not a refusal to acknowledge that it exists but rather a recognition that it requires reclassification. In your 'salaries' example, in reality there are likely to be four or five modes, representing 1. working joes, 2. urban professionals, 3. corporate directors/hockey players/drug dealers, 4. corporate owners, and 5. oil/computer tycoons and royalty. If you're gathering data on salaries of urban professionals in Vancouver and you find one individual who is 6 sigma above the mean, then you would be wise to conclude that that individual should not be classified as an urban professional.
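
A minimal sketch of the reclassification test described above, assuming the simplest version: estimate the mean and sample standard deviation from the reference group alone, then ask whether a new value lies more than k = 6 standard deviations above that mean. The salary figures and function names are hypothetical, for illustration only.

```python
import statistics


def z_score(reference, value):
    """How many sample standard deviations `value` lies from the reference mean."""
    mu = statistics.mean(reference)
    sd = statistics.stdev(reference)  # spread estimated from the reference group only
    return (value - mu) / sd


def needs_reclassification(reference, value, k=6.0):
    """True if `value` sits more than k sd above the reference mean, i.e. it
    probably belongs to a different group than the one being studied."""
    return z_score(reference, value) > k


if __name__ == "__main__":
    # Hypothetical salaries for one group (illustrative numbers, not real data).
    urban_professionals = [62_000, 71_000, 78_000, 85_000, 90_000, 95_000, 104_000, 118_000]
    for salary in (150_000, 2_000_000):
        z = z_score(urban_professionals, salary)
        print(salary, round(z, 1), needs_reclassification(urban_professionals, salary))
```

Estimating the spread from the reference group, rather than from a sample that already contains the suspect value, matters: an extreme value inflates the standard deviation it is measured against, and in a sample of n values no point can sit more than (n - 1)/sqrt(n) sample standard deviations from the mean, so in small samples a 6-sigma point cannot appear at all.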

So, I don't think the existence of outliers is a flaw of the Gaussian model; on the contrary, it is one of its more valuable aspects (as I frequently tell my students, "the outliers are often the most interesting"). But how this overall pattern came into being may be more comprehensively explained through fractal geometry, which, as I understand it, is the product of 'chaos'.

Re: Frac'n fractals halfjack February 22 2009, 19:04:57 UTC
That all makes good sense to me, and certainly being able to change your model to account for outliers is valuable academically, but there's still a tail-eating problem there -- the assumption that the Gaussian model still holds and that you just have a new category with a new curve. I guess in the end you're necessarily approximating what's really a massively multivariate and dynamic (and therefore fundamentally intractable) function.

The biggest problem I encounter with Gaussian modeling, though, is not in academia, where revising the model is not just an option but potentially a publication, but in industry, where it is simply not acceptable to question the model, and outliers are dismissed as the product of bad data collection or some other non-systemic feature. Far better, as you say, to recognize an outlier as something interesting and then focus on it.

However, even given what you've said, you have to consider the possibility that an outlier indicates a broken statistical model -- choosing only between error and "out of scope" misses significant alternatives and assumes the Gaussian fit rather than testing it, especially if you keep finding new categories.
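
One crude way to test the Gaussian fit rather than assume it, sketched in Python as an illustration (not a method proposed in the thread): count how many points lie more than k standard deviations above the mean and compare that share with what a normal distribution predicts. A ratio near 1 is consistent with a normal tail; a ratio well above 1 suggests a heavier tail. The function name, k = 3, the seed, and the simulated Pareto data are all assumptions; a formal goodness-of-fit test such as Anderson-Darling would be the more standard tool.

```python
import math
import random
import statistics


def upper_tail_excess(sample, k=3.0):
    """Ratio of the observed share of points above mean + k*sd to the share
    a normal distribution with the same mean and sd would predict."""
    mu = statistics.mean(sample)
    sd = statistics.stdev(sample)
    threshold = mu + k * sd
    observed = sum(1 for x in sample if x > threshold) / len(sample)
    expected = 0.5 * math.erfc(k / math.sqrt(2))  # P(Z > k) under the normal model
    return observed / expected


if __name__ == "__main__":
    rng = random.Random(2009)  # seeded so the sketch is reproducible
    n = 100_000
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
    heavy = [rng.paretovariate(3.0) for _ in range(n)]  # Pareto tail, alpha = 3
    print(f"gaussian sample: {upper_tail_excess(gaussian):.2f}x the normal prediction")
    print(f"pareto sample:   {upper_tail_excess(heavy):.2f}x the normal prediction")
```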

Re: Frac'n fractals koala_bob February 22 2009, 19:33:45 UTC
Yes, I have encountered situations where someone in 'industry' has tried to use a Gaussian model to discredit the results of my work. Specifically, they took issue with the fact that a two-week survey I conducted resulted in the discovery of more archaeological sites than seven years of previous survey by their researchers had yielded. But I don't think the model itself is inherently flawed; the problem lies in many people's use and understanding of it. I find it a great tool for exploring data and identifying things of interest. Having said that, it is just a model and by nature is a simplification of reality. Its focus on central tendency limits its utility in modeling. This is recognized by many statisticians (hence the development of Bayesian statistics and chaos theory).
