Either everything is deeply interrelated or there are ideas whose time has come or, perhaps, I see patterns where there aren't any. I say this because whenever I read two or three interesting things in a row, no matter how diverse, I see ways in which they are saying the same thing. Perhaps I have a gift for synthesis. (But I am betting on pareidolia.)
Re: Bi-modal Gaussian distribution and salaries. These refinements of the normal distribution fit more of the data but don't fit the full range. Extreme values turn up far more often than the normal models say they should, and so we tend to throw out the ones that are 6-sigma as "outliers" -- an example of the model eating its own tail and proving itself by discarding data that the model itself says is too unlikely.
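A rough sketch of how differently a normal model and heavy-tailed data treat 6-sigma events. The Pareto distribution with shape 3 is my stand-in for "heavy-tailed data", chosen only for illustration; it is not a claim about what salaries actually follow, and the sample size and seed are arbitrary.

```python
import math
import random

random.seed(0)   # fixed seed so the run is repeatable

ALPHA = 3.0      # assumed Pareto shape, illustrative only
N = 50_000       # arbitrary sample size

# Draw heavy-tailed data and summarize it the way a normal model would:
# with a mean and a standard deviation.
samples = [random.paretovariate(ALPHA) for _ in range(N)]
mean = sum(samples) / N
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / N)

# Fraction of the data that sits more than 6 standard deviations above the mean.
threshold = mean + 6 * std
observed_tail = sum(x > threshold for x in samples) / N

# What a normal model says that fraction should be: one-sided P(Z > 6).
normal_tail = 0.5 * math.erfc(6 / math.sqrt(2))

print(f"normal model says: {normal_tail:.2e}")
print(f"data actually has: {observed_tail:.2e}")
```

If the two numbers differ by several orders of magnitude, throwing out the 6-sigma points as "outliers" is throwing out exactly the evidence that the model is wrong.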
I don't know what you actually do to model it, though. Certainly some variation of a power law is more correct, but many things are too unpredictable for you to know whether you've got it pinned down -- the past doesn't give you enough information about the future when you're in the data set. The fact that the sun comes up every day reasonably leads you to believe that it will tomorrow and every day after, but when you put yourself in the data set you would be tempted to use the same reasoning to conclude that you are immortal.
As for books, I only think coherently for a couple thousand words, tops, then I'm done. A collection of my wacky and wholly uncited essays, maybe.
Reply
So, I don't think the existence of outliers is a flaw of the Gaussian model; on the contrary, it is one of the more valuable aspects of it (as I frequently tell my students, "the outliers are often the most interesting"). But how this overall pattern came into being may be more comprehensively explained through fractal geometry, which as I understand it is the product of 'chaos'.
Reply
The biggest problem I encounter with Gaussian modeling, though, is not in academia, where revising the model is not just an option but potentially a publication. Rather, it is in industry, where it is just not acceptable to question the model, and outliers are discarded as necessarily the result of bad data collection or some other non-systemic feature. Far better, as you say, to identify an outlier as something certainly interesting and then focus on it.
However, even given what you've said, you have to consider the possibility that an outlier indicates a broken statistical model -- choosing only between error and "out of scope" misses significant alternatives and assumes the Gaussian fit rather than testing it. Especially if you keep finding new categories of outlier.
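One way to test the Gaussian fit instead of assuming it, sketched with the Jarque-Bera statistic (my choice of test; nobody above named one). It measures how far the sample's skewness and kurtosis are from what a normal distribution would give. Under normality it is approximately chi-square with 2 degrees of freedom, so values well above about 5.99 are evidence against the fit at the 5% level. Both data sets below are synthetic and the names are hypothetical.

```python
import random

def jarque_bera(xs):
    """Jarque-Bera statistic: n/6 * (skew^2 + excess_kurtosis^2 / 4)."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments of the sample.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)

random.seed(1)   # fixed seed so the run is repeatable
N = 5_000        # arbitrary sample size

gaussian_data = [random.gauss(0.0, 1.0) for _ in range(N)]
heavy_data = [random.paretovariate(3.0) for _ in range(N)]  # assumed heavy-tailed stand-in

jb_gaussian = jarque_bera(gaussian_data)
jb_heavy = jarque_bera(heavy_data)

CRITICAL_5_PERCENT = 5.991   # chi-square, 2 degrees of freedom

print(f"Gaussian sample:     JB = {jb_gaussian:.2f}")
print(f"heavy-tailed sample: JB = {jb_heavy:.2f}")
print(f"reject normality for heavy-tailed sample: {jb_heavy > CRITICAL_5_PERCENT}")
```

The check is run on all of the data, outliers included. Run it after discarding the 6-sigma points and you are back to letting the model vouch for itself.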
Reply