Oct 14, 2011 15:39
I'm interested in combining the idea of lenses (that is, the type-theory/functional-programming idea, see various papers by Pierce and Foster, among lots of other people) with Bayesian updating. I think it should be possible to squash these two ideas together (squashing ideas together is almost always possible - the usual question is whether it will be productive or valuable to do so), but I don't know exactly how.
Why is this at all interesting? I think it's relevant to science in general; different subfields, and different levels along the spectrum between experiment and theory, all do modeling and communicate locally with one another, and somehow it all works out. I think that's because they have something like lenses connecting nearby models. It might also be useful in artificial intelligence or in designing richly expressive prediction markets, but those are kind of obvious consequences.
I don't understand Shalizi's "dynamics of updating with misspecified models" paper, except a little bit of the motivation and the replicator analogy in the appendix.
We deliberately approximate pretty often when we're modeling. We might approximate a rational number using a finite sequence of digits - but many rational numbers, such as 1/3, cannot be expressed with a finite sequence of (decimal) digits. To describe the shape of a two-dimensional thing, you might represent it as a polygon, ignoring its curves, or a three-dimensional blob as a polyhedron. An n-gram model is misspecified if the true distribution is tree-shaped. Fitting a Gaussian to some sort of sensor noise might be a misspecification if there are actually two major causes of noise - thermal noise in the device, and the human misplacing a decimal point when writing down the answer, for example.
Let's imagine that there is a Bayesian who is flipping a coin repeatedly. The coin is actually either weighted towards heads (90% chance of heads), weighted towards tails (90% chance of tails), or fair. However, in this Bayesian's (misspecified) model, there are only two hypotheses: weighted towards heads, and fair.
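Here's a minimal sketch of that setup in Python. The names, the uniform prior, and the choice of true coin are mine, just to make the scenario concrete; in particular I've made the real coin tails-weighted, so the truth isn't in the model at all.

```python
import random

HYPOTHESES = {"heads-weighted": 0.9, "fair": 0.5}  # the misspecified model: no tails-weighted hypothesis
TRUE_P_HEADS = 0.1                                  # but suppose the real coin is weighted towards tails

def update(posterior, flip):
    """One Bayesian update of the posterior over HYPOTHESES on a single flip ('H' or 'T')."""
    unnormalized = {h: posterior[h] * (p if flip == "H" else 1 - p)
                    for h, p in HYPOTHESES.items()}
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}

posterior = {"heads-weighted": 0.5, "fair": 0.5}    # uniform prior over the two hypotheses
for _ in range(1000):
    flip = "H" if random.random() < TRUE_P_HEADS else "T"
    posterior = update(posterior, flip)

print(posterior)  # nearly all the mass ends up on "fair", the least-wrong hypothesis available
```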
According to Shalizi's analogy, we can imagine the Bayesian update as the evolution of a population consisting of two species, with each coin flip corresponding to a generation. In the generations where the coin comes up heads, the hypothesis-species that believes the coin is weighted heads has more fitness. In the generations where the coin comes up tails, the hypothesis-species that believes the coin is fair has more fitness.
So for example, if the population was 50 of each, and then the coin came up heads, we can imagine that we kill off 10% of the weighted-heads-hypothesis creatures and 50% of the fair-hypothesis creatures, to get 45 and 25 respectively, then renormalize (the creatures replicate back up to a total of 100) to 45/(45+25) ≈ 64 weighted-heads-hypothesis creatures and 25/(45+25) ≈ 36 fair-hypothesis creatures. In the long term, if the coin really is weighted heads, then we expect to kill off 10% of the heads creatures 90% of the time, and 90% of them 10% of the time, which is a bit like keeping 82% of them on average each flip (or, since survival compounds multiplicatively, a geometric-mean survival rate of 0.9^0.9 × 0.1^0.1 ≈ 0.72 per flip). In contrast, exactly 50% of the fair-hypothesis creatures survive every flip, so (if the coin really is weighted heads) we expect the weighted-heads population to go to saturation.
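The same step as a little replicator-dynamics sketch (numbers and species names as above, the rest is my illustrative choice). It reproduces the 50/50 to roughly 64/36 generation, and then runs the long-term scenario with a genuinely heads-weighted coin.

```python
import random

SURVIVAL = {  # fraction of each hypothesis-species that survives a given flip
    "heads-weighted": {"H": 0.9, "T": 0.1},
    "fair":           {"H": 0.5, "T": 0.5},
}

def generation(population, flip, total=100.0):
    """Kill off creatures according to the flip, then replicate back up to `total`."""
    survivors = {h: n * SURVIVAL[h][flip] for h, n in population.items()}
    z = sum(survivors.values())
    return {h: total * n / z for h, n in survivors.items()}

print(generation({"heads-weighted": 50.0, "fair": 50.0}, "H"))
# -> roughly {'heads-weighted': 64.3, 'fair': 35.7}

# Long run with a coin that really is weighted towards heads:
population = {"heads-weighted": 50.0, "fair": 50.0}
for _ in range(200):
    flip = "H" if random.random() < 0.9 else "T"
    population = generation(population, flip)
print(population)  # the weighted-heads species goes to (near) saturation
```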
What I'd like to do is connect some variant of the Aumann agreement theorem for two misspecified Bayesians to a (symmetric?) lens. This is the scenario: suppose you are a (misspecified) Bayesian, and you have a friend who is also a (misspecified) Bayesian (though in a different way). Your friend takes the coin, goes away, and flips it. Then your friend comes back with the coin and tells you what they think is going to happen next - a probability distribution over the observable outcomes, heads or tails. You didn't see the flips that led to this belief, but you can take their statement as indirect evidence of what they saw and update on it. Since they had the coin and saw everything it did, telling them your probabilities after you've updated on their report doesn't give them any more information - which I think is something like one of the lens laws.
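For reference, here is the shape of an (asymmetric) lens and its round-trip laws as a toy Python sketch. The modelling choice - source = the friend's posterior over their two hypotheses, view = the single stated P(heads) - is mine, just to make the analogy concrete; the GetPut and PutGet laws themselves are the standard ones from the Foster/Pierce line of work, and this sketch doesn't prove anything about the coin scenario.

```python
from typing import Dict

HYP_P_HEADS = {"heads-weighted": 0.9, "fair": 0.5}

def get(posterior: Dict[str, float]) -> float:
    """View: the stated predictive probability of heads implied by a posterior."""
    return sum(posterior[h] * HYP_P_HEADS[h] for h in posterior)

def put(posterior: Dict[str, float], stated: float) -> Dict[str, float]:
    """Update the source to be consistent with a stated P(heads).
    With only two hypotheses the view pins the posterior down completely,
    so this particular put can ignore the old source; richer models couldn't."""
    lo, hi = HYP_P_HEADS["fair"], HYP_P_HEADS["heads-weighted"]
    w = (stated - lo) / (hi - lo)        # weight on the heads-weighted hypothesis
    return {"heads-weighted": w, "fair": 1.0 - w}

def approx_eq(a: Dict[str, float], b: Dict[str, float], tol: float = 1e-9) -> bool:
    return all(abs(a[h] - b[h]) < tol for h in a)

s = {"heads-weighted": 0.7, "fair": 0.3}

# GetPut: putting back the view you just read changes nothing
# (the "telling them what they already know adds no information" flavour).
assert approx_eq(put(s, get(s)), s)

# PutGet: after accepting a stated probability, reading the view returns it.
assert abs(get(put(s, 0.62)) - 0.62) < 1e-9
```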
In this particular case, we can imagine the evidence as a walk north and east on a quarter-infinite grid (one axis counting heads, the other tails). If the two friends are symmetrical - one convinced the coin is either weighted-heads or fair, the other convinced it is either weighted-tails or fair - then it seems likely that there is some nice smooth function transforming one friend's stated probabilities or beliefs into the other friend's. In other cases, it's presumably more problematic; I guess I have to find those problematic cases to make progress.
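One way to start hunting for that function (or for the problematic cases) is just to tabulate both friends' stated P(heads) at each point (h, t) of the grid and see whether one column determines the other. The hypothesis sets below are the symmetric ones above; the uniform priors and grid size are illustrative choices of mine.

```python
def stated_p_heads(h, t, hypotheses):
    """Posterior-predictive P(heads) after observing h heads and t tails,
    for a model given as {name: P(heads | name)}, starting from a uniform prior."""
    post = {name: (p ** h) * ((1.0 - p) ** t) for name, p in hypotheses.items()}
    z = sum(post.values())
    return sum((post[name] / z) * hypotheses[name] for name in hypotheses)

FRIEND_A = {"heads-weighted": 0.9, "fair": 0.5}   # believes heads-weighted or fair
FRIEND_B = {"tails-weighted": 0.1, "fair": 0.5}   # believes tails-weighted or fair

# Walk a corner of the quarter-infinite grid; plotting B's column against A's
# would show whether one friend's stated belief determines the other's.
for h in range(6):
    for t in range(6):
        a = stated_p_heads(h, t, FRIEND_A)
        b = stated_p_heads(h, t, FRIEND_B)
        print(f"h={h} t={t}  A says {a:.3f}  B says {b:.3f}")
```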