Originally published at Konstantin's Private Blog.
I went to graduate school in Minnesota, a fountainhead of structural economic modeling. My particular field of economics, Empirical Industrial Organization (IO), is built primarily around bringing complicated models of firm and consumer behavior to the data. The classic structural approach to modeling goes as follows: describe a set of agents, specify preferences and objectives for every agent, and write down a set of equations that determines how the agents interact in equilibrium.
If I had to put my finger on a single thing shared by virtually all Empirical IO papers, I would say that they all complain about the data not being rich enough. If only some extra source of variation were available, the story goes, the paper would have been so much more insightful. Needless to say, I was quite enthusiastic about joining Amazon: in all likelihood, we have the most detailed and comprehensive data on the dynamics of many market segments.
Unfortunately, this enthusiasm proved slightly hasty, for a simple reason: structural models do not scale to our data. One cannot use a BLP-style (Berry, Levinsohn, and Pakes) demand estimation procedure when consumers have to choose among tens of millions of products. Computing value functions becomes prohibitively expensive when you have to do it for thousands of agents over multiple years of data at daily frequency. But even if you somehow magically solved these engineering problems, the fact would remain that structural estimates are numerically very unstable. There are many anecdotes of people trying and failing to reproduce published results. I would rather not point fingers at anyone, but it should be clear that for business purposes, this instability is unacceptable.
Steve Berry once mentioned that he thinks of IO as the econometrics of moderate-sized datasets, and I suspect that anything with over a hundred thousand observations is probably not "moderate" in his book. When hundreds of millions of observations are routine, you need to look elsewhere for tools.