Earlier this week, another piece of statistical theory fell into place for me, this time inspired by reading Cox & Hinkley.
One of the key principles expounded in this book is known as the "conditionality principle": given your model, if you can find a statistic that is ancillary (i.e. whose distribution does not depend on the parameter of interest), then your likelihood function should be conditioned on it.
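Concretely (my own notation, not C&H's): if $A = a(X)$ is ancillary, the joint density factors as

$$ f_X(x;\theta) = f_{X \mid A}(x \mid a;\theta)\, f_A(a), $$

where the marginal factor $f_A(a)$ is free of $\theta$, so the principle says to treat $f_{X\mid A}(x \mid a;\theta)$ as the likelihood. The canonical illustration (due to Cox) is the two-instruments example: a coin flip decides which of two measuring devices you use; the flip is ancillary, and inference should be conditional on which device was actually used.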
Now, if the minimal sufficient statistic is complete (as is the case in any full-rank exponential family), Basu's theorem tells us that any ancillary statistic will be independent of it, i.e. there is a clean separation between sufficient and ancillary. But in curved exponential families, it can happen that there is no maximal ancillary statistic, i.e. you may have multiple choices of ancillary statistic, but combining them yields a statistic that is no longer ancillary. This is a bit troubling to me, because it breaks the nice idea of a bijection between model and likelihood function.
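A quick sketch of the clean case, to fix ideas: with $X_1,\dots,X_n \sim N(\mu, \sigma_0^2)$ and $\sigma_0$ known, $\bar X$ is complete sufficient for $\mu$, while the residual vector $(X_1 - \bar X, \dots, X_n - \bar X)$ has a distribution free of $\mu$ and is therefore ancillary; Basu's theorem then says the two are independent. Contrast this with a curved family such as $N(\theta, \theta^2)$: the minimal sufficient statistic $\left(\sum_i X_i, \sum_i X_i^2\right)$ is no longer complete, and (if I have this right) scale-free statistics such as $\bar X / S$ are ancillary, so the clean separation above breaks down.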
Given a choice between two ancillaries, C&H advise selecting the one whose Conditional Fisher Information has the greater variance. It's not immediately obvious why one should do this, but I think it can be understood as the Conditional Fisher Information giving us a lens into the conditional likelihood function. For example, if the Conditional Fisher Information has zero variance, it may be because the ancillary statistic doesn't add any information (as is the case when the minimal sufficient statistic is complete). However, it still seems plausible to me that the Conditional Fisher Information can be constant (independent of the ancillary statistic) even while the likelihood function is sensitive to it.
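To spell the criterion out as I understand it: writing the Conditional Fisher Information given an ancillary $A$ as

$$ i(\theta \mid a) \;=\; E\!\left[ -\frac{\partial^2 \ell(\theta; X)}{\partial \theta^2} \,\middle|\, A = a \right], $$

the suggestion is to prefer, among competing ancillaries, the one maximizing $\operatorname{Var}_A\!\big[\, i(\theta \mid A) \,\big]$, i.e. the one that sorts samples most sharply into more and less informative ones.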
C&H also hint at a notion of partial sufficiency/efficiency and how to measure it: just compute a Conditional Fisher Information, conditioning on the proposed statistic.
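The way I'd formalize this (a standard decomposition, not a quote from C&H): under the usual regularity conditions, the score for the full data splits into a part carried by a candidate statistic $T$ and a residual part, giving

$$ i_X(\theta) \;=\; i_T(\theta) \;+\; E_T\!\big[\, i_{X \mid T}(\theta; T) \,\big], $$

where $i_{X\mid T}(\theta; t)$ is the Fisher Information of the conditional model of $X$ given $T = t$. The smaller the conditional term, the closer $T$ is to sufficient; the ratio $i_T(\theta)/i_X(\theta)$ is then a natural measure of partial efficiency.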
(Since Fisher Information is an expectation, the Conditional Fisher Information is the expectation under a conditional distribution; since the quantity being averaged is a function of the minimal sufficient statistic, conditioning on the sufficient statistic changes nothing, whereas conditioning on something insufficient can have the effect of making the log-likelihood smoother, and the Fisher Information smaller.) Conditioning on an ancillary, however, doesn't simply make the log-likelihood sharper: the average of the Conditional Fisher Information is just the Fisher Information.
[the last paragraph is probably wrong; please comment]
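For completeness, here is the calculation I have in mind behind the claim that the Conditional Fisher Information averages back to the full Fisher Information, assuming $A$ is exactly ancillary: since $\log f(x;\theta) = \log f(x \mid a;\theta) + \log f_A(a)$ and the second term does not involve $\theta$, $i_{X\mid A}(\theta; a)$ coincides with the $i(\theta \mid a)$ above, and the decomposition above specializes (with $i_A(\theta) = 0$) to

$$ i_X(\theta) \;=\; E_A\!\big[\, i(\theta \mid A) \,\big], $$

so conditioning on an ancillary redistributes information across its values rather than uniformly sharpening or flattening the likelihood.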
mirror of this post