examples of metonymy and ambiguities in statistics, with varying degrees of harm

Feb 22, 2009 23:00

* Bayes' rule (strictly, the product rule it follows from): "P(A,B) = P(A|B) P(B)" means "forall a,b . P(A=a, B=b) = P(A=a|B=b) P(B=b)". This is very standard.

* "variance of the estimator" means "variance of the sampling distribution of the estimator". AFAICT, this is unambiguous, and the only reasonable interpretation is for "estimator" to mean the random variable. To make this even more explicit: the estimator (RV) is the result of applying the estimator (function) to the random data.

* "estimate the parameters" means "estimate the values of the parameters"; more confusingly, "choose the parameters" can mean "choose the values of the parameters". This may just be the econometricians I've been reading.

* "distribution" to mean "family of distributions". Very standard. No one blinks an eye at "the Gaussian distribution". I think "family" is typically only used to describe families for which mean and variance are not sufficient statistics.

* "sample" to mean "data point". One should be careful here: in standard usage, a "sample" is a collection of data points. Sometimes, though, one samples just one point, and metonymically calls it "the sample".

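* using "correlated" to mean "dependent". This is incorrect: correlation implies dependence, but dependent variables can be uncorrelated. The two notions coincide only in special circumstances, such as jointly (multivariate) Gaussian models, where uncorrelated does imply independent.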

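* using "sufficient statistics" to mean "summary statistics" (e.g. in the context of mean-field approximations). This is incorrect: a sufficient statistic carries all the information the data have about the parameter (the likelihood depends on the data only through it), whereas a summary statistic is any function of the data and in general throws some of that information away.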
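To make the first bullet concrete: the identity over random variables is a claim about every cell of the joint table, not about one particular pair of values. Here is a minimal sketch with a made-up 2x3 joint pmf (the table, the value names and `product_rule_holds` are all hypothetical). The conditional is computed from its definition, so the check is not deep; the point is only that "P(A,B) = P(A|B) P(B)" unpacks to a loop over all (a, b).

```python
from fractions import Fraction

# Hypothetical joint pmf P(A=a, B=b) over a small 2x3 table (made-up numbers).
joint = {
    ("a0", "b0"): Fraction(1, 10), ("a0", "b1"): Fraction(2, 10), ("a0", "b2"): Fraction(1, 10),
    ("a1", "b0"): Fraction(3, 10), ("a1", "b1"): Fraction(1, 10), ("a1", "b2"): Fraction(2, 10),
}

A_vals = sorted({a for a, _ in joint})
B_vals = sorted({b for _, b in joint})

# Marginal P(B=b) = sum over a of P(A=a, B=b).
p_B = {b: sum(joint[(a, b)] for a in A_vals) for b in B_vals}

# Conditional P(A=a | B=b) = P(A=a, B=b) / P(B=b); every P(B=b) > 0 here.
p_A_given_B = {(a, b): joint[(a, b)] / p_B[b] for a in A_vals for b in B_vals}

def product_rule_holds() -> bool:
    """Check the identity for every pair (a, b), which is what the RV-level equation means."""
    return all(joint[(a, b)] == p_A_given_B[(a, b)] * p_B[b]
               for a in A_vals for b in B_vals)

print(product_rule_holds())  # True
```

Exact `Fraction` arithmetic is used so the equality test is not at the mercy of floating-point rounding.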
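The estimator (function) vs. estimator (RV) distinction from the second bullet can be seen in a quick simulation. This is a sketch under assumptions of my own choosing: the data are i.i.d. N(0, sigma^2), the estimator is the sample mean, and `sample_mean` and `sampling_variance_of_mean` are hypothetical names. Applying the function to many independent datasets traces out its sampling distribution, whose variance should come out near sigma^2 / n.

```python
import random
import statistics

def sample_mean(data):
    """The estimator as a *function*: maps one dataset to one number."""
    return sum(data) / len(data)

def sampling_variance_of_mean(n, sigma, trials=20000, seed=0):
    """Monte Carlo estimate of Var[sample_mean(X_1, ..., X_n)], i.e. the variance
    of the estimator viewed as a *random variable* (its sampling distribution).
    Assumes X_i are i.i.d. N(0, sigma**2)."""
    rng = random.Random(seed)
    estimates = [sample_mean([rng.gauss(0.0, sigma) for _ in range(n)])
                 for _ in range(trials)]
    return statistics.pvariance(estimates)

n, sigma = 10, 2.0
# Should land close to the theoretical value sigma**2 / n = 0.4 (not exactly, it is a simulation).
print(sampling_variance_of_mean(n, sigma))
```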
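For the "correlated" vs. "dependent" bullet, the textbook counterexample is X uniform on {-1, 0, 1} with Y = X^2. Y is a deterministic function of X, so the two are about as dependent as variables get, yet their covariance is exactly zero. The sketch below (variable names are illustrative) computes both facts with exact fractions.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
support = [-1, 0, 1]
p = Fraction(1, 3)  # P(X = x) for each x in the support

E_X = sum(p * x for x in support)          # E[X] = 0
E_Y = sum(p * x**2 for x in support)       # E[Y] = 2/3
E_XY = sum(p * x * x**2 for x in support)  # E[XY] = E[X^3] = 0
cov = E_XY - E_X * E_Y                     # Cov(X, Y) = 0, so uncorrelated

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
P_X0_Y0 = sum(p for x in support if x == 0 and x**2 == 0)  # 1/3
P_X0 = sum(p for x in support if x == 0)                   # 1/3
P_Y0 = sum(p for x in support if x**2 == 0)                # 1/3
dependent = P_X0_Y0 != P_X0 * P_Y0                         # 1/3 != 1/9

print(cov, dependent)  # 0 True
```

Note that (X, Y) here is not jointly Gaussian, which is exactly why "uncorrelated" fails to buy independence.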
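For the last bullet, a small Bernoulli example shows the difference between a sufficient statistic and a mere summary. Assuming i.i.d. 0/1 data with success probability p (the datasets and helper names below are made up), the factorization theorem implies that if T is sufficient and two datasets share the same value of T, their likelihood ratio cannot depend on p. The sum passes that test; the median fails it. The check only looks at a finite grid of p values, so it illustrates the criterion rather than proving anything.

```python
import statistics
from fractions import Fraction

def bernoulli_likelihood(data, p):
    """L(p; data) = product of p**x * (1 - p)**(1 - x) over 0/1 observations."""
    result = Fraction(1)
    for x in data:
        result *= p if x == 1 else 1 - p
    return result

# Grid of parameter values strictly inside (0, 1), so no likelihood is zero.
grid = [Fraction(k, 10) for k in range(1, 10)]

def ratio_free_of_p(d1, d2):
    """If T(d1) == T(d2) for a sufficient statistic T, the factorization theorem
    says L(p; d1) / L(p; d2) cannot depend on p. Check that on the grid."""
    ratios = {bernoulli_likelihood(d1, p) / bernoulli_likelihood(d2, p) for p in grid}
    return len(ratios) == 1

# Same sum (sufficient for p): the likelihood ratio is constant in p.
d1, d2 = [1, 1, 0, 0, 0], [0, 0, 1, 0, 1]
print(sum(d1) == sum(d2), ratio_free_of_p(d1, d2))  # True True

# Same median (just a summary) but different sums: the ratio (1 - p) / p varies,
# so the median cannot be sufficient.
d3, d4 = [1, 0, 0, 0, 0], [1, 1, 0, 0, 0]
print(statistics.median(d3) == statistics.median(d4), ratio_free_of_p(d3, d4))  # True False
```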

---

UPDATE: I should write a SIGBOVIK paper titled "Introduction to Statistical Pedantics".

education, stats, language
