My favorite non-fiction work is a book called "Knowing and Guessing" by Satosi Watanabe. There is a cute little argument in this book that every researcher in machine learning and adaptive methods would do well to understand. It is called the Ugly Duckling Theorem.
The theorem shows that if similarity is judged by counting the predicates two objects share, taken over all possible 2^(2^n) Boolean functions on binary n-vectors, then every pair of distinct vectors shares exactly the same number of predicates, so no pair can be considered more similar than any other: n-pixel binary images of ugly ducklings and swans come out equally similar, throwing a major wrench into the possibility of clustering and classification. Of course, this does not mean we cannot limit our attention to certain features and thereafter do clustering and classification. In fact, this is exactly what we do all the time. But the theorem opens the way to measuring the bias of knowledge representations and feature spaces.
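A tiny brute-force check makes the counting argument concrete. The sketch below (plain Python; the function name and the choice n = 3 are mine, not Watanabe's) enumerates every Boolean predicate on binary n-vectors and counts, for each pair of distinct vectors, how many predicates are true of both.

```python
from itertools import product

def shared_predicate_counts(n):
    """For every pair of distinct binary n-vectors, count how many of the
    2^(2^n) Boolean predicates are true of both vectors."""
    vectors = list(product((0, 1), repeat=n))    # all 2^n binary n-vectors
    num_inputs = len(vectors)
    counts = {}
    for i in range(num_inputs):
        for j in range(i + 1, num_inputs):
            shared = 0
            # A predicate is just a truth table: one 0/1 value per input
            # vector, so there are 2^(2^n) of them in total.
            for table in product((0, 1), repeat=num_inputs):
                if table[i] == 1 and table[j] == 1:
                    shared += 1
            counts[(vectors[i], vectors[j])] = shared
    return counts

if __name__ == "__main__":
    # n = 3 keeps the enumeration small: 2^(2^3) = 256 predicates.
    for pair, shared in shared_predicate_counts(3).items():
        print(pair, shared)   # each pair shares exactly 2^(2^3 - 2) = 64 predicates
```

For any n, each pair of distinct vectors is true under exactly 2^(2^n - 2) of the predicates, so the count carries no information at all about which pair we picked.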
I was at the MMDS workshop at Stanford recently, and different papers on classification, clustering, and dimensionality reduction chose different distance measures in setting up their optimization algorithms. I sat there wondering how much of the magic is in the algorithm and how much is in the distance function. It would be good to characterize the bias of distance functions the same way we characterize the bias of knowledge representations. For many years, scientists in statistics and learning used the mean squared distance measure across a variety of algorithms and data sets, limiting the bias only to the choice of data dimensions on which the algorithm was applied. With the creativity of the last decade in switching to clever distance measures, one must be careful about accepting the results unless they have been demonstrated on a wide variety of problems and datasets. Creative definition of distance functions, without making the bias explicit, will otherwise become the new black art form in machine learning.
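To make the question of how much is in the algorithm and how much is in the distance function concrete, here is a minimal sketch (plain NumPy; the toy data and the particular metrics are my own illustrative choices, not anything from the workshop). It ranks the same points by three common distance functions; the rankings need not agree, because each metric encodes its own assumptions about which directions and scales matter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two clusters, both strongly elongated along the y-axis.
a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 10.0], size=(50, 2))
b = rng.normal(loc=[5.0, 0.0], scale=[1.0, 10.0], size=(50, 2))
X = np.vstack([a, b])
query = np.array([2.5, 8.0])     # a point roughly between the two clusters

def euclidean(x, y):
    return np.linalg.norm(x - y)

def cosine_distance(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Mahalanobis distance uses the data's inverse covariance, so it discounts
# the direction along which the data is most spread out.
VI = np.linalg.inv(np.cov(X, rowvar=False))
def mahalanobis(x, y):
    d = x - y
    return float(np.sqrt(d @ VI @ d))

for name, dist in [("euclidean", euclidean),
                   ("cosine", cosine_distance),
                   ("mahalanobis", mahalanobis)]:
    order = sorted(range(len(X)), key=lambda i: dist(query, X[i]))
    print(f"{name:12s} three nearest points: {order[:3]}")
```

Any disagreement among those rankings is exactly the kind of bias one would want to characterize before crediting the algorithm itself with the result.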