A bit of information about machine learning and philosophy talks at MIT.
This Friday, Sep 28, at noon
Vladimir Vapnik will give a philosophical talk, "Inductive principles in machine learning and philosophy of science", during a class of Course 9.S912, "What is Intelligence?", taught by Shimon Ullman and Tomaso Poggio. Location: 46-5193 (it will move to 46-3310 if a larger room is required); outsiders are welcome to come and listen. If I manage to summarize the talk, I'll update this post later.
On Wednesday, Vapnik gave a talk "From Rosenblatt's learning model to the model of learning with nontrivial teacher" at the new
Cambridge Machine Learning Colloquium and Seminar Series. The main mathematical content was: A) it is well known that the error of a support vector machine is inversely proportional to the number of training samples if the classes are well separated by the kernel in question, but only inversely proportional to the square root of that number (i.e. much more training data is needed) if the classes overlap; B) Vapnik claimed that by introducing a second kernel, used on the training data only (e.g. some creative and not necessarily well-formalizable annotations by human annotators, ranging from mundane notes to assigning poetic qualities to training samples), one can make the error inversely proportional to the number of training samples even when the classes overlap with respect to the main, "production" kernel. (He was also drawing some far-reaching philosophical conclusions from that, about the importance of culture in human learning and the like. I don't know whether his conclusions can be transferred from support vector machines to other machine learning schemes, but it certainly looked quite interesting.)
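As a crude empirical illustration of point A (my own toy construction, not anything from the talk, and with the caveat that the theoretical rates concern the excess over the Bayes error rather than the raw test error), one can estimate SVM learning curves on two Gaussian classes, once well separated and once overlapping:

```python
# Toy learning-curve experiment (my own construction, not Vapnik's):
# two isotropic Gaussian classes in the plane, classified by an RBF-kernel SVM.
# With well-separated classes the excess error over the Bayes error shrinks
# quickly with the training-set size; with overlapping classes it shrinks
# noticeably more slowly.
import numpy as np
from scipy.stats import norm
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_data(n, separation):
    """Two unit-variance Gaussian blobs with means (0, 0) and (separation, separation)."""
    n0 = n // 2
    X = np.vstack([rng.normal(0.0, 1.0, size=(n0, 2)),
                   rng.normal(separation, 1.0, size=(n - n0, 2))])
    y = np.array([0] * n0 + [1] * (n - n0))
    return X, y

def mean_test_error(n_train, separation, n_test=5000, n_repeats=5):
    errs = []
    for _ in range(n_repeats):
        Xtr, ytr = make_data(n_train, separation)
        Xte, yte = make_data(n_test, separation)
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xtr, ytr)
        errs.append(np.mean(clf.predict(Xte) != yte))
    return float(np.mean(errs))

for separation, label in [(6.0, "well separated"), (1.5, "overlapping")]:
    # Bayes error of two equal-covariance Gaussians with equal priors.
    bayes = norm.cdf(-separation * np.sqrt(2) / 2)
    print(f"{label} (Bayes error {bayes:.4f})")
    for n in [50, 100, 200, 400, 800]:
        excess = mean_test_error(n, separation) - bayes
        print(f"  n = {n:4d}   excess error ~ {excess:.4f}")
```

The Bayes error of two equal-covariance Gaussians with equal priors is Φ(-Δ/2), where Δ is the distance between the means in units of the standard deviation; subtracting it lets the two learning curves be compared on the same footing.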
Update: The second talk was video-recorded, so there is some chance that a public video will appear. Some material from the first talk was repeated during the last part of the second one (which was two hours long). I will not retell the philosophical part. On the machine learning side, he said that instead of Occam's Razor there is a principle of Large Margin, more precisely, the principle of admitting as many "contradictions" as possible, but contradictions situated on the data manifold, and not just anywhere in the ambient space. To generate artificial contradictions, people generate "morphs" (e.g. linear combinations, or mixtures of pixels) of objects of different classes, and this also reduces the resulting error when training on a fixed data set.
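For concreteness, here is a minimal sketch of how such "morphs" can be constructed: pixel-space convex combinations of pairs of examples from different classes. The dataset (scikit-learn's small digits) and the mixing weights are my own arbitrary choices, and the sketch stops at constructing the morphs; how they are then fed into training (presumably as something like a universum set) is not shown.

```python
# A minimal sketch of constructing "morphs": convex combinations, in pixel
# space, of pairs of examples that carry different labels. The dataset and the
# mixing weights are arbitrary choices of mine; how the morphs are then used
# in training is not shown here.
import numpy as np
from sklearn.datasets import load_digits

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)   # 8x8 digit images flattened to 64 pixels

def make_morphs(X, y, n_morphs=500):
    """Pixel mixtures of randomly chosen pairs of examples from different classes."""
    morphs = []
    while len(morphs) < n_morphs:
        i, j = rng.integers(0, len(X), size=2)
        if y[i] == y[j]:
            continue                    # a "contradiction" needs two different classes
        lam = rng.uniform(0.3, 0.7)     # mixing weight
        morphs.append(lam * X[i] + (1.0 - lam) * X[j])
    return np.array(morphs)

universum = make_morphs(X, y)
print(universum.shape)                  # (500, 64): artificial "contradictions" in pixel space
```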