Parsing Considered Harmful

Feb 10, 2005 16:32


So you think there is nothing interesting to say at the intersection of Computer Science and Political Economics? Well, Feynman said that Everything is interesting if you go into it deeply enough. And going deeply into it is precisely what I've been doing for ten years now. So below are some things I have to say about Computer Science and Political Economics.

That is, things beyond the fact that both Computer Science and Political Economics are completely fallacious albeit traditional names: indeed, the former is not a Science (it is an Art, or an Engineering Enterprise, which is one and the same) and, to quote Dijkstra, it is no more about computers than astronomy is about telescopes; whereas the latter is both against Politics and beyond Economics (in either the original Aristotelian meaning of husbandry, or the modern statist meaning of taxable monetary transactions). Good Computer Science is actually Lisp Lore and Craft; good Political Economics is Libertarian Human Action.

Note that if you're not too much into computing, you may skip directly to the paragraph that mentions Political Economics and Education. Yes, this is also about Education.

The Evil of Academic Curricula in Computer Science
Chapter I
Parsing Considered Harmful

The extraordinary importance given to parsing in the academic Computer Science curriculum (at least in France) is an unhealthy way to teach procrastination to students: it focuses attention on the mostly irrelevant but nevertheless very intricate issues of building a parse tree from an arbitrary human-writable syntax, which puts off the moment when one has to face the only thing that really matters, semantics. Furthermore, defining syntax first leads to complex systems that lack a clear meaning and a simple, systematic way of analyzing and synthesizing sentences, and instead accumulate limitations and special cases.

By contrast, Lisp does away with parsing by providing a simple, extensible, universal syntax; thus developers don't have to worry about parsing a new syntax, if ever, until the semantics is settled. In practice, defining a new syntax is hardly ever needed, and often detrimental: it leads to a Tower of Babel of limited-purpose little languages that are costly and error-prone to make interoperate, if that is possible at all, whereas extending the Lisp syntax gives you a single universal language that does everything and with which you may build new features by freely combining previous ones.
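
To make the claim about extending the Lisp syntax concrete, here is a minimal Common Lisp sketch (the construct with-logged-time is my own hypothetical example, not something from any particular system): a new control structure is added with an ordinary macro, so no grammar, lexer, or parser is ever written; the universal parenthesized syntax already accommodates it.

    ;; Hypothetical example of extending Lisp from within: a new control
    ;; construct defined as an ordinary macro. No new grammar or parser
    ;; is needed; the universal parenthesized syntax already covers it.
    (defmacro with-logged-time ((label) &body body)
      "Run BODY, printing how long it took under LABEL."
      (let ((start (gensym "START")))
        `(let ((,start (get-internal-real-time)))
           (unwind-protect (progn ,@body)
             (format *trace-output* "~&~A took ~,3F s~%"
                     ,label
                     (/ (- (get-internal-real-time) ,start)
                        internal-time-units-per-second))))))

    ;; Usage: the "new syntax" is just another s-expression.
    (with-logged-time ("sorting")
      (sort (list 3 1 2) #'<))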

Occasionally, someone is dissatisfied with the stupid limitations of mainstream languages, and goes on to invent a new language that in his eyes would overcome these limitations. Because he doesn't know the power of Lisp syntax (or lack thereof), he invariably begins by inventing yet another syntax, usually by picking up pieces from well-known languages and sewing them together. By the time he is done defining the syntax of his language, he obtains a horrible new mumbo jumbo where misleading similarities to known systems and previously unknown oddities both contribute to making it all a pain to learn; meanwhile, he has had no time to spend on really thinking about the semantics of his language, which ends up with semantic limitations almost identical to those of previous languages; the resulting language is thus no gain to use. If he knew about Lisp instead, he would just extend Lisp from within and focus on adding new functionality, without having to waste all his time reimplementing features that already exist or reinventing yet another useless new syntax. Eventually, he could do a major refactoring of his system, including a reimplementation of existing features and a new syntax; but that would only come later. What he would start with right away is to face the issues that matter: semantic issues. He would only spend time on such secondary issues as syntax and implementation after he had proven that he did have a point on the primary issues; and he wouldn't have to handle these secondary issues at all unless they were found to be significant.

The need to implement new languages arises all the more frequently because people use inferior languages that limit what they can express, and because they ignore the power and extensibility of Lisp. Apart from these cases of new languages, the only case in which you actually want to define a new syntax is to provide a dumbed-down interface for mostly clueless users to tinker with safely, without any risk to other parts of the system. But then you'll find that you might as well provide some interactive menu-based interface for simple user configuration; the only people who'll want to use a textual interface will be programmers anyway, and they may as well use the full power of a dynamic programming language, Lisp being the best of them. And of course, that interactive interface can be straightforwardly derived using proper declarative programming techniques, at a fraction of the cost of a parser, and it yields a much more usable system.
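
To make the remark about deriving the interactive interface declaratively a bit more tangible, here is a hedged sketch in Common Lisp (all the names, such as defoption, *options* and configure-interactively, are hypothetical, invented for illustration): configuration options are declared once as plain data, and both programmatic access and a crude question-and-answer interface are derived from that single declaration, with no parser involved.

    ;; Hypothetical sketch: declare configuration options once, as data,
    ;; then derive both programmatic access and a crude interactive
    ;; question-and-answer interface from that single declaration.
    (defvar *options* '()
      "List of (name prompt default) entries, most recent first.")

    (defmacro defoption (name prompt default)
      `(push (list ',name ,prompt ,default) *options*))

    (defoption verbose "Print extra diagnostics?" nil)
    (defoption max-retries "How many times to retry?" 3)

    (defvar *settings* (make-hash-table)
      "User-chosen values, keyed by option name.")

    (defun option (name)
      "Current value of option NAME, falling back to its declared default."
      (gethash name *settings* (third (assoc name *options*))))

    (defun configure-interactively ()
      "Walk the declared options and ask the user about each one."
      (dolist (entry (reverse *options*))
        (destructuring-bind (name prompt default) entry
          (format *query-io* "~&~A [~S] " prompt default)
          (finish-output *query-io*)
          (let ((line (read-line *query-io*)))
            (setf (gethash name *settings*)
                  (if (string= line "") default (read-from-string line)))))))

    ;; Programmatic use: (option 'max-retries) => 3 until configured otherwise.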

Now, once the case for new syntax (or the lack of need for it) is settled, the only time you ever actually have to write a parser is when you need to interact with old syntax; this means legacy data and external software, written by clueless people who should have known better. And then, all the fancy parsing theories taught at universities end up mostly useless, since said clueless people, being clueless, did things in clueless ad-hoc ways that do not fit any theory. What you need instead is a set of libraries that handle the legacy languages and data formats you have to interoperate with. Additionally, you may need a few simple tools to extract information from whichever of the wide collection of existing ad-hoc data formats you'll have to face. That's where some parsing theory might help; but the kind of theory rehashed in formal education will only matter to the few who will specialize in writing extraction tools -- and those few will need much more than is ever actually taught in class: they will need efficient algorithms and their actual implementation, with additional intricacies due to ambiguity resolution, error handling, incremental compilation, programming tricks and optimizations, plus many annoying secondary concerns such as interoperation and internationalization, etc.
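
As for the "few simple tools" mentioned above, here is a hedged sketch of what such an extraction helper tends to look like in practice (the legacy format -- semicolon-separated "name;quantity;price" records -- is a made-up example): a handful of string operations, with no grammar or parser generator anywhere in sight.

    ;; Hypothetical example of a tiny extraction tool for an ad-hoc
    ;; legacy format: one "name;quantity;price" record per line.
    (defun split-on (char string)
      "Split STRING on every occurrence of CHAR."
      (loop for start = 0 then (1+ end)
            for end = (position char string :start start)
            collect (subseq string start end)
            while end))

    (defun parse-legacy-line (line)
      "Turn one \"name;quantity;price\" record into a property list."
      (destructuring-bind (name quantity price) (split-on #\; line)
        (list :name name
              :quantity (parse-integer quantity)
              :price (read-from-string price))))

    ;; Usage:
    ;;   (parse-legacy-line "widget;12;3.50")
    ;;   => (:NAME "widget" :QUANTITY 12 :PRICE 3.5)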

Now for the Political Economics of Education. We have seen above that all this parsing theory, taught for months in universities and heavily used as the basis for examinations, is usually quite harmful and at best practically useless for all involved. Why then do academic curricula, in France and maybe other places, focus so much on parsing? Because they have official curricula drawn up by bureaucrats who advance their own agenda: they do whatever brings the greatest benefit at the lowest cost to themselves, disregarding benefits and costs to users. And what makes parsing theory so attractive to bureaucrats? The fact that parsing theory, just like neo-classical economics or statistical sociology, makes heavy use of formal mathematical tools. These formal mathematical tools provide both a scientific veneer and a cheap way to prepare tests that can be graded in a uniform, objective, bureaucratic way. Of course, the relevance of the given mathematical tools to any actual real-life situation is moot. Yet this relevance is what academics require students to assume implicitly, without critical thought -- and any critical thought about it is systematically crushed by professors. So this is all pseudo-science, and the apparent education is actually massive brainwashing. But the bureaucrats who control education don't really care about science, only about the appearance of it. What is more, an actual failure with the appearance of success on their part means that they will be able to request a bigger budget to increase their allegedly successful contribution to solving an otherwise growing problem. So this kind of phenomenon is exactly what is selected for by bureaucracy. As Robert LeFevre put it, Government is a disease masquerading as its own cure.

Note that sometimes, instead of faking science, academics try to fake engineering or business usage. That leads to teaching some stupid industry-standard programming language or piece of software of some sort. Students then waste years in useless courses learning less than they would in a few weeks of practice; and what they actually retain from this experience is that they have to cope in expedient ways with the inexplicable, arbitrary limitations of existing software that lacks any understandable structure. This very limited software in turn doesn't fit any of the equally limited theories they are taught on the side. Theory is disconnected from practice. The result of all this is a permanent mess in the students' minds, whereas the goal of education should be to give them key guiding concepts that would help keep their minds clear at all times. All in all, pseudo-engineering is just as evil as pseudo-science. But it is also an altogether different topic, and I won't say more about it today.

So, what should Computer Science faculties be teaching instead? Well, they should be teaching How To Design Programs. And this isn't the kind of thing that can be dictated in a uniform way by a bureaucratic authority. As Bastiat said, All monopolies are detestable, but the worst of all is the monopoly of education.

lisp, tao of programming, academia, meta, libertarian, education, economics, dynamism, en
