About the fixation of syntax to concepts: timwi

Apr 19, 2010 13:04

Coders often seem unable to distinguish the concepts of a programming language from its syntax. One coder might argue for Python on the grounds that it has certain innovative language features, while another might “refute” this by pointing to the shortcomings of whitespace-based syntax. The two are talking about entirely separate things.

This problem is not helped by the fact that the vast majority of programming languages are built with this coupling fundamentally integrated into the whole design. There is usually no self-contained “runtime” that can just execute the output of any arbitrary parser for some arbitrary syntax. Maybe in the specific case of Python there is (in fact, I know there is for Perl 6), but the main point still stands: most programming languages are not designed with this in mind. In particular, a very popular “microlanguage”, namely regular expressions, is universally infected with this assumption. Even the modern .NET architecture still uses the same old crappy regular expression syntax that was originally popularised by Perl, strengthening the belief among the naïve that regular expression syntaxes must necessarily be this arcane and compressed. Thus 21st-century coders still form one camp that argues for regular expressions because they’re powerful, while the other camp argues against them because they’re unreadable and unmaintainable. Both camps are right, and they are talking about different things.
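To make the regular-expression case concrete, here is a minimal sketch in Python. It assumes a tiny, invented set of node types (`Lit`, `Digits`, `Seq`) standing in for the concepts, and two hypothetical renderers: one producing the usual compressed notation, the other a wordier notation made up for this example. Only the standard `re` module is real; everything else is illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical node types: the "concepts", independent of any notation.
@dataclass
class Lit:
    text: str          # match this text literally

@dataclass
class Digits:
    count: int         # match exactly this many decimal digits

@dataclass
class Seq:
    parts: list        # match each part in order

def to_perl_syntax(node):
    """Render the tree in the familiar compressed regex notation."""
    if isinstance(node, Lit):
        return re.escape(node.text)
    if isinstance(node, Digits):
        return r"\d{%d}" % node.count
    if isinstance(node, Seq):
        return "".join(to_perl_syntax(p) for p in node.parts)
    raise TypeError(node)

def to_wordy_syntax(node):
    """Render the same tree in an invented, more verbose notation."""
    if isinstance(node, Lit):
        return 'literal "%s"' % node.text
    if isinstance(node, Digits):
        return "%d digits" % node.count
    if isinstance(node, Seq):
        return ", then ".join(to_wordy_syntax(p) for p in node.parts)
    raise TypeError(node)

# One pattern, two ways of displaying it.
iso_date = Seq([Digits(4), Lit("-"), Digits(2), Lit("-"), Digits(2)])

compact = to_perl_syntax(iso_date)
wordy = to_wordy_syntax(iso_date)
# wordy == '4 digits, then literal "-", then 2 digits, then literal "-", then 2 digits'

# The matching power comes from the tree, not from how it is displayed.
matches = re.fullmatch(compact, "2010-04-19") is not None
```

The “powerful” camp is arguing about `iso_date`; the “unreadable” camp is arguing about `compact`. Neither argument touches the other.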

Even the most modern programming languages, those that compile to a hardware-independent virtual architecture (Java or .NET), suffer from this problem. One could kind of argue that C# and VB.NET are actually just two different syntaxes for .NET, but in reality they have separate compilers that do similar but not identical things.

How often do you hear programmers argue about the colour schemes used in syntax highlighting? This is rare because everyone can choose their colour scheme for themselves without impacting anyone else. You don’t commit your colour scheme into source control.

I would like to see the next step in the evolution of programming to be one where everyone can use any syntax of their choosing without impacting anyone else. In this world, source code would be parsed into an abstract syntax tree, and it would be this syntax tree that would be shared between developers and checked into source control, so everyone can see everyone else’s work in their own preferred syntax. This step is hard because there needs to be universal agreement on the format of such a syntax-tree file so that source-control software can properly display diffs and annotations in the user’s preferred syntax. Once that is done, debates about “syntax” in a programming language become a thing of the past. People will debate “syntax” about as much as they debate colour schemes today.
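As a rough sketch of what this could look like, the Python below stores a tiny program as a tree, serialises it to JSON as a stand-in for the agreed-upon file format, and renders the checked-out tree in two different surface syntaxes. The node shapes (`"if"`, `"binop"`, `"call"` and so on) and both renderers are assumptions invented for this example, not an existing format.

```python
import json

# Hypothetical tree for: if x is greater than 0, call print(x).
tree = {
    "type": "if",
    "cond": {"type": "binop", "op": ">",
             "left": {"type": "name", "id": "x"},
             "right": {"type": "num", "value": 0}},
    "body": [{"type": "call", "func": "print",
              "args": [{"type": "name", "id": "x"}]}],
}

def expr(n):
    """Render an expression node; both syntaxes share this part."""
    t = n["type"]
    if t == "name":
        return n["id"]
    if t == "num":
        return str(n["value"])
    if t == "binop":
        return "%s %s %s" % (expr(n["left"]), n["op"], expr(n["right"]))
    if t == "call":
        return "%s(%s)" % (n["func"], ", ".join(expr(a) for a in n["args"]))
    raise ValueError(t)

def render_braces(n, indent=0):
    """Render a statement node as lines of brace-and-semicolon syntax."""
    pad = "    " * indent
    if n["type"] == "if":
        lines = [pad + "if (" + expr(n["cond"]) + ") {"]
        for stmt in n["body"]:
            lines += render_braces(stmt, indent + 1)
        lines.append(pad + "}")
        return lines
    return [pad + expr(n) + ";"]

def render_indented(n, indent=0):
    """Render a statement node as lines of whitespace-based syntax."""
    pad = "    " * indent
    if n["type"] == "if":
        lines = [pad + "if " + expr(n["cond"]) + ":"]
        for stmt in n["body"]:
            lines += render_indented(stmt, indent + 1)
        return lines
    return [pad + expr(n)]

stored = json.dumps(tree, sort_keys=True)  # what would be committed
shared = json.loads(stored)                # what a colleague checks out

braces = "\n".join(render_braces(shared))
indented = "\n".join(render_indented(shared))
```

Two developers can look at `braces` and `indented` respectively while committing the very same `stored` file, which is exactly the relationship colour schemes have to source files today.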

But we can go even further. The next step is to do away entirely with parsers. There would instead be source code editors that edit the syntax tree itself. These editors only need to render the tree in the user’s preferred syntax; they never need to turn plain-text files back into a syntax tree, because what they store is already a tree. Then we can finally stop thinking of source code as plain text and start visualising and manipulating it graphically.
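One thing such an editor gets almost for free is edits that are awkward on plain text. The sketch below, again in Python with an invented node format, renames a variable by walking the tree. It assumes variables are stored as `"name"` nodes and string literals as `"str"` nodes, so the rename cannot accidentally touch the text inside a string the way a textual search-and-replace could.

```python
# Hypothetical tree for: print("x =", x)
tree = {"type": "call", "func": "print",
        "args": [{"type": "str", "value": "x ="},
                 {"type": "name", "id": "x"}]}

def rename(node, old, new):
    """Return a copy of the tree with every name node `old` renamed to `new`."""
    if isinstance(node, list):
        return [rename(child, old, new) for child in node]
    if not isinstance(node, dict):
        return node  # plain values (strings, numbers) are left alone
    out = {key: rename(value, old, new) for key, value in node.items()}
    if out.get("type") == "name" and out.get("id") == old:
        out["id"] = new
    return out

edited = rename(tree, "x", "count")
# The variable is now "count"; the string literal still reads "x =".
```

No parser is involved at any point: the edit goes from tree to tree, and rendering is only needed to show the result to a human.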

You may say I’m a dreamer...