About the fixation of syntax to concepts

Apr 19, 2010 13:04

Coders seem to be unable to distinguish the concepts of a programming language from its syntax. One coder might argue for Python on the grounds that it has certain innovative language features, but another might “refute” this by arguing about the shortcomings of whitespace-based syntax. The two are talking about entirely separate things


timwi April 19 2010, 15:14:13 UTC
I agree that I have glossed over certain details. Let me just say that your example is a bad one because I can’t really imagine that many people would want to maintain a distinction between “if (boolvar) dostuff();” and “if (boolvar) {dostuff();}” - you’d generally prefer either always one or always the other. But I agree there is a more general issue. A better example might be that I want to insert blank lines to group statements into logical blocks. With current programming languages, the AST doesn’t reflect those blank lines. Come to think of it, current ASTs don’t include comments either.
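As a small illustration of that last point, here is a sketch using Python's standard `ast` module. Python is only a stand-in here; the variable names and the two source snippets are made up for the example. It parses two versions of the same code, one with a blank line and a comment and one without, and compares the resulting trees.

```python
import ast

# Two hypothetical snippets: same statements, different layout.
with_layout = """\
x = 1
y = 2

# start a new logical group
z = x + y
"""

without_layout = """\
x = 1
y = 2
z = x + y
"""


def tree_shape(source):
    """Return a textual dump of the AST, without line/column attributes."""
    return ast.dump(ast.parse(source))


# The blank line and the comment leave no trace in the tree.
same_tree = tree_shape(with_layout) == tree_shape(without_layout)
print(same_tree)  # True
```

So an editor that stored only this tree would lose the grouping and the comment when it rendered the code again.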

But these issues are not insurmountable. Of course there are always traditionalists who think the new way is “crippled”, but Visual Studio has proven that, over time, people realise that the benefits of the new technology outweigh the cost. For example, Visual Studio’s auto-formatting of C# code makes it impossible to vertically align parameters in a list of similar method calls. I had no problems coming to terms with that because the benefits of auto-format far outweigh this rare special case.

The same will be true of certain AST features. There’s no reason why you can’t have an AST node that represents a logical block, which is used to insert blank lines in rendering but has no effect on execution. You could have an editor that preserves some formatting options in the AST which other editors ignore. But I bet that over time, visualisations will diverge more and more from the 1960s idea of “plain-text files” and move to a paradigm where these concerns aren’t really that relevant anymore.
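A minimal sketch of what such a node could look like, in Python. Everything here is hypothetical: the `Assign` and `Group` node types, the `render` and `execute` functions, and the toy language of plain assignments are invented for illustration and do not come from any real compiler or editor.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    name: str
    expr: str  # expression text, evaluated against earlier assignments


@dataclass
class Group:
    """A logical block: affects rendering only, never execution."""
    statements: list


def render(program):
    """Render the tree as text, separating groups with a blank line."""
    lines = []
    for node in program:
        if isinstance(node, Group):
            if lines:
                lines.append("")  # blank line before each later group
            lines.extend(f"{s.name} = {s.expr}" for s in node.statements)
        else:
            lines.append(f"{node.name} = {node.expr}")
    return "\n".join(lines)


def execute(program):
    """Run the assignments in order; Group nodes are simply unwrapped."""
    env = {}
    for node in program:
        statements = node.statements if isinstance(node, Group) else [node]
        for s in statements:
            env[s.name] = eval(s.expr, {}, env)
    return env


grouped = [
    Group([Assign("x", "1"), Assign("y", "2")]),
    Group([Assign("z", "x + y")]),
]
flat = [Assign("x", "1"), Assign("y", "2"), Assign("z", "x + y")]

print(render(grouped))
print(execute(grouped) == execute(flat))  # True
```

An editor that does not care about grouping could ignore `Group` nodes when rendering, and the program would still mean the same thing.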


pne April 19 2010, 15:47:53 UTC
Perhaps a better example would be "syntactic sugar" -- for example, in Perl, while (<>) { ... } is equivalent to while (defined($_ = <>)) { ... }, and in Java, java.util.Collection<Integer> varlist = ...; for (Integer var : varlist) { var = var + 1; } is short for something like java.util.Collection varlist = ...; java.util.Iterator $tmpiter = varlist.iterator(); while ($tmpiter.hasNext()) { Integer var = (Integer) $tmpiter.next(); var = new Integer(var.intValue() + 1); }.

Sometimes you'd want to see autoboxing and auto-unboxing; sometimes you'd like to see explicit casts resulting from generics; sometimes you'd like to see "foreach" with explicit iterators -- but often, you might not want to see those things. (After all, part of the reason for syntactic sugar is to make code more readable by encapsulating a common idiom into a shorter form.)

Since syntactic sugar is fairly 1:1, it's possible to abbreviate always or to always show the long form, but it's not possible to mix the two (sometimes short, sometimes long).

Or compare the effect of Java import or (I think) C# using; I don't think this is reflected in the generated "object code", so it would probably be missing in the AST as well. But I wouldn't want to read code that always used fully-qualified package names such as java.util.Collection or System.String -- yet if you use, say, java.util.Date and java.sql.Date, then the "decompiler" can't know which of the two it can abbreviate using import.

So, perhaps people will get used to the limitations of automatic retranslation, but I think they will - at least initially - be perceived as limitations.


timwi April 19 2010, 17:40:50 UTC
The meanings of “while (<>)” and “while (defined($_ = <>))” may be the same, but the parse tree certainly isn’t. I used the term AST assuming it is the same thing as parse tree, but if the two terms are used slightly differently, maybe I should edit the post and call it parse tree. I’m thinking of the result of the very first step in a compiler, which is a simple parse according to a meaning-agnostic grammar - no substitutions of syntactic sugar or expansion of fully-qualified names. It is probably the case that many compilers do such expansions/substitutions at the same time as parsing, but there’s no reason why it can’t be separated and you could still have a parse tree that contains the import/using clauses and that distinguishes the for/foreach loop from its expanded iterator pattern. There is no “automatic retranslation”, most certainly not from “object code” (compiled binaries).
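To illustrate with a language whose parser happens to work this way: Python's standard `ast` module keeps a `for` loop and an `import` clause in the tree as written, and gives a different tree for the hand-expanded iterator version. This is a sketch in Python rather than Perl or Java, and the two snippets and the `node_types` helper are made up for the example.

```python
import ast

# Hypothetical "short form": an import clause and a for loop.
sugared = """\
from math import sqrt
for item in items:
    total = total + sqrt(item)
"""

# Hypothetical "long form": qualified name and an explicit iterator.
expanded = """\
import math
it = iter(items)
while True:
    try:
        item = next(it)
    except StopIteration:
        break
    total = total + math.sqrt(item)
"""


def node_types(source):
    """Names of the top-level statement nodes in the parsed tree."""
    return [type(node).__name__ for node in ast.parse(source).body]


print(node_types(sugared))   # ['ImportFrom', 'For']
print(node_types(expanded))  # ['Import', 'Assign', 'While']
```

The two snippets are meant to do the same work, but the trees record which form was written, so rendering from the tree does not have to guess between the short and the long form.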
