technicalities: specs (Lisp, ML, ALGOL, Ada, JS2, ES4, RnRS, Redex, POPLmark, Rust, K, Comp/JSCert)

Feb 06, 2014 09:17

According to the History of Programming Languages website (currently offline) there have been somewhere around 8500 programming languages developed since the Autocodes of the early 50s. The Internet Archive has scraped at least HOPL's taxonomy, as well as a few references and the primary year indexes (the year of my birth, for example).

tech


Re: POPLmark challenge graydon February 7 2014, 21:08:30 UTC
I don't quite know how to respond to this in a way that's not offensive; I want to assure you that I have no horse in this race and it's entirely possible that Twelf Is Better Than Coq (which seems to be what you're getting at). I apologize for giving short shrift to Twelf. I don't actually know how to evaluate them; I don't think POPLmark can tell us that.

For what it's worth: I never got the impression that the challenge was a "pick-a-winner" competition of systems. If you look at the initial posting, it seems clear to me that the organizers just wanted to get a very general "sense of the field", stimulate discussion, and see what approaches people would take on benchmark problems (which are still "toy sized" relative to industrial PL problems, but whatever).

As a (now-ex) paid PL designer (not a logical-framework expert, much less the author/designer of one) I still don't know how to evaluate the POPLmark results. I don't understand the submissions especially well; I can't give a particularly compelling description of the difference between a HOAS solution and a locally nameless one; I can barely decipher the simplest proofs in each framework, and I certainly couldn't produce any of my own at this point, about any of the languages I've worked on.
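For readers as hazy on that distinction as I am, here is my rough gloss of the two representations, sketched in OCaml rather than in any of the actual proof assistants. This is illustrative only, invented for this comment, and not drawn from any POPLmark submission.

    (* Two ways to represent lambda terms, sketched in OCaml.
       Illustrative only; the real submissions encode these in
       Twelf / Coq / Isabelle, not in a general-purpose ML. *)

    (* Higher-order abstract syntax (HOAS): binders are functions
       of the metalanguage, so substitution is just application
       and alpha-equivalence comes for free. *)
    type hoas =
      | HVar of string                 (* free variables *)
      | HLam of (hoas -> hoas)         (* binder = OCaml function *)
      | HApp of hoas * hoas

    (* beta-reduce an outermost redex: no substitution function
       needed, the metalanguage performs it for us *)
    let hbeta = function
      | HApp (HLam f, arg) -> f arg
      | t -> t

    (* Locally nameless: bound variables are de Bruijn indices,
       free variables are names; terms get a unique representative,
       but substitution must be written (and reasoned about) by hand. *)
    type ln =
      | Bound of int                   (* de Bruijn index *)
      | Free of string
      | Lam of ln
      | App of ln * ln

    (* "open" a binder: replace index k with the term u *)
    let rec open_at k u = function
      | Bound i -> if i = k then u else Bound i
      | Free x -> Free x
      | Lam body -> Lam (open_at (k + 1) u body)
      | App (a, b) -> App (open_at k u a, open_at k u b)

    let lnbeta = function
      | App (Lam body, arg) -> open_at 0 arg body
      | t -> t

As far as I understand it, the HLam constructor above is exactly the kind of definition Coq rejects outright (the recursive occurrence of hoas sits in a negative position), while LF-based systems like Twelf support that style natively; that difference drives much of the divergence between the encodings.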

As I pointed out earlier in this post, my work as a PL designer has been about making human/computer interfaces, and most of that work honestly falls on the "human" side. The human is the more fallible, harder-to-predict part, with the much, much more limited (cognitive) budget. I can design languages with cool features, and I can figure out a machine implementation far more readily than I can figure out how to make the result palatable to a human. To my eyes, the biggest problem with all proof assistants is similar: not that they're not powerful enough as technical artifacts, but that too often, humans can't quite figure out how to use them to their full power.

That is: either I am incompetent (you're welcome to reach that conclusion; it doesn't offend me, and in some sense we all are), or else the cognitive load of all the formalisms and their logical-framework mechanizations is probably still too high for the target audience. This is why I find Redex and K a little more appealing: for reasons I can't even articulate, I find their models more approachable. Maybe it's just that they give up on the "proof" side and aim for "high-confidence model checking". I don't know.
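To unpack "high-confidence model checking" a little: the workflow I have in mind is to write the semantics as an executable artifact and fire large numbers of random terms at the properties you care about, instead of proving them. Here is a toy sketch of that loop in OCaml; the tiny language and every name in it are invented for illustration, and Redex and K of course do this with far more machinery.

    (* A toy of the Redex/K workflow: an executable small-step
       semantics plus random testing of a metatheoretic property,
       instead of a mechanized proof. Everything here is invented
       for illustration. *)

    type ty = TBool | TNat

    type tm =
      | Tru | Fls | If of tm * tm * tm
      | Zero | Succ of tm

    let rec is_value = function
      | Tru | Fls | Zero -> true
      | Succ t -> is_value t
      | If _ -> false

    (* one step of reduction; None if the term is a value or stuck *)
    let rec step = function
      | If (Tru, t, _) -> Some t
      | If (Fls, _, e) -> Some e
      | If (c, t, e) ->
        (match step c with Some c' -> Some (If (c', t, e)) | None -> None)
      | Succ t ->
        (match step t with Some t' -> Some (Succ t') | None -> None)
      | _ -> None

    let rec typeof = function
      | Tru | Fls -> Some TBool
      | Zero -> Some TNat
      | Succ t -> if typeof t = Some TNat then Some TNat else None
      | If (c, t, e) ->
        (match typeof c, typeof t, typeof e with
         | Some TBool, Some a, Some b when a = b -> Some a
         | _ -> None)

    (* random term generator, biased toward small terms *)
    let rec gen depth =
      if depth = 0 then (match Random.int 3 with 0 -> Tru | 1 -> Fls | _ -> Zero)
      else match Random.int 5 with
        | 0 -> Tru | 1 -> Fls | 2 -> Zero
        | 3 -> Succ (gen (depth - 1))
        | _ -> If (gen (depth - 1), gen (depth - 1), gen (depth - 1))

    (* progress: every well-typed term is a value or can step *)
    let () =
      Random.self_init ();
      for _ = 1 to 100_000 do
        let t = gen 5 in
        match typeof t with
        | Some _ -> assert (is_value t || step t <> None)
        | None -> ()   (* ill-typed terms are out of scope *)
      done;
      print_endline "progress held on 100000 random terms"

This proves nothing, but it catches the kinds of bugs a designer actually writes, and the cognitive cost is closer to writing an interpreter than writing a proof.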

Personally, I think it's telling that most of the submissions to POPLmark still seemed closely associated with the particular academic lineage surrounding each tool's designers. That is: few people outside the very top experts on the planet could operate these tools to the effect required. That suggests to me that none of them is quite approachable enough yet to be used regularly in language design. Whether approachability is even the goal is a different question, possibly of no interest to the tool designers. I don't know.


Re: POPLmark challenge ext_1472537 February 8 2014, 05:12:29 UTC
In my opinion, it does not make sense to say that disparate systems such as Twelf and Coq are better than one another, so I am not saying that. What I am saying is that a benchmark was formulated, a challenge was laid, and we won that challenge hands-down using Twelf. If the challenge had been for something other than mechanization of programming language metatheory, we wouldn't even have entered, because Twelf would not have been suited to the task. That much is perfectly clear, and not in contention. But the idea at the time was to see which system was better for doing language metatheory, and this was the benchmark established to test that premise. We didn't choose the benchmark; in fact, it is quite clear from the formulation that it was chosen to defeat us. Although we showed that Twelf was superior for the given purpose, the proposers just carried on promoting Coq for mechanized metatheory. In fact Coq is such a poor choice that the currently recommended practice is to use an erroneous notion of substitution in a non-erroneous way in order to make progress on this sort of problem. As I mentioned, we went on to use Twelf to solve this sort of problem at scale without any significant difficulties. Comparable results have not, to my knowledge, been achieved with any other system.
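To make the substitution point concrete (a rough gloss, sketched in OCaml rather than Coq, with all names invented here): naive, non-capture-avoiding substitution is wrong in general, but it is safe when the term being substituted is closed, and a development can be arranged so that this invariant always holds.

    (* Naive substitution: replaces x by u under binders without
       renaming, so in general it captures free variables of u. *)
    type tm =
      | Var of string
      | Lam of string * tm
      | App of tm * tm

    let rec subst x u = function
      | Var y -> if y = x then u else Var y
      | Lam (y, b) -> if y = x then Lam (y, b) else Lam (y, subst x u b)
      | App (a, b) -> App (subst x u a, subst x u b)

    (* capture in action: substituting an OPEN term goes wrong *)
    let broken = subst "x" (Var "y") (Lam ("y", Var "x"))
    (* = Lam ("y", Var "y"): the free y was captured *)

    (* but if u is closed there is nothing to capture, and the
       naive function agrees with capture-avoiding substitution *)
    let ok = subst "x" (Lam ("z", Var "z")) (Lam ("y", Var "x"))
    (* = Lam ("y", Lam ("z", Var "z")) *)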



