Development With Types

Apr 09, 2007 17:40

Lots of folk seem to agree that one of the nice things about having a blog is that you can use it as a public whiteboard, to help order your thoughts on any given subject. I don't post all that often myself, but this isn't because my thoughts are already ordered. It's because I've long been in the habit of using a different technology for ordering my thoughts - and when it comes to the sort of geeky code philosophy that might be most at home here, that technology is often pozorvlak.

Every so often, he'll encourage me to blog something from one of our email conversations and invite comment - and I should do so far more often than I do. I don't think he'd object to my describing him, for the purposes of this post, as a "recovering perl hacker". That is to say that he's fairly happy with the sensibilities of the perl hacker, and using them to produce Stuff. He's also aware that there are other ways of doing things, and wants to understand them. One of these ways is embodied by the Haskell language and community. The relationship between the language, the community and the development style are a frequent topic of our emails. I'll open then, by quoting a question he asked in a recent email:

...this led me to a question that I've been wondering about for a while: various Haskell bods (Cale, Duncan, YAHT, etc) claim that Haskell code, once it compiles, is typically bug-free. YAHT says "most of the code I've written is not buggy, as most of the common sources of bugs in other languages simply don't exist in Haskell."

This is not at all my experience. I haven't measured it properly or anything, but it feels like my Haskell code is approximately as buggy as my Perl code, with the added downsides that it takes a lot longer to get to the stage where you can test it, and a lot longer to find the bugs once you can :-( [Must get round to trying Hat.] But things like off-by-one-errors, infinite loops, and good old "not understanding the problem properly and writing the wrong thing" are all possible in Haskell, as I've discovered to my cost.

There are two ways I can see to interpret their statements:

  1. They're talking about things like memory leaks, null pointer bugs, bugs caused by looping over array indices, etc, that don't exist in Perl either, but do exist in C.
  2. They're using l33t guarantee-oriented sk1llz to prove correctness of their code using the type system.

Do you know which one it is?

Now, I certainly think there's a fair bit of (1) that goes on (for example, using the Maybe monad to solve the strings problem), but I'm interested here in what pozorvlak perceives as (2). In short, I don't think anyone's seriously going round proving all their code, using the type system or otherwise - but I do think he may have spotted a difference in the typical approach to a problem between average perl and haskell advocates.

What follows is pretty well taken straight out of the reply I emailed back. In it, I sort of accidentally started using the word "Duncan" to mean "a mythical perfect haskell programmer". I hope he doesn't object.

Do you remember that Dijkstra rant we both read some time ago?

It included such gems as:

The nice thing of this simple change of vocabulary is that it has such a profound effect: while, before, a program with only one bug used to be "almost correct", afterwards a program with an error is just "wrong" (because in error).

and:

Right from the beginning, and all through the course, we stress that the programmer's task is not just to write down a program, but that his main task is to give a formal proof that the program he proposes meets the equally formal functional specification. ... we see to it that the programming language in question has not been implemented on campus so that students are protected from the temptation to test their programs.

I think this might be an indication that Dijkstra writes assembly the way that Duncan writes haskell - and vice versa... I'm not sure - but I think there might be something in the notion.

Let's think about the two... In fact - I guess I'll start by thinking about Dijkstra, and wind back around to haskell in a second... When we read that rant for the first time (and exchanged emails about it), the impression I eventually came away with was that Dijkstra had an interesting idea, so far as it went. He would probably be able to produce a generation of students who could each take a formal specification of a problem (or a similarly clear exam-question style description of a problem), and produce correct code which satisfies it. This is a useful skill, but is only part of the job of the programmer, who also has to come up with that specification in the first place.

You mention three classes of bugs (off by one errors, infinite loops, and not really understanding the problem) in your email - it's the third of these that I wasn't convinced that Dijkstra's methods would deal with for us.

I am fairly convinced that the first two can be eliminated this way. I have an excellent book on my desk - a hang over from the Oxford course - called "Programming: The derivation of Algorithms" by Anne Kaldewaij. It shows how to take a certain sort of specification and, almost entirely mechanically, derive the programme from the specification. There are points of choice - of creativity along the way, but they're limited to high level things. Low level details like off-by-one errors are eliminated by the system, so long as you're able to follow it (it can't guard against typos). Termination proofs are part of the thing too, so you're not going to get infinite loops either.

I never actually write procedural code this way - it takes far too long. But I can see that a mechanical method exists which will eliminate those kinds of silly mistakes, and I guess I nod towards it when I write complex things (which isn't very often).

I'm willing to take it on faith that Dijkstra could produce coders who could code in a way similar to that - which eliminated those sorts of bugs, and he could probably have them turn out code at a fairly professional rate. I think many of these sorts of problems can be eliminated by certain sorts of high level language. I think the memory errors you mentioned, and the strings problem I mentioned are also in this class of errors. The difference between the solutions to those problems that we mentioned, and the solution that Dijkstra proposes, is that ours is packaged up - the proof of the java garbage collector is in the mind of some Sun engineer somewhere - and java programmers don't have to care. There doesn't happen to be a packaged up proof/method for the off by one error you came across, so the error caught you out. The advantage that Dijkstra's students would have, is that they'd be versed in that style from the beginning - they'd be able to use the garbage collector the same as you, but when they found a gap in the system (the space for an off-by-one bug), they'd be able to fill it in, and avoid the bug.

So, for that class of what I'll call "silly mistake" bugs, I think that haskell's probably better than perl, is better than C, is better than assembly. None are perfect, because none of them produce proofs automatically, but someone versed in that particular skill can fill in those gaps.

The interesting class of bugs is the "not understanding the problem" class. Having a specification eliminates this problem, so we can think of it as "producing a specification" or "describing the problem" or something like that. I know that Dijkstra has written big reliable code, so it seems to me that he has a solution to this problem, even if he doesn't bang on about it so much as he does about eliminating the "silly mistakes".

So, back to the point in hand - haskell and Duncan's programming style. I suspect that both Duncan and Dijkstra may have hit upon a similar solution to the specification problem, namely, to explore the problem in a different intellectual space than the runtime-inspired one that imperative programmers (apart from Dijkstra) tend to use. A portion of that space may well be type-system inspired for Duncan. It may be quite a large portion, but I don't think that's all of it. I suspect that when he's writing a programme, he thinks about what he wants it to guarantee first (as distinct from what he wants it to do - as you spotted), then tries to express a portion of that as a series of function types... The programme isn't even in his head yet.. The specification (probably part formed at this stage) is in his head, and the silhouette of the specification on the type system is on the screen. Then, the interesting bit - the agile methods style cycle of prototyping, testing and refining the spec can happen using the type system and Duncan's mind. He's practised at knowing what can and can't be easily expressed in the type system, so he doesn't waste time trying to express things that aren't easily expressible - but there's enough of a silhouette for him to get a tangible grip on the programme. Dijkstra, of course, doesn't need this - he's a maths god, and can do this stage all in his head.

Once the types of his functions are all written down, I'd guess that the spec is clear enough to him, that he can spend all his effort now on eliminating the "silly mistake" type bugs. He still hasn't written any code, but he already knows what the shadow of the code looks like, so he's free to concentrate on the detail as he refines the specification in his head to code on the screen. If he's chosen his types carefully, many silly mistakes will be eliminated by the type system. And since he's been thinking about the relationship between the spec, and the types he's written down, it'll be more obvious to him what guarantees he needs that the type system can't give him - so he'll be watching for those mistakes as he writes the function bodies, and possibly coming up with other abstractions to help guard against them on the way.

So, I guess an advantage of this kind of method would be that one separates out the problem of understanding the problem, from the problem of silly mistakes in implementation... Tackle the first whole heartedly first, then you're free to tackle as much of the second as can't be done mechanically already with your full attention.

Does that make any sense? It's kind of a stream of thought, and I'm probably projecting a whole load of random junk onto Duncan's image - but it seems plausible to me that such a theoretical Duncan might exist.

Perhaps I/we should send him a copy of this, and ask how close it comes to what he does ;)

perl, hacking, codology, haskell

Previous post Next post
Up