Template Haskell, argument munging and operads, part I

Mar 05, 2008 10:37

[The hope was that this post would come in three somewhat independent sections: one pure programming, in which we develop a small utility in Template Haskell; one largely mathematics, at the upper-level high school to beginning undergraduate level, wherein we describe another approach to constructing our utility; and one purely mathematical, more sophisticated but fairly handwavy, wherein I relate all this stuff to my research interests and describe where it leads. The idea was that you could skip the code and jump to the maths, or read the code and skip the maths, or whatever. However, just the first section has now taken far longer than I'd budgeted, both in time and in space, so I'll save the other two for a later post.]

In a recent post, Hitesh Jasani writes about Haskell's flip operator, which takes a function f and returns a function which behaves like f with the order of its first two arguments reversed. So (flip f) x y z w = f y x z w (we write function application without brackets, as is customary in Haskell). I pointed out to Hitesh that actually, we could write such a function in almost any language which supports higher-order functions, and in dynamic languages (like Perl or Lisp) we can go further, and write a function argmunge, which accepts two functions and permutes the arguments of the first one (the "victim") according to the second (the "munger"). So
(argmunge f g) x1 x2 x3 x4 ... = f xg 1 xg 2 xg 3 xg 4 ...
Here's an implementation of argmunge in Perl, and some code to exercise it:
sub argmunge {
my $func = shift;
my $munger = shift; # not necessarily a permutation
return sub { $func->(@_[@$munger]); }
}

sub myprint {
print "Called with args ".join(", ", @_)."\n";
}

argmunge(\&myprint, [2,5,5,6])->(0,1,2,3,4,5,6);
argmunge(\&myprint, [3,2,1])->("fred", "barney", "betty", "wilma");
When run, this displays
Called with args 2, 5, 5, 6
Called with args wilma, betty, barney
Here we don't pass the munger in as a function, but rather as a list of values [g(0), g(1), ..., g(n)]. I'm prouder of that code than I probably should be, because it relies on some nice Perl features to work as it does; namely, Perl's argument-passing convention (in which all arguments are passed to a function as a single array called @_), list slicing (in which you can index into a list with another list), and list flattening (in which inclusion of one list in another splices the inner list into the outer list, resulting in a flat list). I remarked that it wouldn't be possible to write a general argmunger in Haskell, because the type of the result depends crucially on the actual value of the munging function. It ought to be possible to write one in a dependently-typed language like Cayenne - anyone care to do so?

[Edit: it's possible to write an even nicer version in Arc.]

Anyway, it may not be possible in vanilla Haskell, but it is possible using templates, provided the munging function is known at compile-time. Unfortunately, several features of Haskell make the approach we used for Perl non-viable:
  • Pervasive currying, in which every function actually takes only one argument and returns another function that will consume the rest of the arguments.
  • Lists are homogeneous: it's not possible (at least at my level of knowledge) to construct a list with elements of varying types.
  • Tuple types exist, whose elements can be of mixed type, but support for them is poor.
I'm not trying to be critical; all of these things are consequences of features that are usually beneficial. But all engineering decisions are trade-offs, and unfortunately we've hit a lot of downsides all at once here. Still, hope remains. We have at least three options:
  • programmatically construct a lambda-expression, like the zipN example in the original TH paper;
  • curry everything into (a, (big, (nested, tuple))), and then unpack it using something like Audrey Tang's code from the TH wiki;
  • decompose the munging function into a chain of flips, face and degeneracy maps, and then call that chain on the victim function. More on this approach in a later post.
Of the three, I like the first one best. Let's generate the expansion by hand for a simple case, apply the runQ trick to see what its parse tree looks like, and then write the generating code. Our simple case will be:
argmunge [0,2,1,1] = \f a0 a1 a2 a3 -> f a0 a2 a1 a1
Applying the runQ trick gives
Prelude> :m +Language.Haskell.TH
Prelude Language.Haskell.TH> runQ [| \f a0 a1 a2 a3 -> f a0 a2 a1 a1 |]
Loading package template-haskell ... linking ... done.
LamE [VarP f_0,VarP a0_1,VarP a1_2,VarP a2_3,VarP a3_4]
(AppE (AppE (AppE (AppE
(VarE f_0) (VarE a0_1)) (VarE a2_3)) (VarE a1_2)) (VarE a1_2))
By the way, you can turn that stuff back into readable code by applying the "pprint" function, only you have to be careful: a naive attempt to do so with the above will trigger errors saying that a0_1, f_0 etc are all out of scope. You have to replace any variables or functions with calls to mkName, like so:
Prelude Language.Haskell.TH> runQ [| 3 + 4 |]
InfixE (Just (LitE (IntegerL 3))) (VarE GHC.Num.+) (Just (LitE (IntegerL 4)))
Prelude Language.Haskell.TH> pprint $ InfixE (Just (LitE (IntegerL 3)))
(VarE mkName "+") (Just (LitE (IntegerL 4)))
"3 + 4"
OK, so we're going to want to construct a lambda expression using LamE, and the first argument it takes is the list of arguments to the lambda expression. As a first cut, let's try
argmunge munger = LamE (f:args)
-- rest of code here
where f = VarP (mkName "f")
args = map (VarP.mkName.('a':).show) [0 .. (length munger)-1]
Now, the "rest of code here" bit needs to be a data-structure made of nested AppE's: perhaps we can construct this with a fold. A quick :t AppE in GHCi revealed that it has type Exp -> Exp -> Exp: perfect. But do I want foldl or foldr? I can never remember what they both mean. But here's a nice trick I saw on the web recently:
Prelude Language.Haskell.TH> let evshow x y = "(f " ++ x ++ " " ++ y ++ ")"
Prelude Language.Haskell.TH> evshow "3" "4"
"(f 3 4)"
Prelude Language.Haskell.TH> foldl evshow "5" $ map show [0 .. 3]
"(f (f (f (f 5 0) 1) 2) 3)"
Prelude Language.Haskell.TH> foldr evshow "5" $ map show [0 .. 3]
"(f 0 (f 1 (f 2 (f 3 5))))"
Prelude Language.Haskell.TH> foldl1 evshow $ map show [0 .. 3]
"(f (f (f 0 1) 2) 3)"
Looks like I want either foldl or foldl1. By the way, GHCi got about a million times more useful to me when I discovered that you could define new functions and variables in it using let, which I learned from a post in michiexile (aka syzygy)'s blog. Experts are reminded that I am at best an eighth of a Haskell programmer, and so I may on occasion spell out things that you think are obvious: besides, nothing is trivial.

So, let's try argmunge munger = LamE (f:args) (foldl AppE f mungedargs) as our main body. But what does GHC think?
Prelude Language.Haskell.TH> :l ArgMunge.hs
[1 of 1] Compiling ArgMunge ( ArgMunge.hs, interpreted )

ArgMunge.hs:9:14: Not in scope: `VarP.mkName'
Drat it - GHC thinks that VarP.mkName is a call to a function called mkName in a module called VarP, not the composite of the constructor VarP and the function mkName. That's easy enough to fix - put spaces around the dot. But then...
ArgMunge.hs:7:44:
Couldn't match expected type `Exp' against inferred type `Pat'
In the second argument of `foldl', namely `f'
In the second argument of `LamE', namely
`(foldl AppE f mungedargs)'
In the expression: LamE (f : args) (foldl AppE f mungedargs)
Failed, modules loaded: none.
Bunch of arse. Looking more closely (and checking the type of VarP), I see the problem: I've constructed a load of pattern variables, when what it wanted was a load of expression variables, constructed using VarE. OK, so let's try
argmunge munger = LamE lhs rhs
where lhs = map VarP (f:argVars)
rhs = foldl1 AppE $ map VarE (f:mungedargs)
mungedargs = [argVars !! munge_i | munge_i <- munger]
argVars = map (mkName.('a':).show) [0 .. (length munger)-1]
f = mkName "f"
Will GHC accept it?
Prelude Language.Haskell.TH> :l ArgMunge.hs
[1 of 1] Compiling ArgMunge ( ArgMunge.hs, interpreted )
Ok, modules loaded: ArgMunge.
Yay! Let's test it.
*ArgMunge> :m +Language.Haskell.TH
*ArgMunge Language.Haskell.TH> pprint argmunge [0,2,1,1]

:1:0:
Couldn't match expected type `[t1] -> t'
against inferred type `String'
In the expression: pprint argmunge [0, 2, 1, 1]
In the definition of `it': it = pprint argmunge [0, 2, 1, 1]
Gah! That always catches me out.
*ArgMunge Language.Haskell.TH> pprint $ argmunge [0,2,1,1]
"\\f a0 a1 a2 a3 -> f a0 a2 a1 a1"
Zonino! On trying to splice it, I discover again that actually, $( ) requires its arguments to be in the Q monad, so I actually need to change argmunge so there's a return applied to LamE lhs rhs. Fix that, then try :t $(argmunge [0,2,1,1]): it returns (t -> t2 -> t1 -> t1 -> t4) -> t -> t1 -> t2 -> t3 -> t4, which surprised me for a while but is actually what I wanted. But how about :t $(argmunge [0,3,2,4])?
:1:2:
Exception when trying to run compile-time code:
Prelude.(!!): index too large

Code: let
argmunge = argmunge Q $dMonad
$dMonad = Language.Haskell.TH.Syntax.$f20
in argmunge [0, 3, 2, 4]
And I realise at this point that I've forgotten something major: you need to specify what arity you want the result function to have. Let's try again.
argmunge munger arity = return (LamE lhs rhs)
where lhs = map VarP (f:argVars)
rhs = foldl1 AppE $ map VarE (f:mungedargs)
mungedargs = [argVars !! munge_i | munge_i <- munger]
argVars = map (mkName.('a':).show) [0 .. arity - 1]
f = mkName "f"

*ArgMunge Language.Haskell.TH> :t $(argmunge [0,3,2,4] 5)
$(argmunge [0,3,2,4] 5) :: (t -> t3 -> t2 -> t4 -> t5)
-> t -> t1 -> t2 -> t3 -> t4 -> t5
Looks good! Let's test it:
*ArgMunge Language.Haskell.TH> let concat4 a b c d = concat [a,b,c,d]
*ArgMunge Language.Haskell.TH> $(argmunge [0,3,2,4] 5) concat4 "a" "b" "c" "d" "e"
"adce" Hurrah! We should probably put in some error-handling code to make the "wrong arity passed" error less opaque, though. And (he discovers, belatedly) there's no need for an explicit return: instead, one can use the lowercase wrapper functions lamE, appE, varE etc, which return their results already in the Q monad. Also, I don't think there's any possibility of variable capture here, as we're immediately wrapping all our variables in a lambda expression, but it can't hurt to use gensyms for practice. *implements* Actually, yes it can. After quite a bit of debugging and grovelling around checking the types of things, I was able to come up with
import Control.Monad

argmunge munger arity = (liftM2 LamE) lhs rhs
where lhs = mapM (liftM VarP) (f:argVars)
rhs = foldl1 appE $ map (liftM VarE) (f:mungedargs)
mungedargs = [argVars !! munge_i | munge_i <- munger]
argVars = [ newName "a" | i <- [0 .. arity - 1]]
f = newName "f"
but since it's uglier than what was there before and fixes a bug that doesn't, as far as I can tell, actually exist, I'll revert it. Note, by the way, that I had to use liftM instead of fmap: though the docs claim that Q is an instance of Functor, they appear to be full of lies. Is there actually meant to be any difference between liftM and fmap, by the way? Or rather, if M is an instance of both Functor and Monad, should its implementation of fmap be equal to its implementation of liftM?

Implementing the error handling was also tricky: the recover function in TH returns something of type Q (), so you need to bind it to something of the expected type with (>>) before it will typecheck. There didn't seem to be any sensible choices, so I just stuck with the usual expansion. So now the user gets my custom error message and the lower-level "array index out of bounds" message: maybe both will be helpful. The full code is now as follows:
argmunge munger arity
| and [i `elem` [0 .. arity -1] | i <- munger] = lamE lhs rhs
| otherwise = report True "argmunge: illegal argument number"
>> lamE lhs rhs
where lhs = map varP (f:argVars)
rhs = foldl1 appE $ map varE (f:mungedargs)
mungedargs = [argVars !! munge_i | munge_i <- munger]
argVars = map (mkName.('a':).show) [0 .. arity - 1]
f = mkName "f"
Whew! Compare that to the original Perl version. I'm sure there are ways this code could be improved, but I've spent far too long on this post already.

The final code can be found here. As always, all suggestions for how I could improve my code or my development practices are gratefully received!

template haskell, programming, perl, haskell, computers, maths, beware the geek, argmungers

Previous post Next post
Up