Interesting papers:
http://adam.chlipala.net/papers/PhoasICFP08/https://www.cs.utexas.edu/~wcook/Drafts/2012/graphs.pdf The first one shows how to construct a representation of a dependently-typed lambda calculus using any functional language with type parameters and algebraic data types. The technique is called "parametric higher-order abstract syntax" (PHOAS).
The second one shows how to use a simplified form of that representation (without dependent types) to represent lambda calculus with true sharing of the sub-expressions.
In brief, PHOAS is an abstract syntax tree for a lambda calculus where variables are represented by a type parameter, which is universally quantified impredicatively.
Here is how it works:
Consider how one would implement a simple interpreter for a lambda calculus. One defines a data type for the abstract syntax tree (AST):
data AST = Const Int | Lam (Int -> AST) | App AST AST
Examples of expressions are
(Const 1)
and
(App (Lam (\x -> Const x)) (Const 2)) -- this is the identity function applied to the value 2, i.e., (x -> x)(2).
Then one defines an expression evaluator:
eval :: AST -> Int
eval (Const x) = x
eval (App (Lam p) q) = p (eval q)
This is an oversimplified implementation, and there are several problems with this data type: for instance, types are not checked. An obvious problem is that variables can only have integer values. We would like to have functions of type AST -> AST. But how do we represent variables?
Try adding a clause for variables:
data AST = Var String | Const Int | Lam String AST) | App AST AST
What used to be Lam (\x -> Const x) will now become Lam "x" (Var "x"). But now we have introduced a possibility of having mismatched variable names, and the implementation becomes more complex.
The PHOAS encoding is a trick that forces all variables to match by using parametricity.
Instead of using strings for variable names, we encode variables as values of an unknown type v. Each expression will be parameterized by the type v. Also, the type v will have a universal quantifier that's hidden inside the data type:
data PHOAS = PHOAS { term: forall v. Var v | Const Int | Lam (v -> AST) | App AST AST }
It is important that the data type PHOAS is not itself parameterized by v, only the sub-expressions are. This forces each sub-expression to be fully parametric in v.
In Scala 2, this encoding looks like this:
trait PHOAS {
def term[V]: PHOASterm[V]
}
sealed trait PHOASterm[V]
final case class Var[V](value: V) extends PHOASterm[V]
final case class Const[V](value: Int) extends PHOASterm[V]
final case class Lam[V](func: Var[V] => PHOASterm[V]) extends PHOASterm[V]
final case class App[V[(func: PHOASterm[V], arg: PHOASterm[V]) extends PHOASterm[V]
Scala 3 should be a lot shorter, but I still haven't learned the new syntax.
Now it is impossible to implement a function with mismatched variables. The evaluator (of type PHOASterm Int -> Int) is straightforward: we just call the function under Lam on its argument, which is of type v.
There is one interesting twist in this technique: adding a recursive binder. This binder represents mutual recursion of several PHOAS terms.
In Haskell, this is added by
data PHOAS = PHOAS { forall v. Var v | .... | Mu ([v] -> [PHOASterm v]) }
I implemented PHOAS in Scala 2, but there are significant difficulties with implementing the recursive binders. The papers use Haskell, which is lazy by default, but Scala is eager by default, so any laziness must be encoded explicitly. It is interesting to see that the laziness has to be encoded in a very special way for the PHOAS technique to work with recursive binders.
In Scala, this would be naively implemented by:
final case class Mu[V](terms: List[V] => List[PHOASterm[V]]) extends PHOASterm[V]
However, the evaluator does not work because the fixpoint combinator requires laziness of its arguments. So far, I was able to make it work through some hacks and by managing laziness manually. The implementaiton would have become a lot cleaner if we had the following construction: a lazy tuple. It is a tuple of several elements, of fixed length. The parts of the tuple may be values of different types, but each value is lazily evaluated.
There is a blog post explaining how Scala 3 implements heterogeneous lists:
https://www.scala-lang.org/2021/02/26/tuples-bring-generic-programming-to-scala-3.html This may be essential for a Scala implementation of recursive binders in PHOAS. I haven't yet done this though.