Burin 0.001: A Minimal Glulx Lispy Thing: natecull

natecull

Burin 0.001: A Minimal Glulx Lispy Thing

Jan 03, 2009 16:00

Happy New Year.

I've started playing around with I7 again and though it's great fun, once again I keep coming up cold against the syntax. I've also finally started reading Peter Norvig's Paradigms of Artificial Intelligence Programming and I'd love to play with some of the old classic algorithms like General Problem Solver, but I can't because of I7's lack of true lists (its 'lists' as far as I can tell are only strictly-typed vectors).

Part of my goal with Dataspace has been to think through aspects of the Perfect IF Language. I've made very little progress there, I'm finally getting frustrated enough to get down to bare metal and try to code something, anything.

So here's a rough first-draft specification for Burin, an extremely stupid interpreted Lispy/Joy-y/Floo-y sort of thing with lots of rough edges.

1. Everything will be lists of 32-bit cells or single-word symbols.

2. No quoted strings - just word lists. The symbol table will be as optimised to use the Glulx 'binary string search' opcode for reading and the address of a word will be a Glk string so they can be immediately printed.

3. Optimised for the Glulx engine, so it has distinct ROM and RAM portions in its memory map.

4. 31-bit signed integers, or 31-bit addresses, the high bit indicating 'integer' vs 'address'. This means math opcodes will probably need manual fixups and I'll need to swot up on signed integer bit-level representations. A better implementation would probably involve some kind of tagging bits but that seems like work.

5. There will be a compiler written in something like Python which reads a big list, crunches it into a big word list table, calculates addresses, adds a standard chunk of functions for reading and interpreting and basic opcode stubs, and writes out a Glulx or Blorb file. No other processing.

6. The rest will be dynamic processing, much like Floo, in which symbols are bound to lists, and lists are interpreted as Joy-like stack combinators. There will be a dynamic symbol binding table in RAM based on the ROM word table, with bindings generated at runtime, probably using a shallow-binding approach (one list of bindings per symbol), because that seems like the fastest/cheapest initial approach. Correctness of lexical scoping etc not really an issue, just getting the thing running.

7. Possibly a RAM data stack distinct from the Glulx machine stack, if we take a Forth-like two-stack approach.

8. Some kind of nasty hackish scheme where the address range (and possibly low-order tag bits) of an address determines whether it's a builtin opcode/function, a list-as-quotation, or a list-as-executable-function.

9. A RAM word/symbol table for text parsed at runtime, and RAM space for runtime list creation. Probably lists will be limited to proper lists only (ie final cdr must be NIL) and ROM list space will be stored as sequences of 32-bit words followed by a NIL, while RAM list space will be cons pairs for maximum flexibility. RAM list space garbage collection will be handled by -- no idea yet, I'm hoping to avoid the whole issue if I can.

...

10. Profit?

I'm not looking forward to writing a read function (or anything really) in raw Glulx opcodes, particularly if it has to deal with Glk initialisation and string parsing, which is why I've put this off forever -- and since I'm off travelling in two weeks it might not get done within the next two months - but this is sort of a stake in the ground for what I want to get done, in case I forget.

Edit 1: Wordlists are probably going to be slow to print, since they need to do a Glk opcode for each word, so maybe I do need a traditional output-string heap as well as an input-word table. There's something perverse in me though that wants to push wordlists to see just how far they go before they break down.