More on XST: natecull

natecull

More on XST

Jan 09, 2009 12:05

I am really liking Extended Set Theory. It's putting some theoretical bones on my intuitions. This paper fills in some more of the thinking behind it and hints at how the relational operators are modelled.

From what I understand so far, here's the guts:

* Most of modern mathematics is founded on Set Theory (presumably the axiomatic set theories such as Zermelo-Fraenkel). Set theory constructs pretty much every mathematical object out of nested sets; it's like a very low level machine code for maths. So any flaw in set theory will cause fairly big problems further up the line.

* As it happens, there is such a flaw. 'Classical' set theory (the ordinary kind in use today) had big problems representing orderings, because sets aren't ordered. Unfortunately, the concept of ordering is really important - in computing, there's at least two kinds of ordering (ordered pairs and n-tuples) which everything is built from. Lisp went with ordered pairs for its lists, while n-tuples surface in 'vectors' or 'arrays' or 'structs' - C is more of a tuple-ish language than a pair-ish one. When it comes to performance, native tuples are really important because a computer memory is naturally an tuple, not a list of ordered pairs. Relational database theory also needs tuples to represent rows.

* Classical set theory smuggles tuples in by creating them as structures of nested sets. That works, to a point, but it has nasty side-effects which effectively means the ordinary set operations of union, intersection and friends explode dramatically on such nested tuples; they do not do what you intuitively would expect (ie if you take the intersection of (a,b) and (a,c), you get (a,a), which is wtf? That's not an intersection, it's not anything, that's just weird! But it's what set theory gives you, so you've got no choice but to take it. And tuples of more than two elements do even weirder things.)

* Because tuples are so ill-behaved when you try to treat them as sets, the general answer of set theory is 'well don't do that then'. This limitation makes its way into relational theory as the restriction that you can't nest tables inside rows (because rows are tuples while tables are sets).

* Relational theory also doesn't let you look *below* tables to their implementation, or say anything about any kind of ordering. But in real life SQL, ordering has a lot of meaning: we not only want a table, we want it sorted (ORDER BY). And we want to implement it somehow on computer hardware, which means mapping it eventually to a block of memory, which is an ordered array of bytes. But set theory (because of the exploding tuple problem) knows nothing of ordering, so relational theory doesn't either. That means that while relational theory is 'correct' as long as you only talk about tables, you can't use it to test the correctness of your RDBMS *itself*.

* And when you get to XML and RDF, which are all about the ordering of elements, relational just gives up and ignores that kind of data completely, with the result that XML and relational go their separate ways and there's this huge modelling gap.

* The answer is to extend classical set theory with the idea of 'scopes', which makes sets into structures more like dictionaries: X is an element of Y 'under the condition Z'. If Z is NULL (the empty set, like Lisp's NIL) then it says the same thing as ordinary set theory.

* With scopes we can now talk mathematically about tuples or dictionaries or XML documents or a whole lot of other low-level data structures, including raw I/O blocks. This means we can use relational-type math to model our entire computer system from the hardware up. One possible advantage is that we could use this, like we currently optimise queries, to basically 'compile' queries right down to a very low-level form - to hardware, if necessary.

* What comes next I'm not sure. I assume there's a lot of mathematics which might need to be rewritten on top of XST instead of CST, if we want to have well-mannered tuples and treat them like sets, but a lot of that is probably irrelevant to computing. Or rather, existing CST maths probably won't need to be modified, but rather there's probably a whole set of new XST maths that can be invented which can make things simpler. This means that there may well be algebraic treatments of problems which are currently only posed in terms of algorithms. Exactly how the functional programming people deal with this, I'm not sure - it will be interesting to see how they intersect.

* Exactly what 'scopes' relate to in terms of existing data structures I'm not sure. I *think* they map best onto something like 'slot' or 'field' or 'property' or 'key' (for tuples, for example, all the XST examples use 'scope' to mean 'address'). An extended set comes out looking something like a multivalued dictionary - or very much like an RDF Blank Node - a thing which has a number of named properties each of which has a number of values.

This doesn't quite match with the usual dictionary construct provided by languages such as Javascript, which tend to have singleton values, but you could probably implement extended sets in such languages as dictionaries-of-lists. (Edit: Actually dictionaries-of-sets, but most such languages only have dictionaries and lists, not sets. You could possibly also do it with dictionaries-of-dictionaries and ignoring the value of every key.) The important part though is implementing the XST operations as an algebra of built-in functions.

* It's tempting to try to use extended sets to model function applications themselves - 'X element-Y of Z' would perhaps mean 'X is the value returned by calling function Z with argument Y' - but what the maths says about that, I'm not sure.

* Extended sets also seem to capture my intuition that the *names* we assign to things are deeply important and must be modelled in the underlying mathematics - this is something that functional theory (like, say, combinatorial logic and Joy) generally disregards and leaves as an implementation detail. That seems like a big mistake, on the same order as relational's abandonment of ordering and implementation. XST may help explain why.