Rahul recently pointed me to a nice 2005 review of datamodels
in the database world
by Stonebraker and Hellerstein:
What Goes Around Comes Around.
This paper is quite insightful, and I believe a good overview of the field,
but it falls into the usual traps shared by most database practitioners.
This got me thinking about what programming languages and databases
have to learn from each other.
Unlike most people (and not just in the field of databases),
Stonebraker and Hellerstein have an uncommonly good understanding
of human factors in communication of meaning and
economic factors in adoption of software,
and how they often override technical concerns.
They show particular insight in their discussion of Schemas, XML, and
the inadequacies of "semantic web" concepts
when automating business transactions.
Like most database people,
these authors understand the importance of Data;
they understand that Data is both precious and
fragile against loss, corruption and bit rot.
Meanwhile, most computer scientists are still struggling
with the concepts of Persistence, Robustness and Evolution:
schools teach how to program with self-contained one-off toys
that have short-lived data and that no one depends on;
all the focus is on the (important) immediate algorithmic aspect,
but little is said about the (equally important)
long-range aspects of software engineering.
Precious Data
needs to persist across runs of individual programs,
across power cycles of the machine;
it needs to survive crashes of individual programs,
crashes of the machine;
it needs to resist corruption by buggy programs
or by race conditions amongst programs accessing it;
its shape itself will evolve to meet new requirements,
and although this phenomenon is much slower
than the evolution of programs modifying data,
the old data needs to be preserved through these changes,
and somehow the system must continue to run all along.
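As a rough illustration of that last requirement, here is a minimal sketch
of one common way to let the shape of data evolve while old data is preserved:
tag every stored record with a version and upgrade it on load.
The record layouts, the `version` field and the names `upgrade_v1_to_v2`,
`MIGRATIONS` and `load` are all hypothetical, invented for this example;
the paper under discussion prescribes nothing of the sort.

```python
import json

# Hypothetical record layouts, for illustration only:
#   v1: {"version": 1, "name": "Ada Lovelace"}
#   v2: {"version": 2, "first_name": "Ada", "last_name": "Lovelace"}

def upgrade_v1_to_v2(record):
    """Split the single v1 name field into the two v2 fields."""
    first, _, last = record["name"].partition(" ")
    return {"version": 2, "first_name": first, "last_name": last}

# Maps a version number to the function upgrading it to the next version.
MIGRATIONS = {1: upgrade_v1_to_v2}
CURRENT_VERSION = 2

def load(serialized):
    """Parse a stored record and upgrade it step by step to the current shape."""
    record = json.loads(serialized)
    while record["version"] < CURRENT_VERSION:
        record = MIGRATIONS[record["version"]](record)
    return record

# Data written by an old program and by a new one remain readable side by side.
old = '{"version": 1, "name": "Ada Lovelace"}'
new = '{"version": 2, "first_name": "Grace", "last_name": "Hopper"}'
```

Here `load(old)` and `load(new)` both yield records in the current shape,
so the running system never needs a stop-the-world conversion;
note that all of this lives in application code,
with no help from the language itself.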
Yet by and large, programming languages and operating systems,
even when they offer concurrent programming capabilities,
do not offer proper support for transactions on the persistent data;
whatever support exists for transactions
often comes in clumsy and brittle libraries,
and evolution is almost never supported at all
(even less so in statically typed languages).
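To make the point about library-level transactions concrete,
here is a small sketch using Python's standard `sqlite3` module.
The `accounts` table and the `transfer` and `balances` helpers are assumptions
made up for this example, and the in-memory database merely keeps it self-contained;
a file path would be needed for the data to actually persist.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Move `amount` between two accounts, all or nothing."""
    # Used as a context manager, the connection commits if the body
    # succeeds and rolls back if the body raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (source,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit made just above.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema and starting data, for illustration only.
conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
```

A call such as `transfer(conn, "alice", "bob", 500)` raises and leaves both balances untouched.
The transactional guarantee, however, covers only what goes through the SQL strings:
any in-memory state the program keeps about these accounts is outside of it,
which is the kind of brittleness complained about above.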
However, also like most database people, these authors
are not trained in semantics,
and they can't seem to fathom the notion and the possibility
of abstraction as a general concept.
Instead, they speak here of queries to operate on sets of records,
there of logical vs physical data independence,
there again of keeping things simple,
of user-defined types and functions,
of standards,
etc.,
and they seem to think of language expressiveness
as a cute feature but not all that important
(or at least they are content registering that the market values it little).
More generally, they do not understand the notion of a programming language,
and think they can get away with throwing together features
for their database interface and achieve a satisfactory design
that will be used by application writers in a language independent way.
Yet, the whole notion of language independence boasted
by database designers
(as well as designers of operating systems and other infrastructure)
is but the pride of these self-ignorant mono-linguists
in not calling their own barking a language.
To them, language is a slur for what application programmers use.
Little do they realize that the database interfaces they are offering
are programming languages indeed and that their ignoring
the hard-earned lessons of programming language design
imposes a high cost upon themselves and their users.
Most importantly, most database people deliberately try
to ignore the dynamics
of the algorithms that manipulate data,
and instead hold mostly read-only views of data
for which they design fancy query sub-languages;
they fail to recognize the importance of concepts of ownership,
of intensionality vs extensionality, etc.
Because of the limitations they impose,
application writers have to retrofit these concepts
in ways the consistency of which is not taken into account
by the otherwise sacrosanct integrity management of the database.
The whole discussion about datamodels is thus poisoned
by an attitude based on the wholly absurd premise
that a datamodel is a modular aspect of an application
that can be factored away from the rest of the system.
This attitude is not so problematic in the case of database gurus
like the authors of said article,
who are able to adapt the internals of the databases they develop
to extend their datamodels to fit the needs of application writers.
But lesser database specialists,
who do not develop and extend databases,
reduce all data to some poor datamodels,
where data relationships are something static and cast in stone;
they consider computations that will happen on the data as something
unfathomable, irrelevant, unworthy of interest, modularized away.
As a result, they insist on alleged simplifications,
normalizations and representations,
that simplify, normalize or represent
only the small part of the system that they oversee,
at the expense of hugely increased complexity in the rest of the system,
and communication problems between developers.
The worst kind of datamodellers are those data bureaucrats
who code neither application nor database infrastructure,
but who imagine themselves the masters and keepers of some datamodel
that has a value independent of the rest of the system.
They spend their time slowing down development
with bureaucratic processes and time wasted using
their pitiful tools and pseudo-languages,
contributing nothing but complications and gratuitous dependencies
for those who manipulate the data,
have the domain expertise,
and actually understand what the data is all about.
To be fair, blindness and bureaucracy
are not the exclusive attribute of Data guys.
There are plenty of such horrors amongst Code guys.
Blind Coders will lightly consider data persistence as well as all I/O
to be architecturally unimportant ancillary tasks
that can be factored away from code.
Code Bureaucrats will insist that everyone should use their one blessed
Language and Implementation, strictly respect their Object Model and
Programming Methodology, and follow canned templates or graphical tools
to export and document interfaces or models that they have to bless.
The worst amongst them will declare that they own the API
and add hurdles and gratuitous backward-compatibility burdens
to the already difficult task of developing software,
without contributing anything to the bottom line of building a working system.
None of these people will understand the big picture,
the social issues of development,
the burden their decisions impose upon others,
the cost of their folly to the group,
and least of all the possibility of automating away
all the rigid and stupid rules
that constitute the cherished meat of their own petty bureaucratic job.
In the end, the fields of programming languages and databases
contain complementary lessons.
Software engineers should learn from both.
And more importantly, they should expand their views
to the dynamics of the whole system
rather than a small static aspect of it.
That, and avoid bureaucrats.