Comments | omnifarious: Thoughts on Thrift and D-BUS

omnifarious

Thoughts on Thrift and D-BUS

Feb 22, 2008 20:30

I've been thinking about Thrift and D-BUS ever since I made this post: Java and XML, bad tastes that are worse together. And then I followed up with a reference to my paper: why CORBA and other forms of RPC are a bad idea.

Those, of course, sparked a lot of discussion. Questioning Java and XML, the darlings of 'Enterprise' computing, would hardly ( Read more... )

publish, ideas, evripub, spammed_entry, computers

Comments 9

hattifattener February 23 2008, 05:35:29 UTC

SunRPC and XDR (on which it's built) are pretty widely deployed, NFS being the most prominent example. SunRPC goes by the name ONC RPC now.

XDR is kind of like ASN.1 without all the OSI cruft.

omnifarious February 23 2008, 05:40:06 UTC

I'd have to look at XDR, but as I recall it was missing a way to handle maps. The data format was also fairly C-specific and not very self-describing. I believe that Thrift, even when the IDL is not available, still tags the various bits of data being sent with enough type information that you can print them out whether or not you have the IDL.

But yes, you're right. XDR is another thing that fits in that family of things.

hattifattener February 23 2008, 06:44:25 UTC

Yeah, XDR isn't self-describing at all. (Though ASN.1 BER/DER isn't necessarily self-describing either...) You would have to build maps out of lower-level primitives (e.g., a list of key-value pairs), but I don't see that as a problem. It does have a paucity of primitive data types: no wide integers, for example.
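To make the "list of key-value pairs" idea concrete, here is a minimal sketch that lays a string-to-integer map out as an XDR-style variable-length array of pairs. It follows the basic XDR conventions (4-byte big-endian integers, strings as a length followed by bytes padded to a multiple of 4), but the helper names `xdr_map` and `xdr_unmap` are made up for illustration and are not part of any XDR library.

```python
import struct

def xdr_uint(n):
    """Unsigned 32-bit integer, big-endian."""
    return struct.pack(">I", n)

def xdr_string(s):
    """Length-prefixed bytes, zero-padded to a multiple of 4."""
    data = s.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad

def xdr_map(pairs):
    """Hypothetical helper: a map as a counted array of (string, int) pairs."""
    out = [xdr_uint(len(pairs))]
    for key, value in sorted(pairs.items()):
        out.append(xdr_string(key))
        out.append(struct.pack(">i", value))
    return b"".join(out)

def xdr_unmap(buf):
    """Inverse of xdr_map; the reader must already know the layout."""
    (count,) = struct.unpack_from(">I", buf, 0)
    pos = 4
    result = {}
    for _ in range(count):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        key = buf[pos:pos + length].decode("utf-8")
        pos += length + (4 - length % 4) % 4  # skip the padding too
        (value,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        result[key] = value
    return result
```

Note that nothing in the bytes says "this is a map"; the reader has to know the schema up front, which is the non-self-describing part.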

omnifarious February 23 2008, 05:48:11 UTC

Thanks though. I added a blurb about XDR because you're right. :-)

eqe February 23 2008, 10:23:42 UTC

I'm surprised you're so down on JSON; it's my current favorite. Of course at work I end up seeing a lot of bencode, and it's more appropriate for random binary data -- length-encoded strings are better than delimited strings in some cases! But the fact that you can write pretty JSON given appropriate whitespace is a big win for some use cases.
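A rough sketch of the length-encoded versus delimited difference, covering only bencode's byte-string form (`<length>:<bytes>`). The function names are illustrative, and squeezing the blob through latin-1 to get it into JSON is just one possible workaround, not a recommendation.

```python
import json

def bencode_bytes(data):
    """Bencode byte string: decimal length, a colon, then the raw bytes."""
    return str(len(data)).encode("ascii") + b":" + data

def bdecode_bytes(buf):
    """Read the length prefix, then take exactly that many bytes."""
    length, _, rest = buf.partition(b":")
    return rest[:int(length)]

# Arbitrary binary data, including a quote, a NUL and a newline.
blob = b'\x00"\xff\n'

# Length-encoded: the payload is copied through untouched.
encoded = bencode_bytes(blob)

# Delimited: JSON strings are text, so every awkward byte must be escaped
# (here by mapping each byte to a code point via latin-1 first).
as_json = json.dumps(blob.decode("latin-1"))
```

The bencoded form costs two bytes of overhead however ugly the payload is, while the JSON form grows with every byte that needs escaping.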

omnifarious February 23 2008, 11:29:21 UTC

I only dislike JSON because it is enormously convenient to decode if you happen to be using Javascript, and a mild pain if you're using anything else. I also prefer binary encodings because they can represent integers and floating point numbers compactly and can be parsed quite efficiently.

Thrift's use of numeric field ids allows a structure to grow new fields over time without breaking old stuff, which is kind of a nice feature. You can achieve something similar in a much more verbose way using maps, but again, it's verbose.
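Here is a small sketch of why numeric field ids make that kind of growth painless. It is loosely modeled on a tagged-field layout (a type tag, a field id, then the value, with a stop marker at the end) and is not Thrift's actual wire format; the tag constants and function names are assumptions made up for the example.

```python
import struct

# Hypothetical type tags for this sketch only.
T_STOP, T_I32, T_STRING = 0, 1, 2

def write_field(field_id, value):
    """Emit one field: 1-byte tag, 2-byte field id, then the value."""
    if isinstance(value, int):
        return struct.pack(">BHi", T_I32, field_id, value)
    data = value.encode("utf-8")
    return struct.pack(">BHI", T_STRING, field_id, len(data)) + data

def write_struct(fields):
    """Emit every field, then a stop marker."""
    body = b"".join(write_field(i, v) for i, v in fields.items())
    return body + bytes([T_STOP])

def read_struct(buf, known_ids):
    """Decode fields, silently skipping ids this reader does not know."""
    pos, out = 0, {}
    while buf[pos] != T_STOP:
        tag, field_id = struct.unpack_from(">BH", buf, pos)
        pos += 3
        if tag == T_I32:
            (value,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        elif tag == T_STRING:
            (length,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unknown type tag %d" % tag)
        if field_id in known_ids:
            out[field_id] = value
    return out

# A newer writer adds field 3; an older reader that knows only 1 and 2
# can still step over it because the tag says how long the value is.
message = write_struct({1: 42, 2: "bob", 3: "added later"})
old_view = read_struct(message, known_ids={1, 2})
```

The old reader never needs to understand field 3; it only needs the type tag to know how many bytes to skip.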

eqe February 23 2008, 22:27:56 UTC

AFAIK JSON is only especially convenient to decode in JS if you're willing to use eval(), which I hope no sane person will do in the general case. The pain level for any reasonable[1] serialization format is about the same once you've left the realm of "built into the environment ( ... )

Thinking out loud omnifarious February 23 2008, 12:23:32 UTC

Type tags are lowercase if there is no field id, uppercase if there is. The field id is always a count and always comes immediately after the type tag. All fields in a tuple must either have or not have a field id (i.e. their type tags must all be uppercase or lowercase).

't' aka Tuple
A grouping of values.
'c' aka Count
Something countable, but not just an arbitrary integer with no meaning.
'u' aka UTF-8
A single, UTF-8 encoded character.
'm' aka multi-precision integer
An arbitrarily sized integer.
'b' binary
A blob of binary data
's' string
A sequence of UTF-8 encoded characters.
'i' integer 8
An 8-bit two's complement integer (may also be used for bitfields)
'j' integer 16
A 16-bit two's complement integer (may also be used for bitfields)
'k' integer 32
A 32-bit two's complement integer (may also be used for bitfields)
'l' integer 64
A 64-bit two's complement integer (may also be used for bitfields)
'f' floating
A 64-bit floating point number.
'g' giant floating
A 128-bit floating point number.
'h' half-size floating
A 32 bit ( ... )
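The upper/lowercase rule for tuples is easy to check mechanically. A minimal sketch, assuming type tags are single ASCII letters as in the list above; `check_tuple_tags` is a hypothetical helper name, not part of any existing library.

```python
def check_tuple_tags(tags):
    """Return True if every field carries a field id (all uppercase tags),
    False if none do (all lowercase), and raise if the tuple mixes both."""
    kinds = {tag.isupper() for tag in tags}
    if len(kinds) > 1:
        raise ValueError("tuple mixes fields with and without field ids")
    # An empty tuple has no field ids by definition.
    return kinds.pop() if kinds else False
```

For example, `check_tuple_tags(["c", "s", "k"])` describes a positional tuple, `check_tuple_tags(["C", "S"])` describes one whose fields all carry ids, and `["c", "S"]` is rejected.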

Re: Thinking out loud omnifarious February 23 2008, 13:04:13 UTC

A field id is considered part of a type id. If a type id is a capital letter, then it is immediately followed by a count containing the field id.
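As a sketch, reading a type id might look like the following. The encoding of a count is not spelled out here, so this assumes, purely for illustration, that a count is a single unsigned byte; `read_type_id` is a hypothetical name.

```python
def read_type_id(buf, pos=0):
    """Read a type tag and, if the tag is uppercase, the field id after it.

    Returns (lowercase tag, field id or None, next position).
    """
    tag = chr(buf[pos])
    pos += 1
    field_id = None
    if tag.isupper():
        # ASSUMPTION: a count is one unsigned byte. The real count
        # encoding is not specified in this thread.
        field_id = buf[pos]
        pos += 1
    return tag.lower(), field_id, pos
```

So `b"K\x05"` would read as a 32-bit integer with field id 5, while `b"k"` is the same type with no field id.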

Here is a description of the encodings which include type ids in their encoding. Note that only the tuple allows a type id that includes a field id.

Note also that the 'random' type does not allow a field id for its enclosed type because the field id (if any) should've been applied where the type id for the random type occurred. If the random type allowed a field id for its enclosed type then you could just say an array or dictionary contained a random type and then put meaningless field ids on all the elements.

Tuple (type tag 't')
t<count of # of fields>......
Array (type tag 'a')
a\000...\001
Known length array (type tag 'n')
n<count of elements>...
Dictionary (type tag 'd')
d\000...\001
Variant type (type tag 'v')