typesafe task-local storage in rust: bubblingbeebles

bubblingbeebles

typesafe task-local storage in rust

Jun 23, 2012 05:06

Since this is my first Rust post, I'll give exposition by noting that I'm spending my summer as an intern at mozilla research working on Rust, which is a great-yet-incomplete language that I have high hopes for. Specifically, I'm working on the runtime scheduler, which supports language-level abstractions for task parallelism. A brief taste of what's currently possible:

fn main() {
        for uint::range(0,100) {|i|
            task::spawn {||
                io::println(#fmt("I am child %d", i));
            }
        }
    }
Note: {|args| body } denotes a closure, where args is a comma-separated list of identifiers to bind (or nothing for no arguments).

Rust's concurrency model allows for no shared state among tasks; all communication is through message-passing. There are no mutable globals, and passing pointers among tasks is restricted by the type system such that when you do, you surrender ownership of the pointer and can't use it yourself anymore. (More on pointer types in a minute.)

Sometimes we need mutable global variables, though, most commonly in libraries that need to keep state without requiring the user pass it around. Examples are inter-task failure propagation (i.e., a 'group' of tasks in which if one dies, all die - I'll be working on this next) and code that uses native C libraries that require no-argument callbacks. We want to stash some state for a later function to access without carrying it around everywhere explicitly, and of course concurrent tasks can't step on each other.

Task-local data should be polymorphic (can stash stuff of user-decided type) and extensible (can stash arbitrarily much stuff, rather than having predetermined stuff "registered at build-time" or somesuch). Let's say you can name your data slot by creating a local_data_key - then you want to "index" into the stash using something like this interface:

fn local_data_set(key: local_data_key, data: @T);
    fn local_data_get(key: local_data_key) -> option<@T>;
Like:

fn message() {
        alt local_data_get(msg_key) {
            some(msg) { io::println(*msg); }
            none      { fail "message missing!"; }
        }
    }
    fn main() {
        local_data_set(msg_key, @"hello world");
        message();             // Should print the above.
        task::spawn(message);  // Should fail.
    }
Note: @T denotes a "shared box" pointer to T - this is a heap object whose pointer you can copy around and pass to other functions without losing ownership of yourself. (Contrast with ~T, a 'unique' pointer to T which you lose as soon as you hand off, which can be passed among tasks.) Shared boxes are reference counted, so copying the pointer bumps the count, and leaving the scope of one decreases (and possibly frees) it.

So, what do we make this "key" thing that can uniquely identify a user's stash slot? (Later, "tasklocal" might be a language construct that does this automatically, but for now, we're actually implementing it!) Important considerations are:

General key comparison. Internally, we have to be able to compare keys for different types against each other. Everything is stored in the map opaquely - this could be a job for existentials (which Rust can do inelegantly), but Rust has no reliable type-level equality, so we'll have a lot of "unsafe::transmute(data) as *void".
No type coercion. A user shouldn't be able to build a key, instantiate it at int to set a value, then instantiate the same key at fn() to retrieve the value back out as a different type.
Global key recovery. Code must be able to get keys for already-set values from scratch. (If you could pass keys around, you don't need TLS!)

Using an inhabitant value of the type won't allow us to compare against other types. User-defined tags would allow for trivial coercion. All that's left that can be specific to the type are functions, which still allow for coercion via polymorphism (more on this at the end), and enums (Rustic for datatype/newtype). We can require the user to instantiate an enum at T (Rust forbids polymorphic constants), and compute its address (unsafely, internally) to index into the stash. If the user declares immutable global keys, they can be used elsewhere to retrieve stashed values. Hence:

enum local_data_key = ();  // 'newtype' syntax. T is "phantom".
    let const msg_key = local_data_key() as local_data_key;
    // Above example can now work.
This is almost right - there are still three small hurdles.

While this is only useful when the key is global, an adversary could coerce types using function-local keys if different keys in different functions ended up with the same stack address. This could be avoided by ensuring at runtime that key addresses are in the global data region (and silently rejecting requests if not), or by extending the type system to capture the notion of global constants.
LLVM might collapse multiple constants with the same value to have the same address. This could be avoided by not telling LLVM they are const.
Rust doesn't yet support global const enums. Oops.

These aside, the scheme is typesafe. Given that we need TLS soon, though (and that I want to move on to build linked failure), I'm instead implementing keys by having the user supply a dummy function on T, and indexing into the stash using the function's code pointer (which is global for sure).

enum local_data_key { local_data_key(fn(T)) }
fn msg_key(str) { }
// Then write "local_data_key(msg_key)" when using the interface.
Here, an adversary might define the key function itself polymorphically, and instantiate it differently when getting and setting. Fortunately, Rust monomorphises, so the differently-typed instantiations will have different code addresses, so indexing would not be fooled. Unfortunately, it is still not quite safe, because monomorphisation actually stops at @-boxes: f(@5) and f(@"foo") would compile to calls to the same f. So the TLS interface has to be marked unsafe for now, until the above hurdles can be cleared.

rust, code