Understanding Coastline myself, part 1

Mar 07, 2015 09:36

When my colleagues ask if I myself understand how Coastline works, I honestly answer that no, I don't have a complete grasp of how it operates. Yes, I wrote it and I know how it's supposed to work, and I also know that it seems to work well enough in production projects that they don't randomly crash anymore.

But the thing with Coastline is, it's an experiment in implementing a concurrency model that I have never seen anywhere else. A concurrency model that started as a hunch and still has no formal definition or proof. It seems to do a great job of handling the race condition and deadlock scenarios I throw at it, but my imagination is limited.

Still, sometimes I come up with yet another interesting question of the "Will it blend?" variety. Today's original question was "will c1.wait(c2) work if c2 = c1.q('bg', ...)?", or, in even less understandable terms, will trying to wait on a background task from the same caller work? It doesn't make sense to wait on a foreground task, since by definition it has already finished by the time you get to the next task on the same caller. It is, however, completely reasonable to launch multiple background tasks and then wait for all of them to finish before proceeding, so that is what I tested.
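For comparison, here is roughly what "launch several background tasks, then wait for all of them" looks like with plain JavaScript promises. This is a swapped-in analogy using `Promise.all`, not Coastline's `c.q('bg', ...)` / `wait()` API, and `bgTask` is a hypothetical helper; it only illustrates the intended ordering, not Coastline's semantics.

```javascript
// Plain-Promise analogy of "start several background tasks, then wait for
// all of them". Not Coastline's API; bgTask is a hypothetical helper.
const log = [];

// A hypothetical background task that finishes after `ms` milliseconds.
function bgTask(name, ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name); // record the order in which tasks actually finish
      resolve(name);
    }, ms);
  });
}

async function main() {
  // Launch all background tasks without waiting on each one individually.
  const tasks = [bgTask("a", 30), bgTask("b", 10), bgTask("c", 20)];
  // The "wait on all of them" step: nothing below runs until every task is done.
  const results = await Promise.all(tasks);
  log.push("after");
  return results;
}

const done = main();
done.then((results) => console.log(results, log));
// prints: [ 'a', 'b', 'c' ] [ 'b', 'c', 'a', 'after' ]
```

`Promise.all` reports results in launch order, while `log` shows the finish order; either way, the step after the wait only runs once every background task has completed.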

And while I confirmed that yes, it works, I also found what doesn't: calling wait() on a task that has a caller and has already finished results in a deadlock.

In other words, given var c = coastline(); var c2 = c.q(...);, calling wait(c) at some later point works as expected (it returns immediately), but wait(c2) never returns. Can you spot the difference? It's that c2 has a caller (c) while c itself doesn't.

The reason it doesn't work slowly became apparent as I looked at the wait() implementation. We just try to push a noop task onto the context we wait for, with the waiting context as its caller. I wanted to use "waiter" and "waitee", but it turns out they have reverse meanings. English is funny.

So, why does it deadlock? Because of this line. If something is pushed onto us and we're not working right now, we shall start working, but only if we have no caller. Why? Because if we have a caller, we can only work when the caller bloody tells us to work, like here. And the reason for that is that if you have a task, you would probably rather it executed when its turn comes, and not whenever it likes. And by that logic, not only should a task not work before its position in the caller's queue is reached, it also should never work again once it has finished and the caller has moved on to the next task.
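The rule, and the way wait() relies on it, can be sketched with a toy synchronous model. Everything in it (the `Ctx` class, `push`, `run`, and this `wait` function) is hypothetical and heavily simplified, not Coastline's code; in particular, the waiting side is reduced to a `released` flag instead of a real context acting as the noop's caller. It only reproduces the shape of the bug: `finished` is recorded but never consulted, so the noop pushed onto `c2` sits in its queue forever.

```javascript
// Toy model of the "start working only if we have no caller" rule.
// All names are hypothetical; this is NOT Coastline's implementation.
class Ctx {
  constructor(caller = null) {
    this.caller = caller;  // context this one was queued on, or null
    this.queue = [];       // pending task functions
    this.working = false;  // currently draining its own queue
    this.finished = false; // recorded, but (as in the buggy version) never checked
  }

  push(task) {
    this.queue.push(task);
    // The rule: start on our own only if nobody owns us. A context with a
    // caller waits for that caller to start it when its turn comes.
    if (!this.working && !this.caller) this.run();
  }

  // Drain the queue; called by ourselves, or by our caller on our turn.
  run() {
    this.working = true;
    while (this.queue.length) this.queue.shift()();
    this.working = false;
  }
}

// wait() as described: push a noop onto the target; the waiter counts as
// released only once that noop has actually run.
function wait(target) {
  const state = { released: false };
  target.push(() => { state.released = true; });
  return state;
}

const c = new Ctx();    // like coastline(): no caller
const c2 = new Ctx(c);  // like c.q(...): c is the caller
c2.run();               // c2's turn on c came and went...
c2.finished = true;     // ...and c has moved on for good

const w1 = wait(c);     // no caller, so the noop runs at once
const w2 = wait(c2);    // the caller will never start c2 again
console.log(w1.released, w2.released); // true false
console.log(c2.queue.length);          // 1: the noop is stuck
```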

So the first thing I did was add an exception for that case, and sure as hell, the test code triggered it. None of the production code I tested did, so I probably didn't break anything.

Note to self: Stop being an unprofessional idiot and write some proper tests already.

Maybe tomorrow. No time for this now. Right now we have a wait() that throws an exception when it should just return immediately, and we can't have that. So. It looks like the elegant one-line solution must be replaced with a subtask that will just return if the task is finished. A quick check against the other deadlock scripts and production code, and here we go. Another bug fixed.
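The intent of the fix can be shown with a toy synchronous model. All names (`Ctx`, `push`, `run`, `wait`) are hypothetical, not Coastline's code, and this version checks the `finished` flag directly instead of going through a subtask, so it mirrors what the fix achieves rather than how it is structured. `push` keeps the diagnostic exception for pushing onto a finished context that has a caller, and `wait` simply never gets that far when the target is already done.

```javascript
// Toy model of the fixed behavior. Hypothetical names; NOT Coastline's code.
class Ctx {
  constructor(caller = null) {
    this.caller = caller;  // context this one was queued on, or null
    this.queue = [];       // pending task functions
    this.working = false;  // currently draining its own queue
    this.finished = false; // ran to completion; caller has moved on
  }

  push(task) {
    // Diagnostic guard: a finished context with a caller is never started
    // again, so anything pushed onto it would hang forever.
    if (this.finished && this.caller) {
      throw new Error("push onto a finished context that has a caller");
    }
    this.queue.push(task);
    // Start on our own only if nobody owns us.
    if (!this.working && !this.caller) this.run();
  }

  run() {
    this.working = true;
    while (this.queue.length) this.queue.shift()();
    this.working = false;
  }
}

// Fixed wait(): a finished target means there is nothing to wait for,
// so return immediately instead of queueing a noop that can never run.
function wait(target) {
  const state = { released: false };
  if (target.finished) {
    state.released = true;
    return state;
  }
  target.push(() => { state.released = true; });
  return state;
}

const c = new Ctx();    // like coastline(): no caller
const c2 = new Ctx(c);  // like c.q(...): c is the caller
c2.run();               // c2's turn came...
c2.finished = true;     // ...and it is done for good

const r1 = wait(c);
const r2 = wait(c2);    // no push, so the guard in push() is never hit
console.log(r1.released, r2.released); // true true

// Waiting on a background task that has not finished yet still queues
// the noop, which runs when the caller finally starts that task.
const c3 = new Ctx(c);
const w3 = wait(c3);
console.log(w3.released); // false
c3.run();
console.log(w3.released); // true
```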

Probably.

I'll certainly write those tests one day, I certainly will.

I promised them, I'm a sucker, I don't eat mushrooms anymore, work work, wtf, coastline
