May 28, 2009 11:54
I'm posting here 'cause Ops/Eng is a little busy. This morning one of our user clusters just... fell over. This caused a plethora of error messages and left some journals completely unavailable. The issue was resolved in less than an hour, and we've switched to the slave while Ops works on it.
Edit 12:40 p.m. PDT/19:40 GMT: There have been a few lingering issues - we'll do a web tier restart soon, which has the potential to clear those up. The web tier restart is done.