Jul 20, 2009 14:58
A month ago I sent out an email to all developers: "Machine x is going to be turned off, and machine y is going to be turned on. All programs that run on x will automatically move to y. Please check your code to make sure this isn't a problem." I wait a few days to confirm that *nobody* is going to reply (as expected), and then send out a second email, to about 2 dozen programmers, saying: "By my best guess, this is the group that should check to make sure their programs can move from x to y." I also include a few examples of why their programs may not work. I get 2 replies. The next week, I send a third email to the remaining programmers: "This is your last chance to let me know if you're going to have any problems." Along the way, I (1) give a talk at the monthly Developers' Meeting about how all our different background programming works, what's appropriate and what isn't, suggested modifications, etc.; (2) analyze all replies and modify our background jobs so that my "best guess" becomes less that and more "if there's problems, page this group"; (3) modify the automatic startup procedures to document group responsibilities.
I'm feeling pretty good about things: 90% of the programmers (or their bosses - I included them) have confirmed that "yes that's my program" so the next time we do this (probably in September), I'll know exactly who to email.
Of course, I still can't get anyone to actually look at their program and confirm that it won't have a problem, no matter how often I reply to that effect. In one case, I spent 2 minutes (maybe less) and then called the programmer: "Do you know that your program *specifically* wants to run on machine z?" His reply, "Oh, that's why it hasn't started right since January." TWO MINUTES, and I'd never seen his program before and knew nothing about his application!
So, jumping forward to last weekend. First, Pat gave me a wonderful anniversary present: a 2-day course in handspinning, at Halcyon Yarn, in Bath, Maine. So, Friday and Saturday I spent there. Sunday was spent in Southern Connecticut at the Midsummer Magick Faire (shameless plug), playing Rufus gratis, hoping to get The Great Unwashed in there next year. Traffic home was abysmal - CT and MA had nightly construction at 3 different spots (time to change our home route for next weekend) - so we didn't get home until 11PM. I spent 11 until midnight checking emails for last-minute changes and confirmations from the Server Group that all Windows computers had the latest patches installed. Midnight to 1AM: Prepping the Development, Q/A, and misc. production servers. 1AM to 2AM: Rebooting same, checking logs, etc. 2AM to 3AM: Rebooting all Production Windows Applications servers (only 9 now, thank-the-gods), halting production on (UNIX) machine x, starting production on machine y, and checking that all background processing is running smoothly. 3AM to 4AM: Updating documentation, sending out an email to my group with all the details, etc. 4AM: bedtime.
I'm supposed to get Monday off, since I ran the monthly shutdown, but being slightly paranoid, I check in anyways. At 11:30, a programmer emails me: "My program isn't running - please fix it." My reply (phrased better than this): "It's your program, not mine. That's why I gave you a month to look at the code." Her reply: "This is a PCC (Patient Critical Care) issue and I need your help." By 1PM, things are back to normal, I'm still thinking "breakfast would be nice," and wondering if the &^%$ programmer will *now* look at the code or if we're going to go through the same panic in September when the code moves back.
End of geek moment. I am now in the middle of a 2-hour conference call (I'm still in my pajamas - I'm glad Larry didn't ask for video as well as audio). My eyes are gummy, I think I'm still a few hours short on sleep, etc. etc.