Mar 30, 2009 14:00
If you work in any sort of computer-related job, at one point or another you'll be responsible for some sort of downtime. This can range from something small and isolated, like someone's workstation being fubarred, all the way up to taking down a mission-critical system that millions of customers/users interact with and depend on. When dealing with large-scale websites this can be an even more ominous specter because, depending on your daily traffic, downtime can translate directly to lost revenue for your company.
I've been directly responsible for unplanned downtime before. At my current job I've been personally responsible for about 15-45 minutes of downtime over a period of 10 months. That includes the 20-minute outage that I caused today. The funny thing is that the downtime isn't what I dread most. Strangely, it is the most straightforward part and the easiest to handle. You identify the problem, develop a resolution and then implement it. This is usually followed by a standard email that explains the problem and the duration of the outage, and then covers what you're doing to prevent it from happening in the future. The thing that I have an issue with is the near-constant barrage of person-to-person contact through the remainder of the day. "Hey, I heard the site was down..." "Site back up yet..." "What happened with the site?" It's like pissing your pants, excusing yourself from the office, putting on a fresh pair, and then coming back only to be assaulted by your coworkers: "Hey, I heard you pissed your pants..." "Stop pissing your pants yet..." "What happened to your old pair of pants?"
Note to self: Do not trust your boss when he implies that it's just a quick thing. Test, test, test! Also, do not piss pants.