Originally published at Gao~. You can comment here or there.
Calls Dan (coworker):
Dan: Hello?
Misu: Hey, Dan, where are you? It’s really loud in the background!
Dan: I’m at the Seahawks game! What’s up?
Misu: Remote1 is down, know anything about it?
Dan: What do you mean remote1 is down?
Misu: I can’t ping it. It’s like dead….
Dan: Fuck….
God, today has been one hell of a day. This morning, we started getting calls from some clients about their email not working. After doing some remote troubleshooting, we decided to reboot the Dell PowerEdge 2950 (Win 2k3 Serv) remotely. The little shutdown screen shows up…. then it never comes back on. After a bit of frantic searching for the datacenter’s phone number, we call them up and have them do a hard reset of the server. They say they hear the hard drives spinning up, but without a VGA terminal, they can’t actually look at what the machine is doing.
Dan calls up Chris to meet him in Bellevue, so that they can head over to the Seattle datacenter today. Before Dan leaves the office, we get another call from a client complaining that he couldn’t process transactions on his webstore because it was telling him his SSL Cert was invalid. After a quick bit of troubleshooting, to our horror, it seems like the SSL Cert server on our Dell 2850 has also gotten fucked up somehow. Dan drives like a madman, and he and Chris work in the datacenter till around 3PM getting everything fixed.
Everyone goes their separate ways, to home or recreation, since the long ordeal was over. Then 7PM comes around, I go home and check VisualNews Gemot, and it was down. Did a bit of investigating to discover (to my horror) that remote1 (the other Dell 2850 we have) was down. Made a couple quick phone calls, and we all had to converge together to make a trip to the datacenter in some of the worst weather Seattle has had in over 15 years. The NIC on the server had died and unbound all of our IP addys on remote1.
And now I’ve spent the remainder of my night helping repair damaged config files and rebinding IPs and addresses to BIND.
And look at the weather we just WONDERFULLY had to have today, when we needed to run to the datacenter and back twice!
Today has been one of the most stressful and hectic days of my life. And I can’t wait till tomorrow morning, when we get angry clients calling us wondering wtf is going on. Frankly… we want to know wtf is going on too! What are the chances of critical failures on three different servers all within 4 hours of each other?
So tired….. z_z