Trials, tribulations, and reinstalls

Sep 07, 2010 21:28



Sometime around 0440 Sunday morning, my main server, babylon4, went down hard and fast.  I still haven't been able to reconstruct a single thing about what actually happened, but it left the machine down, with the boot archive corrupt and the boot blocks completely gone.  The last thing logged before it went down - about half an hour before - was Apache2 recording some probably-acne-ridden git trying in vain to probe for phpMyAdmin holes.  (It's not installed.  Neither is PHP.)  Just to add a weird touch, whatever happened apparently sent a break over the serial console line to epsilon3 and halted it too.

I didn't find out about any of this until I got up on Sunday.  I fairly quickly established that all of the ZFS filesystems and their data were completely intact; the system just couldn't be booted.  I spent essentially all of Sunday trying various ways to repair the boot blocks and boot archive, not one of them successful.  I managed once to boot it by hand using the grub on a Solaris 10 install DVD and the failsafe miniroot from my Solaris installation, but that wasn't any help, because the on-disk ZFS version had been patched to a newer one than was originally installed, and the ZFS patch hadn't updated the ZFS drivers in the miniroot.

Well, I had never gotten around to live-upgrading the machine to Solaris 10 u8 10/09 anyway.  So on Sunday I backed up the user-data filesystems in the root ZFS pool over to the main array pool using ZFS snapshots, blew away the root pool, and reinstalled 10/09 from scratch, then on Monday morning set about reinstalling third-party packages and reconfiguring the Solaris ones the way I wanted them.  I had a few minor fights with smf, the Solaris Service Management Facility, but after it saw reason and agreed to do things my way, I had most of it set up and running again Monday night, and finished setting up the last group of services today.  Thanks to the ZFS snapshot gambit, I didn't lose a single file or configuration setting.
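
For anyone wanting to try the same kind of snapshot gambit, here is a minimal sketch of what a snapshot-based copy from one pool to another can look like.  It is not the exact set of commands used here: the dataset and pool names (rpool/export/home, tank), the snapshot name, and both Python helper functions are hypothetical.  The only real pieces are the standard zfs subcommands snapshot, send and receive with their -r, -R and -d options.

```python
import subprocess


def snapshot_copy_commands(source, target_pool, snap_name):
    """Build the zfs command lines for a snapshot-based copy.

    All three arguments are caller-supplied, hypothetical names,
    for example source="rpool/export/home" and target_pool="tank".
    """
    snapshot = f"{source}@{snap_name}"
    return {
        # Recursive snapshot of the source dataset and its descendants.
        "snapshot": ["zfs", "snapshot", "-r", snapshot],
        # Replication stream of that snapshot, including descendants.
        "send": ["zfs", "send", "-R", snapshot],
        # Receive under the target pool; -d derives the dataset names
        # from the names carried in the stream.
        "receive": ["zfs", "receive", "-d", target_pool],
    }


def run_snapshot_copy(source, target_pool, snap_name):
    """Run snapshot, then pipe 'zfs send' into 'zfs receive'."""
    cmds = snapshot_copy_commands(source, target_pool, snap_name)
    subprocess.run(cmds["snapshot"], check=True)
    sender = subprocess.Popen(cmds["send"], stdout=subprocess.PIPE)
    receiver = subprocess.run(cmds["receive"], stdin=sender.stdout)
    sender.stdout.close()
    # Fail loudly if either side of the pipe reported an error.
    if sender.wait() != 0 or receiver.returncode != 0:
        raise RuntimeError("zfs send/receive failed")
```

Calling run_snapshot_copy("rpool/export/home", "tank", "pre-reinstall") would need root privileges and a machine with ZFS; snapshot_copy_commands on its own only builds the argument lists, so it is safe to inspect anywhere.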

Of course, being a prudent sort, right now I have a fresh set of full backups running.

This entry was originally posted at http://unixronin.dreamwidth.org/769557.html.
You may comment there via OpenID even if you do not have a Dreamwidth account.

hardware, geekdom
