Blip

Jan 29, 2009 19:34

I've been snowed in from work for two days now, but still working thanks to the magic of the internets. Being told to do stuff I don't quite feel like doing over the phone might be worse than in person.

Last night I found out there was no snow shovel in this place, so I ended up using a rake to clear a path to my car through the foot or so of snow on the ground. I took the ice scraper/brush out of my car and found that it did a better job of digging snow than the rake. However, when clearing the path from my car to the road, which was blocked not only by the normal snow but also by the iced-over plowed roadside snow, I gave up on all tools. Instead, I found myself trudging through the snow with my feet. One would think this would be an incredibly dangerous (if not just time-consuming) idea, but somehow my feet and shins didn't get any colder than they would have if exposed to air. The trudge was a workout; despite the low temperature, my glasses fogged up and I worked up a decent sweat. I think I may have done about 60 or 70 laps before I tried to pull my car out. After only four or five tries I made it onto the road.

Where did I go? Wal-Mart. To buy a snow shovel.

I just opened up the front door to get the mail out of the box. An icicle fell and hit me on the back of the head. That didn't feel nice, so I took the handle of the rake, still in the doorway, to clear away any other icicles that wanted a piece. The threshold of my door looked like someone had emptied an ice tray on it. The mailbox was iced shut, so I bludgeoned it with a wrench to crack the ice. That was kinda fun. The mail inside was slightly moist and entirely worthless in content. Oddly enough, the word "icicle" got Queen's "Bicycle Race" stuck in my head. "Iiiiicicle, iiiiicicle..."

I shaved the beard yesterday. I think it may have been growing since November. Much to my surprise, I think I look better without it.

I found myself working on an old issue: backing up my important stuff. For me, data backup has been about the following, in this order:

  1. conglomerating a bunch of files into a single archive file (the tar step)
  2. compressing that archive file to make it take up less space (the bzip2 step)
  3. encrypting the compressed archive file so that nobody but me can make any sense of it (the gpg step)
  4. getting that encrypted, compressed archive out somewhere that isn't here so that I can get it back if my hard drive ruptures (a.k.a. offsiting; the rsync step)
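For the curious, here is a rough Python sketch of those four steps. It is not my actual script: the function names are made up for illustration, the tar and bzip2 steps are done in memory with the standard library, and the gpg and rsync steps are only built as command lines. Symmetric (passphrase) encryption is an assumption here; a public-key setup would use different gpg options.

```python
import bz2
import io
import tarfile


def make_compressed_archive(files):
    """Steps 1 and 2: tar a {name: bytes} mapping, then bzip2 the result."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return bz2.compress(buffer.getvalue())


def gpg_command(archive_path):
    """Step 3: encrypt the archive (assumes symmetric encryption)."""
    return ["gpg", "--symmetric", "--output", archive_path + ".gpg", archive_path]


def rsync_command(encrypted_path, destination):
    """Step 4: copy the encrypted archive offsite over ssh."""
    return ["rsync", "-a", "-e", "ssh", encrypted_path, destination]
```

Each command list could be handed to subprocess.run(..., check=True); the destination would be something like user@example.com:backups/ (a placeholder, not my real host).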

Long ago I wrote scripts to do all of this for me on a periodic basis. In the late spring of 2007 (I think) I decided there were two things lacking in this scheme. First, it used bzip2 when the right tool for the job had become lzma. Second, it had no incremental backup capability.

Incremental backup is the idea that you make a full backup of everything once in a while, and then between full backups you make incremental backups that record only what has changed since the full backup. tar has some support for this, but as far as I can tell, if one byte of a file has changed then the whole file shows up in the incremental backup. I was thinking that a finer, block-level granularity would work better for a lot of my stuff.
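To illustrate why file-level granularity bothers me, here is a small Python sketch. Comparing files by content hash is an assumption made for the example (tar decides what to include in its own way), and the function names are invented. The point is only that one changed byte drags the entire file into the incremental backup.

```python
import hashlib


def changed_files(previous, current):
    """Return names in `current` that are new or whose content differs."""
    changed = []
    for name, data in sorted(current.items()):
        old = previous.get(name)
        if old is None or hashlib.sha256(old).digest() != hashlib.sha256(data).digest():
            changed.append(name)
    return changed


def incremental_size(previous, current):
    """Total bytes a file-level incremental backup would have to store."""
    return sum(len(current[name]) for name in changed_files(previous, current))
```

With a 1000-byte file that differs in a single byte, incremental_size reports the full 1000 bytes rather than something close to 1.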

rdiff is a program that, using algorithms swiped from rsync, records a delta file containing, with a reasonably fine granularity, only the differences between two files. What makes rdiff interesting in this regard is that it doesn't actually use a live original when recording the changes; it uses a small "signature" file (not to be confused with a crypto signature) containing checksum information about the original. (More about that in a moment.)
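Here is a toy Python sketch of the signature/delta idea. It is not rdiff's actual algorithm or file format: as I understand it, the real thing pairs a cheap rolling checksum with a stronger hash, while this version simply hashes a block at every offset, which is slow but easy to follow. The names and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size for demonstration only


def _digest(block):
    return hashlib.sha256(block).hexdigest()


def make_signature(original, block=BLOCK):
    """Map the checksum of each full block to its index in the original."""
    signature = {}
    for index in range(len(original) // block):
        chunk = original[index * block:(index + 1) * block]
        signature.setdefault(_digest(chunk), index)
    return signature


def make_delta(signature, new, block=BLOCK):
    """Describe `new` as copy/literal operations using only the signature."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        index = signature.get(_digest(chunk)) if len(chunk) == block else None
        if index is None:
            # No known block starts here: keep this byte as literal data.
            literal.append(new[pos])
            pos += 1
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", index))
            pos += block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def apply_delta(original, ops, block=BLOCK):
    """Rebuild the new data from the original plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += original[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)
```

Note that make_delta never sees the original, only its signature; the original is needed again only when applying the delta.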

It doesn't make sense to record the differences between two .tar.lzma.gpg files; two archives containing similar but not quite identical data aren't necessarily going to have a lot of similarity after compression, and after encryption they should look essentially unrelated.

So, the archives must be rdiffed as .tar files before anything else is done to them. But the product of the full backup must be a .tar.lzma.gpg, and we don't want to have to decrypt and decompress the full backup to create a delta. Fortunately, rdiff doesn't need the original; it just needs the signature file. Yeah, full circle.

As one might have guessed, I implemented these year-and-a-half-old pending changes today. I used an old and extremely hacky script I wrote called utilitee to allow the .tar archive of a full backup to be used to generate both an rdiff signature and an lzma-compressed, gpg-encrypted archive without needing an intermediate file. On an incremental backup, the same specifications are used to make the .tar archive, but it is rdiffed against the signature file and then discarded, while the resulting delta is compressed and encrypted (in my system, this results in a file named something like oldarchivename.tar--to--newarchivename.tar.rdiff-delta.lzma.gpg). To recover a particular incremental backup, both the original and the delta need to be decrypted and decompressed before combining them with rdiff patch.
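The "no intermediate file" trick amounts to reading the tar stream once and feeding every chunk to two consumers. Below is a minimal Python sketch of that idea, not utilitee itself: the block checksums stand in for the rdiff signature, the encryption step is left out, and the function name and block size are assumptions for the example.

```python
import hashlib
import io
import lzma


def split_stream(source, block=2048, read_size=65536):
    """Read `source` once; return (block checksums, lzma-compressed bytes)."""
    compressor = lzma.LZMACompressor()
    compressed = bytearray()
    checksums = []
    pending = b""
    while True:
        chunk = source.read(read_size)
        if not chunk:
            break
        # Consumer 1: the compressor sees every chunk as it arrives.
        compressed += compressor.compress(chunk)
        # Consumer 2: checksum each full block; a trailing partial block is skipped.
        pending += chunk
        while len(pending) >= block:
            checksums.append(hashlib.sha256(pending[:block]).hexdigest())
            pending = pending[block:]
    compressed += compressor.flush()
    return checksums, bytes(compressed)


if __name__ == "__main__":
    sums, packed = split_stream(io.BytesIO(bytes(range(256)) * 40))
    print(len(sums), "blocks checksummed;", len(packed), "compressed bytes")
```

In the real setup each consumer would be a separate program on the receiving end of a pipe rather than a Python object, but the shape is the same: one pass over the data, two outputs.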

The method of offsiting hasn't changed: rsync over ssh to my hosting provider. (The fact that the destination is a computer where someone else has access is the reason these things are encrypted in the first place.)

Ciao.