Extracting LJ entries into local files, OS X edition

May 30, 2007 12:32

ETA 2007-08-03: I have a newer, betterer tool in progress. This tool will a) back up locally, and b) optionally migrate your posts. I need testers. If you're reasonably comfortable with the shell and running commands in the Terminal, please hop over there and give it a shot. Thanks!
Xjournal is an OS X client that has a "download all LJ posts" feature. Go to the History view (command-1), refresh the history, then click the "download" button. Wait a few minutes while it chugs and grabs all your posts.

This is handy. However, Xjournal puts the data in a really terrible location. They're all in the file ~/Library/Application Support/Xjournal/History.plist. This is a property list file, which is a common XML document type used in lots of ways by OS X apps. However, it's not very convenient for humans, if you want to do something else with your post data.

I have written a small python script that shows how you might extract the data and save it into individual files: extractXJournal.py. Download this, save it somewhere, and make it executable. That is, launch Terminal, and type:
chmod a+x extractXJournal.py
Then run it, also from a terminal window:
./extractXJournal.py

It'll extract the entries into the current directory, inside a file hierarchy like "year/month/date", with the files named for the post ids.

This is a five minute hack job, and there might be bugs. Tell me about them here and I will fix them immediately. It did, however, extract all my journal entries nicely into html files viewable in a browser. ljdump is another Python option to download directly. However, its file naming convention was annoying, and its output is also XML. This is handy for further scripting, but not so handy for your browsing.

ETA: Okay, we've had two people with very different environment run this successfully, so I think we've got the bugs squashed. Thanks to elementalv and nightfive for their help with testing!
ETA2: The latest version of the script converts lj user & comm tags (but NOT cut tags) to working links. It also handles illegal characters like the Windows heart character.
ETA3: The latest version (10:30am PDT May 31) also generates an index file at the top level, listing all your posts in a spectacularly ugly way.
ETA4: It's possible to modify ljdump to do something kinda neat: versions of your posts with comment threads. It's a bit of work, and I realize that the urgency level of the problem is no longer what it was yesterday. But I think I might do it. YNK when the next apocalypse is gonna come.
ETA5: Running an older version of OS X? Try extractForOldMacs.py instead. Probably won't work.
ETA6: Charm is another python command-line client that might work to archive your posts in one step. It successfully archived my personal journal, but failed several times to download all of the posts in a community I mod. So mileage may vary.

geek

Previous post Next post
Up