Mou-souji, Part 2 - Dreamwidth

Jan 04, 2020 19:35

The two big computer things that I had wanted to get done with the ōsōji end-of-year cleaning were email and my blog. I got email under control on New Year’s Day. Today I completed my blog task.

All I needed to do was convert 17 Years of Blogging into archivable PDF files on Belldandy. There’s a lot of history in this blog, and I didn’t want it all to vanish without a trace upon circumstances outside of my control. As it is, the LiveJournal Mirror could go away at any time (including, unfortunately, many of the images that I embed in both blogs, LiveJournal and Dreamwidth).

What I used for this conversion is BlogBooker, which is a direct descendant of LJBook - the service I used in the early/mid 2000s. I’m not keen on the layout/typography of the final product - but I don’t have any better alternatives. First priority is capturing content; second priority is appearance.

I hadn’t done an LJBook conversion in 13 years, but I knew enough to not convert more than a year at a time. My blog is pretty big. In setting up to use BlogBooker, it appears I had purchased a Standard package in 2016. I wonder why I didn’t complete the PDFs then. Oh, well... that’s in the past. I ran a test under the Free option, and the application worked pretty much as I was used to. For me to create 17 separate PDF files, I needed more than the Standard package, and I purchased the (expensive) Premier option. But, all things considered, the price is reasonable given how bad I’d feel if my entire blog disappeared.

I converted the short year, 2003, then did a full year, 2004. For 2005 I decided I’d better check out all the fonts to make sure I was going with the best one. Part of the problem was that the font I was using (Carlito) couldn’t handle any Japanese characters, not even hiragana. I tried Code2000, which said it was wide range - and it was - hiragana and kanji characters were displayed properly. But the roman characters were sooooo ugly - like the text you get on the Engrish instructions printed in 1960s Japan. Ick. I was going to sacrifice kana and kanji for reasonable English text (which is 99.99% of the blog anyway - most of the kanji in my blog is in song titles).

Code2000 was disappointing - but there was another font to try - Free Sans. I chose it and submitted a conversion request.

And then saw I’d made a huge mistake.

New conversion requests bring up a form that retains the settings from the previous request - blog name, picture quality, date range, included attributes, and all format and feature selections. I changed the font selection and resubmitted - forgetting one important thing... the date range is retained BUT the checkbox for doing the ENTIRE blog is checked. That checkbox does not remember the previous setting, and the box becomes checked every time. You have to remember to uncheck the box in order for the date range to be used.

I didn’t realize I’d asked for the entire blog to be converted until I saw that the process was taking longer than expected.

I could have cancelled the job... but... what would happen if I let the job finish?? Under the Premier plan I had way more credit than I actually needed. I decided to let it run. Up until then, a full blog year took 1 minute of computing time and generated a 250-page PDF.

I let it run.

It used about 32 minutes of computing time. (Premier gave me 10 hours, so no sweat.) It actually created a working PDF - a giant PDF. The PDF was about 1GB in size, containing 7200 entries with 6400 images across 9500 pages. But the table of contents was fully indexed in a nice tree structure. It was pretty impressive.

Unfortunately, the Free Sans font was not good. It didn’t handle any Japanese characters, and the roman text was ugly. I have a huge, ugly PDF - but contentwise it’s impressive. I’ll consider redoing the exercise with the Carlito font. Maybe.

Anyway, I went back to doing individual years. I’ve got 17 PDFs. It was interesting comparing 2004 (273 entries, 23 images, 250 pages, 5 MB) to 2019 (376 entries, 712 images, 856 pages, 132 MB). I’m counting on Time Machine and Backblaze to make backup copies.
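That 2004-versus-2019 comparison can be sketched in a few lines of Python. This uses only the figures quoted above; the `YearStats` class and the particular ratios are illustrative choices, not anything BlogBooker itself reports.

```python
from dataclasses import dataclass


@dataclass
class YearStats:
    """Per-year PDF figures as quoted in the post (hypothetical helper class)."""
    entries: int
    images: int
    pages: int
    size_mb: float

    def images_per_entry(self) -> float:
        return self.images / self.entries

    def pages_per_entry(self) -> float:
        return self.pages / self.entries

    def mb_per_page(self) -> float:
        return self.size_mb / self.pages


# Figures taken directly from the post.
stats = {
    2004: YearStats(entries=273, images=23, pages=250, size_mb=5),
    2019: YearStats(entries=376, images=712, pages=856, size_mb=132),
}

for year, s in stats.items():
    print(
        f"{year}: {s.images_per_entry():.2f} images/entry, "
        f"{s.pages_per_entry():.2f} pages/entry, "
        f"{s.mb_per_page():.3f} MB/page"
    )
```

The ratios make the shift plain: the entry count grew only modestly, while images per entry and file size per page grew far more.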

Still, I’d like to figure out a way to get a better extract of my blog posts... including all Japanese characters... and also including each post’s userpics. Better spacing and formatting, too. It might be another one of those projects I do in my retirement. I haven’t done anything in 17 years; I suppose I could wait five more.

You're welcome to comment on LJ, but I'd rather you leave a comment on the original post at Dreamwidth.

living, writing, tech
