This is all a bit meta...
Extracting a reasonably complete copy of my picture collection from LJ is proving interesting in various ways. The latest "scrapbook" implementation is Flash-driven, and therefore not amenable to being spidered, so I needed another way to find the actual images. I have already extracted the text of all my entries, so I figured a good first step was to pull all the image URLs out of that, limiting myself to images hosted on LJ. Even that wasn't exactly simple: LJ has used several different schemes for scrapbooks over the years, each with its own URL format. The older URLs are still sort-of supported, but they're served via one or more levels of 302 redirection. I think I have now downloaded a more-or-less complete copy of my scrapbook, or at least of the images in it that are actually referenced by posts, but it has taken three iterations through the original list so far:
- To get all the images referenced using the newest scheme, and capture the redirections for images using the older schemes.
- To get the images pointed at by those redirections, and capture a second layer of redirections.
- To get that last lot...
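For anyone wanting to try something similar, the three passes above might be sketched roughly like this in Python. This is not the script I actually ran, just a minimal illustration under some assumptions: the function names (`extract_image_urls`, `fetch_in_passes`, `http_fetch`) are made up for the example, the `livejournal.com` host suffix is a guess at what counts as "in LJ", and the regular expression only catches plain `<img src="...">` tags.

```python
import http.client
import re
from urllib.parse import urljoin, urlsplit

# Assumption: images appear as ordinary <img src="..."> tags in the entry text.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
REDIRECTS = {301, 302, 303, 307, 308}


def extract_image_urls(entry_html, host_suffix="livejournal.com"):
    """Return the unique image URLs whose host is host_suffix or a subdomain of it."""
    found = []
    for url in IMG_SRC.findall(entry_html):
        host = urlsplit(url).hostname or ""
        on_site = host == host_suffix or host.endswith("." + host_suffix)
        if on_site and url not in found:
            found.append(url)
    return found


def http_fetch(url):
    """Fetch url WITHOUT following redirects; return (status, location, body)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=30)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=30)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def fetch_in_passes(urls, fetch=http_fetch, max_passes=5):
    """Fetch urls one redirect hop per pass.

    Returns (images, unresolved, passes): images maps the URL that finally
    answered 200 to its body, unresolved holds redirect targets still
    pending when max_passes ran out. Other statuses (e.g. 404) are dropped.
    """
    images = {}
    pending = list(urls)
    passes = 0
    while pending and passes < max_passes:
        passes += 1
        next_round = []
        for url in pending:
            status, location, body = fetch(url)
            if status == 200:
                images[url] = body
            elif status in REDIRECTS and location:
                # Location may be relative, so resolve it against the current URL.
                next_round.append(urljoin(url, location))
        pending = next_round
    return images, pending, passes
```

Passing in the `fetch` function makes it easy to try the loop against a canned table of responses before letting it loose on the real site. An image that sits behind two levels of redirection takes three passes to arrive, which matches what I saw.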
There's still quite a big mismatch between the number of images in my scrapbook and the number I've managed to fetch. I guess I've probably not used every image I uploaded in actual entries, so the missing ones probably don't matter that much, but I'm chasing some obvious loose ends, so I think the final difference will be rather smaller. I will end up with a fair copy of all the ones that seem to matter to my journal, which is useful from the back-up point of view, if for no other reason.
I doubt that updating every entry (on DW) to point to images in a new location is sane (or even practical), so I guess the images in LJ's scrapbook will just have to stay there for as long as it lasts.
For the time being DW will remain the primary location of this journal. I'll leave the cross-posting to LJ alive for a while longer, at least while I figure out what else to do.