I want to talk about a project I did with the Dreamwidth (DW) comm scans_daily. Probably this will be a lot of tl;dr for everyone who isn't me. But I am a mod of this place! So only catechism can stop me, and she won't, because we made this comm so we could be ridiculous about Delicious in public. This definitely qualifies.
Now scans_daily is a pretty big community with well over a thousand members and more than 8,000 posts. Compared to other LJ/DW communities, though, what they really have a lot of is tags. That comm tags like nobody's business; I am in awe. Unfortunately, DW limits them to a certain maximum number of unique tags, and in March they were watching that limit approach fast. WHAT TO DO, WHAT TO DO?
Well actually, they came here and asked us. AHAHAHAHAHAHAHA. I so love it when we, as a comm, appear to be useful.
Their idea was to index all the posts in Delicious, since it doesn't have the same overall tag limit, and hook up each DW post to the tags they assigned to it in Delicious. Their questions, not surprisingly, were:
1. a) What kind of scripts are recommended for a mass-transfer between LiveJournal and/or Dreamwidth to del.icio.us?
1. b) Do such things even exist?
2. Is there any sort of code one could use for LJ/DW to automatically recognize the tags applied to a particular post from del.icio.us?
My answers were none, no, and a lot of top-of-my-head babble that boiled down to sort of but not really. But since I refuse to be cowed by technology, I said ONWARD and began to write a bunch of code to make this project happen. I mean, I couldn't just let posts go insufficiently tagged; that's totally against my life tagging philosophy.
The Process
- Get all the DW posts into Delicious.
With over 7,000 links to post to Delicious, I needed a robust script capable of handling connectivity interruptions, since it was guaranteed that this script would take hours to run. Since I am lazy, I also wanted to avoid writing this script from scratch myself. Lucky for me, ljmigrate, a Python script that can back up and migrate LiveJournal-based journals and comms, is open source.
I altered the ljmigrate code to create ljmigratetodel, a Python script that uses ljmigrate's archiving functions to fetch a comm's entries but then posts them one at a time to Delicious, instead of to another journal. The code needs to be installed on a machine that runs Python, and the user needs to have an LJ account with membership access to the comm and also access to the Delicious account that the entries will be posted to. To index 7,000+ scans_daily entries, the script took about 14 hours of runtime.
The script is not without bugs -- in the process of migrating the scans_daily entries, it failed to post 12 links, but only reported 11 failures, and finding the missing one was a non-trivial problem, let me tell you. It also could use some feature enhancements, like support for the new Yahoo ID Del accounts. But if you're thinking about trying to index a large LJ or DW comm in Delicious, this is possibly the only tool in existence right now, so drop me a line and I'll hook you up with the program and help you through the process.
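For anyone curious what "post them one at a time and survive connection hiccups" looks like in general, here is a minimal sketch. It is not the ljmigratetodel code: `post_all`, `post_link`, `max_attempts` and `delay_seconds` are names made up for illustration, and `post_link` stands in for whatever function actually sends one bookmark to Delicious. The point is only that every link that runs out of retries lands in a single list, so the failure count and the failure list cannot disagree.

```python
import time


def post_all(links, post_link, max_attempts=3, delay_seconds=0.0):
    """Post each link in turn and return the links that never succeeded.

    post_link is a hypothetical callable that posts one link and raises
    OSError (which includes ConnectionError) when the network misbehaves.
    """
    failed = []
    for link in links:
        for attempt in range(1, max_attempts + 1):
            try:
                post_link(link)
                break  # posted successfully, move on to the next link
            except OSError:
                # Network-style failure: pause, then retry this same link.
                if attempt < max_attempts:
                    time.sleep(delay_seconds * attempt)
        else:
            # The inner loop never hit "break", so every attempt failed.
            failed.append(link)
    return failed
```

After a long run, the returned list is what you would feed back in for a second pass.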
- Write an offsite html page with javascript that can look up a DW post in the scans_daily Delicious account and display its tags.
So now all the links were in Delicious! TIME TO CELEBRATE! Except actually no, because there was still the second half of the project to do, that whole bit about linking each DW post to its shiny new tags in the scans_daily Delicious account.
Delicious doesn't provide a feed to get a single account's tags for a specific URL, but you can get a user's feed of links and narrow it by tag, grabbing a maximum of 100 links at a time. The trick here is to use the digit portion of Dreamwidth entry URLs as Delicious tags. So if I have a DW entry at http://scans-daily.dreamwidth.org/2237409.html, I can tag it with xref:2237409, and then this feed will give me the scans_daily Delicious info for that URL: http://feeds.delicious.com/v1/json/scans_daily/xref:2237409
Having one unique tag per DW post is way overkill though, and leads to slow load times on Delicious and an impossible bundle management page. Luckily, since DW can be depended upon to only assign about 40 entries to every 10000 possible ids, you can tag with the approximate entry id by cutting off the last four digits. Now to get the Delicious info for http://scans-daily.dreamwidth.org/2237409.html, I use this feed:
http://feeds.delicious.com/v1/json/scans_daily/xref:2230000?count=100
This returns all 38 posts with entry ids that fall within the 2230000 - 2239999 range, and from those I can pull out the info on 2237409.html specifically.
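The arithmetic is small enough to sketch. The following is an illustration only, written in Python rather than the javascript the real page uses; the function names `xref_tag` and `feed_url` are made up, while the feed address and the `count` parameter are the ones quoted above.

```python
FEED_BASE = "http://feeds.delicious.com/v1/json/scans_daily/"


def xref_tag(entry_id):
    """Zero out the last four digits of the entry id and prefix 'xref:'."""
    return "xref:%d" % (entry_id // 10000 * 10000)


def feed_url(entry_id, count=100):
    """Build the tag feed URL that covers this entry's block of 10000 ids."""
    return "%s%s?count=%d" % (FEED_BASE, xref_tag(entry_id), count)
```

Calling `feed_url(2237409)` gives the xref:2230000 feed address shown above.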
So the html page I made accepts a DW entry id as a URL parameter, then it replaces the last four digits of that id with zeroes to compute the correct xref tag. It uses the xref tag's feed from the scans_daily Delicious account to grab the 40 or so links in that entry id range and uses javascript to filter, pulling out only the info pertinent to the initial passed-in entry id. Then the page displays the scans_daily Delicious bookmark for that entry. See it in action, looking up the tags for 2237409.html: http://murklins.talkoncorners.net/scans_daily/delicioustags.html?entryid=2237409
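The filtering step can be sketched like this. Again, this is Python for illustration, not the page's actual javascript. It assumes the feed comes back as a JSON list of bookmark objects where `u` holds the bookmarked URL and `t` holds the list of tags; treat those key names as an assumption to check against a real feed response. `find_bookmark` is a made-up name.

```python
import json


def find_bookmark(feed_json, entry_id):
    """Return the bookmark whose URL ends in /<entry_id>.html, or None."""
    suffix = "/%d.html" % entry_id
    for bookmark in json.loads(feed_json):
        # "u" is assumed to be the bookmarked URL in each feed item.
        if bookmark.get("u", "").endswith(suffix):
            return bookmark
    return None
```

Matching on the full "/2237409.html" suffix, rather than just the digits, keeps an id like 237409 from accidentally matching 2237409.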
Since I had this scheme in mind from the outset of the project, I added the xref tags during the migration of the DW posts to Delicious, so they were all set to go. To keep the system working, the scans_daily taggers just have to remember to add an xref tag to each link as they post it to Delicious.
- Implement a custom S2 style that links each DW post to the offsite page.
This was an easy part, and the last official part of the project. At this stage, we have all the entries tagged in Delicious and we have a page that, given a DW entry id, can display the scans_daily Delicious info for that entry. All that remains is to provide a link from each DW entry to that html lookup page, so that people reading scans_daily entries can easily see how the comm's tag wranglers have tagged them in Delicious.
Here is the S2 code I used in the scans_daily custom Tabula Rasa (the base DW style) layout, with my additions to the base function highlighted and bolded:
function Page::print_entry(Entry e)
{
## For most styles, this will be overridden by FriendsPage::print_entry and such.
$e->print_wrapper_start();
"""
\n""";
$e->print_subject();
$e->print_metatypes();
$e->print_time();
"""\n""";
"""
\n""";
"""
\n""";
"""
\n""";
$e->print_userpic();
$e->print_poster();
$e->print_text();
$e->print_metadata();
"""\n""";
"""\n""";
"""\n""";
"""
\n""";
"""
\n""";
$e->print_tags();
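# Pull the entry id out of the permalink: take the last path segment
# (for example "2237409.html") and drop the trailing 5 characters (".html").
var string[] urlparts = $e.permalink_url->split("/");
var string urlid = $urlparts[size $urlparts - 1];
$urlid = $urlid->substr(0, (size $urlid) - 5);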
if (not($this isa FriendsPage)) {
"""
""";
"""
""";
"""View our Delicious tags for this post""";
"""""";
}
$e->print_management_links();
if ($this isa EntryPage) {
"""""";
$e->print_interaction_links("topcomment");
$this->print_reply_container({ "target" => "topcomment" });
"""""";
}
else {
$e->print_interaction_links();
}
"\n\n";
$e->print_wrapper_end();
}
It adds to every entry a link to 'View our Delicious tags for this post' which points to the offsite html page. It includes the entry's id as a URL parameter, so that the page is able to look up the entry.
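If S2 isn't your thing, the id-extraction part of that function does roughly the following, shown here as a Python sketch. `lookup_url` is a made-up name and this is a paraphrase of the logic, not code that runs on Dreamwidth; the lookup page address is the one given earlier.

```python
LOOKUP_PAGE = "http://murklins.talkoncorners.net/scans_daily/delicioustags.html"


def lookup_url(permalink):
    """Turn an entry permalink into the offsite lookup link for that entry."""
    last_part = permalink.split("/")[-1]        # e.g. "2237409.html"
    entry_id = last_part[:len(last_part) - 5]   # drop the 5 characters of ".html"
    return "%s?entryid=%s" % (LOOKUP_PAGE, entry_id)
```

Like the S2 version, this assumes every permalink ends in ".html"; a URL with a query string or anchor tacked on would need extra handling.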
With this code in place, each comm entry gets linked automatically to the offsite page. That matters because the mods couldn't add the link to each entry by hand even if they wanted to: only post authors can edit the content of posts.
Anyway, the point is, MISSION ACCOMPLISHED.
- Write a Greasemonkey script to inject the scans_daily Delicious tags into the DW posts.
But I still wanted to do one more thing, which was to provide tools for avid comm readers to inject the Delicious tags right into the comm, if they so desired. Greasemonkey is designed for stuff like this -- taking information from one site and sticking it into another site, minimizing the need to hop between pages.
Once the script is installed in your browser, you no longer need to click the 'View our Delicious tags for this post' link to see the Del tags -- they just show up automatically:
As you can see, there are also DW tags assigned to the post. They are set by the post authors and used as suggestions for the Delicious tag wranglers, who often do a more detailed tagging job.
You can download the tag-injecting script from my Greasemonkey page. There's a version for Greasemonkey, the Firefox addon, and also one for Chrome.