How to Download Entries

Mar 08, 2004 10:13

Okay. I've seen a bunch of posts where people talk about downloading entries, and they're going about it the wrong way. So I'm going to try to set the record straight: I'll outline the various ways of getting entries from the servers, the relative merits of each, and the one I recommend.

First off, support Unicode. If you write a client and release it at all, it will be used by people who need Unicode support. LiveJournal has a huge community of users who don't necessarily keep their journals in English. The Russian community is huge, for example, and their journals require Unicode to post and view entries. Java supports Unicode fairly natively; Delphi users will want to use the WideString type instead of plain old String; for other languages, you'll need to figure out how best to handle Unicode in your language of choice.
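As a rough illustration of what "supporting Unicode" means on the wire, here is a minimal Python sketch that encodes request variables as a UTF-8 form body. It assumes the server accepts a form-encoded POST and expects UTF-8 when ver is set to 1; the function name encode_request and the sample field names are only for illustration.

```python
from urllib.parse import urlencode


def encode_request(fields):
    """Encode request variables as a UTF-8, percent-escaped form body.

    Assumption: the server reads form-encoded POST data and treats the
    bytes as UTF-8 when the client sends ver=1.
    """
    return urlencode(fields, encoding="utf-8").encode("ascii")


# Hypothetical sample data: a Russian subject line survives the round trip
# because every character is sent as UTF-8 bytes, not as a local code page.
body = encode_request({"ver": "1", "subject": "Привет"})
```

The point is simply that text never passes through a single-byte code page on its way to the server.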

In general, there are four methods of downloading entries with the getevents protocol mode: lastn, syncitems, one, and day. These four methods are specified in the selecttype variable of the getevents call. I will discuss each of these and when to use them.
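To make the selecttype idea concrete, here is a small Python sketch that builds the request variables for a getevents call. Only mode, selecttype, lastsync and ver come from this post; the helper name getevents_request and the extra variable names in the usage lines (howmany, itemid, year, month, day) are assumptions that should be checked against the protocol documentation. Authentication variables are left out.

```python
SELECT_TYPES = {"lastn", "syncitems", "one", "day"}


def getevents_request(user, selecttype, **extra):
    """Build the variables for one getevents call (hypothetical helper).

    `extra` carries whatever the chosen selecttype needs; its key names
    are assumptions, not something this sketch verifies.
    """
    if selecttype not in SELECT_TYPES:
        raise ValueError("unknown selecttype: %r" % (selecttype,))
    request = {
        "mode": "getevents",
        "user": user,
        "ver": "1",  # always set ver to 1
        "selecttype": selecttype,
    }
    # Everything goes over the wire as text, so stringify the extras.
    request.update({key: str(value) for key, value in extra.items()})
    return request


recent = getevents_request("example_user", "lastn", howmany=5)
single = getevents_request("example_user", "one", itemid=42)
by_day = getevents_request("example_user", "day", year=2004, month=3, day=8)
```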

lastn
This is most useful when you are showing the user a snapshot of their recent entries, when you want to fetch their most recent entry to verify that the entry you just posted actually went through, or when you want to let the user edit their most recent entry.

You should not use this mode to download an entire journal. I don't believe you can specify a huge number that would give you their entire journal (unless their journal was a few dozen entries only).

day
This is useful for people who are writing calendars and want to get entries on a day that the user has clicked on. This should be used in conjunction with the getdaycounts protocol mode to figure out when the user has posted and then to get entries on that particular date.

This mode should never be used for enumerating someone's journal and downloading their entries. There is one quirk to this mode that causes me to say that: if, for some reason (non-Unicode client, for example), the server is unable to send you a particular entry, it will instead send you text indicating that the entry's subject and body are "(cannot be shown)". It doesn't TELL you it's done this, so you end up thinking that's the user's real entry and blow away whatever they had.
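If a client does end up looking at entries fetched this way, one defensive option is to refuse to overwrite a stored entry with something that looks like the placeholder. The Python sketch below assumes the placeholder is exactly the text quoted above and that entries are dicts with "subject" and "event" keys; both the dict shape and the function names are hypothetical. A real entry whose whole subject or body is that text would also be rejected, which seems like the safer failure.

```python
PLACEHOLDER = "(cannot be shown)"  # assumed to match the server's text exactly


def looks_like_placeholder(entry):
    """Return True if the subject or the body is just the placeholder text."""
    subject = entry.get("subject", "").strip()
    body = entry.get("event", "").strip()
    return subject == PLACEHOLDER or body == PLACEHOLDER


def merge_entry(local, fetched):
    """Keep the local copy when the fetched one looks like a placeholder."""
    if local is not None and looks_like_placeholder(fetched):
        return local
    return fetched
```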

one
When you want to download a handful of entries scattered about, you can use this mode to get them. It's usually fairly safe to download an entry with this mode and then to resubmit it to the server. Example: you use getdaycounts to show a calendar, then you use the day mode to show entries for that day, then you use this mode to get the real entry for editing.

syncitems
If you are trying to download someone's entire journal, this is the mode to use. This mode is the only way you can account for edits that the user has made to their entries without using your client. This is also the most efficient way of downloading entries, because the server will send you a whole bunch at a time (100 last I checked). This mode is used in conjunction with the appropriately titled syncitems client protocol mode.

NOW! It is time for an example of how to use this mode properly to download someone's entire journal. Alright, let's talk some pseudocode:

send client request "syncitems" with the "lastsync" variable not specified
get list of items back from request, save items into list for processing later
while size_of_list < sync_total {
    find most recent time in list
    call "syncitems" again, but set "lastsync" to most recent time
    push result items onto list
}
iterate through list and remove items that don't start with "L-" (L means 'log' which is a journal entry)
create hash of journal itemids with data { downloaded => 0, time => whatever sync_X_time was }
while (any item in hash has downloaded == 0) {
    find the oldest "time" in this hash for items that have downloaded == 0
    decrement this time by one second :P
    mark THIS item as downloaded (so we don't use the same time twice and loop forever)
    send client request "getevents" with selecttype set to syncitems, lastsync set to oldest time minus 1 second
    mark each item you get back as downloaded in your hash
    put the entries you got into storage somewhere
}
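For anyone who prefers something closer to real code, here is a Python sketch of the same two-phase loop. It is not the Perl code mentioned in this post. The transport is a hypothetical `call` function that you supply: it takes a dict of request variables and returns an already-parsed reply, shaped as {"total": int, "items": [(name, time), ...]} for syncitems and {"events": [{"itemid": int, ...}, ...]} for getevents. Those shapes, and the "YYYY-MM-DD HH:MM:SS" time format, are assumptions of this sketch.

```python
from datetime import datetime, timedelta

TIME_FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def one_second_before(stamp):
    """Return the timestamp string one second earlier than `stamp`."""
    moment = datetime.strptime(stamp, TIME_FMT) - timedelta(seconds=1)
    return moment.strftime(TIME_FMT)


def collect_sync_items(call):
    """Phase 1: repeat syncitems until the whole item list is known."""
    reply = call({"mode": "syncitems", "ver": "1"})  # no lastsync on first call
    total = reply["total"]
    items = dict(reply["items"])  # item name such as "L-12" -> time
    while len(items) < total:
        lastsync = max(items.values())  # same-format strings sort by time
        reply = call({"mode": "syncitems", "ver": "1", "lastsync": lastsync})
        before = len(items)
        items.update(reply["items"])
        if len(items) == before:  # nothing new came back: stop, don't spin
            break
    return items


def download_journal(call):
    """Phase 2: fetch every journal entry named by the sync list."""
    synced = collect_sync_items(call)
    # Keep only "L-" items (journal entries), keyed by numeric itemid.
    pending = {int(name[2:]): stamp
               for name, stamp in synced.items() if name.startswith("L-")}
    entries = {}
    while pending:
        itemid, stamp = min(pending.items(), key=lambda pair: pair[1])
        del pending[itemid]  # mark THIS item done so its time is never reused
        reply = call({
            "mode": "getevents",
            "ver": "1",
            "selecttype": "syncitems",
            "lastsync": one_second_before(stamp),
        })
        for event in reply["events"]:
            entries[event["itemid"]] = event  # "storage" is a dict here
            pending.pop(event["itemid"], None)
    return entries
```

Passing the transport in as `call` keeps the loop logic separate from HTTP and response parsing, so it can be exercised against a fake server before it ever touches the real one.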

That's it. You will have to call syncitems and getevents a bunch of times each to get the data you need, but this isn't a problem if you do it smartly. Also note that the server keeps track of the times you use when you call getevents, and if you start specifying the same time repeatedly (infinite loop or something) then your client will be given an error message "Perhaps the client is broken?" or something like that.

I make no warranty as to my pseudocode. I based it off of the Perl code I wrote that downloads entire journals and uploads them to other services, and I haven't had any problems with it. This is also the algorithm that is used in LochJournal's history code that I'm working on. And remember, set ver to 1 or you will have no end of trouble!