Comments (11)

va_dev January 28 2012, 20:50:20 UTC
There is an API that you can use: http://www.livejournal.com/doc/server/index.html. This requires some scripting/coding. Does this make sense?

kypeli January 28 2012, 21:06:06 UTC
Thanks for your reply! Does it make sense for my use case? I am not sure :)

If I understood the documentation correctly, those API calls would let me work with my own blog entries after authenticating with the server. But I could not find anything on that page about how to take an arbitrary public blog on livejournal.com and download its content. Maybe I missed something, or just didn't understand how to read the docs?

va_dev January 28 2012, 21:15:01 UTC
The best way I know of is the XML-RPC protocol. There are existing implementations in various programming languages, but you can also write your own. This page lists the methods that can help you query anything you need from a journal: http://www.livejournal.com/doc/server/ljp.csp.xml-rpc.protocol.html. In your particular case you can use the getevents method in combination with others. The catch is that the number of returned events (entries) per query is limited to 50, but you can fetch all blog entries step by step through the API.
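
The 50-entry limit described above turns the fetch into a paging loop. Here is a minimal Python sketch of that loop, assuming the endpoint URL, the "lastn"/"beforedate" parameter combination, and the simplified plain-password auth shown here (the docs linked above also describe challenge-response auth). Treat it as an outline, not a tested client.

```python
import xmlrpc.client

# Assumed endpoint and page size -- check the protocol docs linked above.
LJ_ENDPOINT = "http://www.livejournal.com/interface/xmlrpc"
PAGE_SIZE = 50  # the per-query limit mentioned in the thread

def fetch_all_events(call, username, password):
    """Fetch events page by page until a short page signals the end.

    `call` is any callable with the getevents signature, so the paging
    logic can be exercised without touching the network.
    """
    events = []
    before = None  # on each pass, fetch only entries older than this
    while True:
        params = {
            "username": username,
            "password": password,   # simplified; real clients use challenge auth
            "selecttype": "lastn",
            "howmany": PAGE_SIZE,
            "lineendings": "unix",
        }
        if before is not None:
            params["beforedate"] = before
        page = call(params).get("events", [])
        events.extend(page)
        if len(page) < PAGE_SIZE:
            break  # short page: nothing older is left
        before = page[-1]["eventtime"]  # oldest entry in this page
    return events

def live_call(params):
    """Real XML-RPC call; only works with valid credentials for the journal."""
    server = xmlrpc.client.ServerProxy(LJ_ENDPOINT)
    return server.LJ.XMLRPC.getevents(params)
```

Decoupling the paging loop from the transport (`call` vs. `live_call`) also makes it easy to add a polite delay between requests.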

int January 28 2012, 21:17:30 UTC
You could do it via the LJ protocol and syncitems/getevents, then output it all in whatever format you want. However, you would need a user's username/password to get their items, which I'm guessing you don't want, since you mentioned pulling public entries.

kypeli January 28 2012, 21:37:57 UTC
That is correct. I am interested in analyzing certain (public) blogs and their content but I am not the admin of these blogs.

So basically there isn't really a way to do what I would like to do?

andy January 29 2012, 07:30:06 UTC
Scraping HTML is the way to do it; LJ is fine with that, assuming your system behaves itself and doesn't create too much strain on the servers. This Perl script used to be able to save a given journal to a set of disk files: http://pastebin.com/1CaVmEij. I haven't checked if it still works, but reviewing it may give you some ideas.
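
For comparison with the Perl script, the scraping approach can be sketched in Python using only the standard library. The entry-permalink pattern (`/NNNN.html`) and the URL handling below are assumptions based on common LJ journal layouts, so verify them against the actual journal pages before relying on them.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumed permalink shape for LJ entries, e.g. http://user.livejournal.com/1234.html
ENTRY_LINK = re.compile(r"/\d+\.html$")

class EntryLinkParser(HTMLParser):
    """Collect hrefs that look like LJ entry permalinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        if ENTRY_LINK.search(href):
            self.links.append(href)

def entry_links(html):
    """Return all entry-permalink hrefs found in an HTML page."""
    parser = EntryLinkParser()
    parser.feed(html)
    return parser.links

def fetch_page(url):
    """Download one page; identify yourself and don't hammer the servers."""
    req = urllib.request.Request(url, headers={"User-Agent": "lj-archiver-sketch"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

From there, the loop is: fetch a recent-entries or archive page, extract the permalinks, fetch each entry page, and sleep between requests so the system "behaves itself" as described above.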

kypeli January 29 2012, 07:44:58 UTC
Thanks! I was afraid it would come down to scraping HTML, but that Perl script should be very helpful. Cheers!

kypeli January 29 2012, 13:02:26 UTC
Thanks again for the Perl script! It worked perfectly!

andy January 29 2012, 13:08:58 UTC
I'm glad I was able to help!
