Comments | siderea: [tech, lj] Distributing LJ

siderea

[tech, lj] Distributing LJ

Aug 07, 2007 00:44

Ever since the Strikethrough of '07 -- actually, ever since I realized that LJ was something of an attractive nuisance of a basket in which to store eggs, way back when -- I've been thinking about how one would go about turning LJ, the software, from a client/server model to a peer-to-peer model. That is, how to make LJ distributed...

tech, lj

sethg_prime August 8 2007, 19:29:01 UTC

It seems to me that the trick to setting up LJdist would be:

(1) Extend some existing blog software (LJ, WordPress, whatever) so that the author of a blog posting can declare that only certain users--where OpenID is used to determine a user's identity--can read the posting, and that the filtering of what postings a user sees happens not just for the regular Web page, but for the RSS/Atom feed as well. (I.e., if you turn your browser to sidereasjournal.com/atom.xml, you get redirected to an OpenID login page, and you can only see the actual XML after you log in.)

(2) Extend some existing blog-aggregator software (the kind that runs on the desktop, rather than Google Reader or Bloglines, for obvious reasons--Sage, Liferea, whatever) so that you can log into it with OpenID and then it will pass along your OpenID credentials whenever it crawls your list of feeds.

(3) (Bonus!) Do the same thing for some servers and clients that support the Atom Publishing Protocol.

Framed in these terms, I think the project is do-able, although I, like ( ... )
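
A minimal sketch of the filtering in (1), assuming each posting carries a set of OpenID URLs allowed to read it and that an empty set means "public". The `Post` class, the `visible_posts` function, and the example URLs are hypothetical names for illustration, not anything from LJ or WordPress. The same function would be applied to the regular Web page and to the RSS/Atom feed.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    # Assumption: an empty set means the post is public; otherwise only
    # these OpenID URLs may read it.
    allowed_readers: frozenset = frozenset()


def visible_posts(posts, viewer_openid):
    """Return the posts this viewer may see.

    viewer_openid is None when the viewer has not logged in.
    """
    return [
        p for p in posts
        if not p.allowed_readers or viewer_openid in p.allowed_readers
    ]


posts = [
    Post("Public entry"),
    Post("Friends only", frozenset({"https://sethg.example/"})),
]
```

An anonymous request for the feed would get only "Public entry"; a request authenticated as `https://sethg.example/` would get both.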

siderea August 8 2007, 20:07:01 UTC

Why wouldn't you use the built-in LJ aggregator, if you're already using LJ's code?

To make a desktop aggregator work like LJ's, you would need to extend it to handle cuts, and it would have to share ACLs with your site anyway. (Filters == ACLs on LJ, after all. Now, it could be seen as a feature to break away from LJ's model on that, but if the point of the exercise is to give people independent LJs....)

sethg_prime August 8 2007, 20:42:20 UTC

I might, but not necessarily.

I've never looked at the LJ code base so I don't know how hard it would be to change from a username-based model to an OpenID-based model for ACLs.

As a proof of concept, it would probably be easier to add OpenID-based filtering to blosxom than LJ, because blosxom is built around a simple Perl CGI script.

siderea August 8 2007, 20:54:56 UTC

As a proof of concept, it would probably be easier to add OpenID-based filtering to blosxom than LJ, because blosxom is built around a simple Perl CGI script.

You really think so? But then you have to write all the infrastructure for managing those ACLs, which LJ already has. Likewise, I haven't seen the code, and maybe it is all complete spaghetti....

sethg_prime August 9 2007, 02:30:28 UTC

blosxom manages infrastructure the old-fashioned way--you have a Perl script for each plugin, and there are configuration variables set at the top of the script, and if you don't like the way the script works by default you edit the source code.

Unfortunately, comment filtering in blosxom is also done the old-fashioned way--there's a text file full of comments for each article, and if you want to delete a comment you just edit the text file and, umm, hope that nobody you care about tries to add a comment while you're editing. Unsurprisingly, blosxom blogs seem particularly vulnerable to comment spam, which is why I don't use blosxom any more. But I digress.

Anyway, I hope that whatever protocol is used for LJdist servers to communicate with one another can also be adopted by other blog engines.

siderea August 9 2007, 02:36:24 UTC

Well, it wouldn't be a "protocol", at least not in the sense of a new one, I don't think. OpenID, RSS, and maybe some SOAP. It's not a protocol-design problem, is it? It's an implementation problem, I think.

eichin August 9 2007, 03:18:35 UTC

(eww SOAP :-) OpenID, RSS/Atom (Atom Pub in particular), and maybe OPML should take care of a lot of it. The one "protocol" issue is having a friends-page precomputed - you have to let "your" ljdist server authenticate as you to all of your friends' sites. That's probably ok; you could avoid it by not precomputing and just having the fetching be done via your browser when you go look at the friends page.

Other than that one case, which is really a separable "application", is there any reason the servers would talk to each other at all?
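
Whichever side does the fetching, assembling the friends page itself is just a merge. A rough sketch, assuming each friend's entries have already been fetched (and authenticated for) and arrive as a newest-first list of dicts with a `published` timestamp; `friends_page` and the data layout are made up for illustration.

```python
import heapq
import itertools
from datetime import datetime


def friends_page(feeds, limit=20):
    """Merge per-friend entry lists into one newest-first page.

    Assumes each list in `feeds` is already sorted newest-first.
    """
    merged = heapq.merge(
        *feeds.values(), key=lambda e: e["published"], reverse=True
    )
    return list(itertools.islice(merged, limit))


feeds = {
    "siderea": [{"title": "Distributing LJ", "published": datetime(2007, 8, 7)}],
    "eichin": [
        {"title": "Newer", "published": datetime(2007, 8, 9)},
        {"title": "Older", "published": datetime(2007, 8, 1)},
    ],
}
page = friends_page(feeds)
```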

siderea August 9 2007, 03:27:48 UTC

The one "protocol" issue is having a friends-page precomputed - you have to let "your" ljdist server authenticate as you to all of your friends' sites. That's probably ok; you could avoid it by not precomputing and just having the fetching be done via your browser when you go look at the friends page.

Yeah, I've been thinking about exactly that problem, and what seems to me to be a related problem with RSS polling, which is that it's profligate of resources.

Why not make the servers *push* instead of pull? Instead of using RSS to badger all the other sites you're subscribed to, why not make LJdist "publish" on write to "subscribed" LJdist peers? So when I hit "Post" on my LJdist, all the LJdists I've friended get an HTTP GET notifying them to update. Or even an HTTP POST, "here's your copy of my post". We'd be sticking all those posts into a db table anyway, right?
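
A minimal sketch of the HTTP POST variant, under loud assumptions: the post is serialized as JSON (the thread only mentions RSS/Atom and SOAP, so the wire format here is a stand-in), each subscribed peer exposes some inbox URL, and `build_push_requests` and `publish` are hypothetical names. The `send` parameter exists so the function can be exercised without a network.

```python
import json
import urllib.request


def build_push_requests(post, subscriber_urls):
    """Build one HTTP POST per subscribed peer, each carrying a copy of the post."""
    body = json.dumps(post).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        for url in subscriber_urls
    ]


def publish(post, subscriber_urls, send=urllib.request.urlopen):
    """Push the post to every peer; return the URLs whose delivery failed."""
    failed = []
    for req in build_push_requests(post, subscriber_urls):
        try:
            send(req, timeout=5)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(req.full_url)
    return failed
```

Called from the "Post" handler, `publish(post, friend_inboxes)` returns the peers that could not be reached, which would presumably need a retry later.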

eichin August 9 2007, 04:15:10 UTC

There's a bunch of history for that kind of thing; most of it degenerated into fights about the power the ping-multiplexers got (especially when they got expensive to run and looked for funding and such...)

It's been suggested that this is something Jabber/XMPP is suited for, but you still need servers for that (on the other hand, LJ includes a jabber server, so LJdist could too, and might actually be the right way to express that.) Thinking through the overhead involved... doing GET-based pings probably ends up cheaper :-)

Also, it turns out that with proper use of ETag/If-Modified-Since, rss-polling isn't *that* expensive (though it will be more expensive than what singleton-LJ does now, which is purely internal.)
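
For reference, a sketch of what polite polling looks like on the client side, assuming a per-feed `cache` dict that remembers the last ETag, Last-Modified value, and body. The validators are sent back as `If-None-Match` and `If-Modified-Since`; when nothing has changed the server can answer 304 with no body, which Python's `urllib` surfaces as an `HTTPError`. `conditional_headers`, `fetch_feed`, and the cache layout are hypothetical names.

```python
import urllib.error
import urllib.request


def conditional_headers(cached_etag=None, cached_last_modified=None):
    """Build the validator headers for a conditional GET."""
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    return headers


def fetch_feed(url, cache, opener=urllib.request.urlopen):
    """Fetch a feed, reusing the cached body when the server answers 304."""
    req = urllib.request.Request(
        url,
        headers=conditional_headers(cache.get("etag"), cache.get("last_modified")),
    )
    try:
        with opener(req, timeout=10) as resp:
            cache["etag"] = resp.headers.get("ETag")
            cache["last_modified"] = resp.headers.get("Last-Modified")
            cache["body"] = resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 304:  # 304 Not Modified: keep the cached copy
            raise
    return cache["body"]
```

The unchanged case costs one small request and an empty response per feed per poll, rather than a full download.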

jducoeur August 13 2007, 23:02:34 UTC

In principle it's reasonable; in practice, it's not how people tend to think when they're coding this kind of thing. Outbound callbacks like this are surprisingly rare in the distributed-systems world, even now: I'm seeing them starting to come in, but they're still the exception rather than the rule.

There are some practical issues as well, of course. For instance, it turns a synchronous process (posting an entry) into an asynchronous one (posting and *publishing* an entry), because the amount of time required to push it out to all subscribers is essentially arbitrary, since you can't count on the response time of those HTTP POSTs to other people's servers. So it's just plain more work to code it, probably involving a swarm of background worker threads publishing the entries that have been updated. Not rocket science, but not trivial...
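
A rough sketch of that worker-thread arrangement, assuming a caller-supplied `deliver(post, url)` function that does the actual HTTP push; `start_publisher` and `post_entry` are hypothetical names. Posting only enqueues jobs and returns, and the slow or failing deliveries happen in the background.

```python
import queue
import threading


def start_publisher(deliver, num_workers=4):
    """Start daemon worker threads that drain a queue of (post, url) jobs."""
    jobs = queue.Queue()

    def worker():
        while True:
            post, url = jobs.get()
            try:
                deliver(post, url)
            except Exception:
                pass  # a real system would log this and schedule a retry
            finally:
                jobs.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    return jobs


def post_entry(jobs, post, subscriber_urls):
    """Queue one delivery per subscriber and return without waiting."""
    for url in subscriber_urls:
        jobs.put((post, url))
```

Even this small version leaves open the parts that make it non-trivial: retries, ordering of edits to the same entry, and surviving a restart with jobs still queued.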

eichin August 11 2007, 17:17:18 UTC

Other people are talking about Decentralized Social Networking now too...

siderea August 11 2007, 18:55:36 UTC

It's such an incredibly obvious idea I'm more surprised that it's not a done deal yet.

eichin August 11 2007, 22:10:44 UTC

As he points out at the end of the post - it's hard to monetize :-) It's user-interface heavy *and* pure-open-source, which is a difficult corner to be in; unless there's enough motivation/pressure (LJ existing and being "good enough" has certainly kept my attention off the problem), it's going to be hard to get anywhere on it. Still, there's enough talk about it in enough places that actually doing something could get enough attention...

jducoeur August 13 2007, 22:58:13 UTC

I dunno. It's *possible* that the LJ code is clean enough to simply distribute like this. But honestly, I'm skeptical. Moving from centralized to P2P is difficult for almost any project -- I've done it, and it's often pretty hellish, because you don't really *design* the systems the same. A distributed system *thinks* in terms of protocols; a centralized one, in terms of database accesses. The centralized system usually does things synchronously because it's much easier; the decentralized one, necessarily asynchronously because otherwise it's unusably slow. Moving from the one to the other without utterly destroying your scalability is hard; as often as not, it's easier to throw it all out and start over.

Hence, I've always assumed that an LJdist-like system would be rewritten from scratch for that purpose. I might be wrong, but I'd be a little surprised...