LiveJournal is shrinking. In new faux-academic style!

Aug 05, 2007 21:06


(If you're on my friends list, you've seen most of this already.)

LiveJournal is shrinking, at least in terms of active accounts. The number of active accounts reached its peak in April of 2005, and has been decreasing ever since then. Let's take a look at the graphs.


Figure 1.


Figure 2.


Figure 3.


Figure 4.

It may save you some scrolling back and forth to open the gallery of these images in another tab or window or something.
Methodology.

The total accounts data comes from stats.txt; the rest was scraped from archive.org's archive of stats.bml.

(This section becomes increasingly boring after this point.)

Archive.org's archiving of stats.bml is sort of sporadic: the years 2000 through 2003 are sampled 4 or 5 times a year, but 2004 and 2005 are sampled more than 100 times each, sometimes more than once per day. The data obtained from archive.org was cleaned to remove repeated data (from when stats.bml hadn't updated between snapshots) and other bogus data.
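As a rough illustration of the repeated-data part of that cleaning, the sketch below keeps a snapshot only if its numbers differ from the previous kept snapshot. The (date, stats) layout, the function name and the toy numbers are assumptions for illustration, not real LJ data, and this does not attempt to catch the other kinds of bogus values.

```python
from datetime import date

def drop_repeated_snapshots(snapshots):
    """Keep only snapshots whose statistics differ from the previous kept one.

    `snapshots` is a list of (date, stats) pairs sorted by date, where `stats`
    is a tuple of the numbers read off stats.bml (hypothetical layout).
    """
    cleaned = []
    for day, stats in snapshots:
        if cleaned and cleaned[-1][1] == stats:
            continue  # stats.bml had not been regenerated between these snapshots
        cleaned.append((day, stats))
    return cleaned

# Toy values, not real LJ statistics.
snapshots = [
    (date(2005, 4, 1), (100, 50)),
    (date(2005, 4, 1), (100, 50)),  # same page captured twice in one day
    (date(2005, 4, 2), (100, 50)),  # page not yet updated
    (date(2005, 4, 3), (104, 52)),
]
print(drop_repeated_snapshots(snapshots))
```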

Lines between data points are only meant as a guide for the eye; they don't represent what the actual data between points would have been. For every quantity except total accounts, one fake data point has been inserted at the day the invite-only period ended (December 12, 2003). This was done because the data points near that day are fairly far apart, and a linear approximation overestimates the increase before that date. The point was determined by assuming that, before that day, all other quantities grow at the same rate as total accounts. Again, this point should only be taken as a guide for the eye. Data points aren't shown because they'd run together and it wouldn't actually be helpful. There's a link to the data set at the end of this post for anyone who wants to see the data points themselves.
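One reading of "the same rate as total accounts" is the same relative growth: the quantity is scaled by the ratio of total accounts on the target day to total accounts at the last real snapshot. A minimal sketch of that reading follows; the helper name and the numbers are hypothetical, and the actual estimate may have been made differently.

```python
def fake_point(q_before, total_before, total_on_day):
    """Estimate a quantity on a day with no snapshot, assuming it grew by the
    same factor as total accounts since the last real snapshot (hypothetical helper)."""
    return q_before * total_on_day / total_before

# Toy numbers: a quantity was 500 when total accounts was 1000,
# and total accounts is 1200 on the target day.
print(fake_point(500, 1000, 1200))
```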

The "Active accounts" number was introduced on February 26, 2003 (specifically, in revision 1.25 of stats.pl in the LJ source code), so there is no data for "Active accounts" before then. The way stats.pl computes that number was changed in revisions 1.46 and 1.47 on December 14, 2005, and the numbers it now comes up with are much smaller than those it came up with before. Data from before revision 1.46 is referred to as "old calculation" and data from after as "new calculation." It is possible that revision 1.29 also changed the number that stats.pl comes up with, but I can't tell from looking at it.
Meaning of quantities.
  • Total accounts is the total number of accounts that have ever been created on LiveJournal. This includes deleted accounts. Equivalently, this is the user number of the most recently created user.
  • Accounts that have ever been updated is pretty much what it sounds like. I'm not sure if this includes accounts that updated before they were deleted. (Maybe someone more familiar with the LJ codebase can answer that.)
  • Active accounts, last 30 days is the number of accounts that have had anything happen to them in the database in the past 30 days, such as posting an update, uploading a userpic, or changing their profile. The way this is calculated was changed on December 14, 2005, as described in the previous section, and the numbers it comes up with now are much lower.
  • Updated last... is another straightforward one. Note that short-term variations in "updated last day" don't really mean much, since people tend to post more on certain days of the week. This is somewhat true for "updated last 7 days" too. (There tends to be a slight downturn in "updated last 7 days" at the end of every year around Christmas and New Year's, presumably because people are busy with holiday-related things then.)

I have marked three significant dates for LJ on the graphs: the beginning and end of the invite-only period, and 6A's purchase of LJ. The invite-only period had an obvious effect on LJ's population growth. The date of 6A's purchase is included for curiosity's sake, and shouldn't be taken as suggesting, or as not suggesting, that the purchase determined the population growth or non-growth of the following period.

I'm going to implicitly make the assumption from here on that proportional changes to any of the activity measures correlate to proportional changes in number of users. I don't think this is unreasonable, but I also don't think it's necessarily an extremely strong correlation.
Discussion.

LiveJournal experienced a near-continuous trend of growth in the number of active accounts, neglecting small bumps, until April of 2005. All measures of account activity peaked at various points within the month of April. There has been a consistent decreasing trend since April of 2005.
Quantity | Date of peak | Value at peak
Active accounts, last 30 days (old calculation) | April 22, 2005 | 2662552
Accounts updated in the last 30 days | April 14, 2005 | 1534960
Accounts updated in the last 7 days | April 5, 2005 | 985495
Accounts updated in the last day | April 5, 2005 | 380810
Table 1. Days of the activity peaks and their values there. Note that, because we don't have a continuous data set for these numbers, the actual peaks might have been a little higher and on a nearby date.
At the time I'm writing this, the number of accounts that updated in the last 30 days is 985313. This means there has been a 36% decrease in that number since the peak, or a decrease of roughly 15% of the peak value per year (about 17% per year if the decline is treated as compounding). Unfortunately, comparing the active accounts numbers is sort of apples-to-oranges because of the change to how active accounts are calculated.
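The decline can be checked with a few lines of arithmetic on the two 30-day figures. This sketch assumes the "current" value was read on the post date, August 5, 2007, and uses 365.25-day years; it computes both a straight-line rate (fraction of the peak lost per year) and a compounded rate (constant yearly percentage decline).

```python
from datetime import date

# Accounts updated in the last 30 days (Table 1 and the current figure).
peak_value, peak_day = 1534960, date(2005, 4, 14)
# Assumption: the "current" figure was read on the post date.
current_value, current_day = 985313, date(2007, 8, 5)

years = (current_day - peak_day).days / 365.25
remaining = current_value / peak_value

total_drop = 1 - remaining                    # fraction lost since the peak
linear_rate = total_drop / years              # fraction of the peak lost per year
compound_rate = 1 - remaining ** (1 / years)  # constant yearly percentage decline

print(f"total {total_drop:.1%}, linear {linear_rate:.1%}/yr, compounded {compound_rate:.1%}/yr")
```

The two annual rates differ because the compounded figure measures each year's loss against a shrinking base.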

This decrease raises the question: how can the total number of accounts on LJ be growing while the number of active accounts shrinks? What is probably going on is a balance of users, i.e. new users replacing those who leave. Let's examine the rate of account creation, which stats.txt gives a very clear picture of.


Figure 5. Account creation versus time. The marks indicating the beginning and end of the invite-only period and LJ's purchase are in the same places as before.
There are some interesting peaks in this graph. Once the invite-only period ends there's a huge spike in the account creation rate (14842 new users the day LJ opened up, and a whopping spike of 17630 new users/day on December 14, 2003! yow!). There is another peak of 13609 new users/day on July 19, 2004, and one of 14175 new users/day on January 16, 2005, right after the 6A purchase, which precedes a long downward trend. There's another spike of 12088 new users/day on February 22, 2006, and after that the new user rate seems to stay fairly constant.
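Since total accounts is just the number of the newest user, the account creation rate can be recovered from the cumulative series: the difference between two readings is the number of accounts created in between. A sketch, assuming the stats.txt data has already been parsed into sorted (date, total) pairs (the function name and toy values are hypothetical), dividing by the gap so that a missing day doesn't inflate the rate:

```python
from datetime import date

def signup_rate(totals):
    """Turn sorted (date, total_accounts) pairs into (date, new accounts per day)
    pairs, averaging over any gap since the previous reading."""
    rates = []
    for (prev_day, prev_total), (day, total) in zip(totals, totals[1:]):
        gap_days = (day - prev_day).days
        rates.append((day, (total - prev_total) / gap_days))
    return rates

# Toy readings, not real stats.txt values; note the missing December 13.
toy_totals = [
    (date(2003, 12, 11), 1000),
    (date(2003, 12, 12), 1100),
    (date(2003, 12, 14), 1400),
]
rates = signup_rate(toy_totals)
spike_day, spike_rate = max(rates, key=lambda r: r[1])
print(rates)
print("largest spike:", spike_day, spike_rate)
```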

Interestingly, quite a few of these spikes have corresponding features in the activity graphs (Figures 3 and 4). Both "active accounts" and "updated last 30 days" have small bumps a month or so after the spikes in account creation. The post-6A downward trend is reflected in the activity graphs, though delayed by a few months. It is worth remembering that a decrease in the number of users updating won't show up in "updated last 30 days" or "active accounts" right away, since those, well, count every account that has updated or had activity at any point in the last 30 days; but even allowing for that, these features still seem to lag somewhat. There are other features that probably correlate. (There's a big drop in November of 2006 corresponding to the power outage then.)
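To see why a 30-day window lags, here is a toy model with made-up numbers: 700 users keep posting every day, while 300 users make their final post spread evenly over days 70 through 99. The helper name is hypothetical and this is not how stats.pl computes the figure; it just counts accounts whose most recent post falls within the trailing window.

```python
def updated_last_n_days(last_post_days, today, n=30):
    """Count accounts whose most recent post fell within the n days ending on `today`."""
    return sum(1 for d in last_post_days if today - n < d <= today)

# 300 quitters: 10 final posts on each of days 70..99.
quitters = [70 + i % 30 for i in range(300)]

# The 700 stayers' latest post is always "today".
results = {today: updated_last_n_days([today] * 700 + quitters, today)
           for today in (100, 115, 130)}
print(results)
```

On day 100 the count is still 990, even though posting has been falling off for a month; it only bottoms out at 700 once the last quitter's final post has aged out of the window.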
Conclusions.

First and foremost: it is completely wrong to say that LJ has 13 million users. LJ only has a larger population than the state of Illinois, as claimed here, if you're running things like an old-style Chicago election and counting the deceased and people who don't live there anymore. LJ's actual population, according to active accounts, has probably never been much higher than 2.6 million, and is currently near 1.7 million, which is still a decent number. It's more than the population of Phoenix, Arizona, or of Montreal.

We've (I've been trying to avoid the academic "we" this entire post) fully established that LJ is shrinking in terms of account activity. This suggests LJ is shrinking in general, but there are other aspects of LJ that ought to be analyzed before we can say, full-out, that LJ is shrinking. The two main aspects I'm thinking of are posts per day and the number of paid or permanent accounts. For the former, it is entirely possible that today's breed of LJers, though less numerous, post more. For the latter, it could be argued that the users who would pay for LJ are the really important ones (certainly from 6A's perspective). Both of these quantities used to be on stats.bml, but posts per day was removed in 2003 when it became too big a burden on the servers, and the number of users paying for LJ was removed near the end of 2005 for unstated reasons. There is all sorts of interesting data one could get with unlimited access to the database, which is something people who aren't employed by 6A don't have, and getting these numbers by some other method would probably be interpreted as a DoS attack on the servers.

The users who are leaving are either giving up on LJ and blogs entirely or going to other services. Or, rarely, dying or falling into comas. That's a tautology; call it the Law of Conservation of Users. I remember an entry in Anil Dash's blog at some point with a clustermap of the "blogosphere" (gag) where LJ was off on its own little glob, so LJ's fairly insular. I wondered if the leavers might have gone off to other sites using the LJ software, so I counted: GreatestJournal, DeadJournal, InsaneJournal, and JournalFen combined have 59735 accounts that have updated in the past 30 days. That's 3.9% of LJ's peak, and 11% of the total decrease from the peak to now. Even if every one of those services had a userbase consisting entirely of people who left LJ after April of 2005, that wouldn't account for everyone who has left. Other blogging sites aren't nearly as straightforward with their statistics; Vox has none at all, and I can't find any for Blogger or the like either. In general, this is the kind of data that people pay market analysts to come up with, and it would probably require a full analysis of the field of blogging/social networking software and websites. While I seem to have enough free time to write this mess, I don't have enough free time to do 6A's market research for it.
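The two percentages can be checked directly from the 30-day update counts quoted in this post (the LJ peak from Table 1, the current LJ figure, and the combined count for the four LJ-based sites):

```python
# 30-day update counts quoted in the text.
lj_peak = 1534960    # LiveJournal, April 14, 2005
lj_now = 985313      # LiveJournal, at time of writing
clone_sites = 59735  # GreatestJournal + DeadJournal + InsaneJournal + JournalFen combined

share_of_peak = clone_sites / lj_peak
share_of_decrease = clone_sites / (lj_peak - lj_now)
print(f"{share_of_peak:.1%} of the peak, {share_of_decrease:.1%} of the decrease")
```

The second figure comes out just under 11%, which matches the rounded value in the text.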

Though some people would like to jump on the chance to blame 6A, it is premature to say that 6A has been causing the decrease in number of active accounts. The data I have is suggestive, but when it comes down to it, the suggestion is weak. Off the top of my head, I can think of a bunch of other reasons, such as a net-wide decrease in the popularity of casual blogging, or previously active LJ users growing up and getting jobs or going to grad school or doing other things that prevent them from LJing.

I'm afraid that I'm unable to answer the question, "Why is LJ shrinking?" with the data I have available to me. This is, really, one of those questions that will never have a full answer. A more meaningful question to ask might be "How do we make LJ stop shrinking?" which is a question that 6A ought to answer and implement. I think that, while LJ is still of a healthy size, having a declining user base is not good for it.

My next project is going to be wringing whatever other information I can out of the archive.org snapshots of stats.bml: gender distribution versus time, age distribution versus time. The data here seemed like the most important part, though, so I did that first.
Acknowledgments.

Thanks to everyone on my friends list who commented on this when it was in its formative stages. Thanks to Brad Fitzpatrick for making the LJ source code public so that I could go root around in it for answers. Thanks to archive.org for doing what they do and letting me use their snapshots. Last but not least, thanks to the team that runs LJ for making the user statistics public.

The graphics in this post were made with Plot for Mac OS X. The data is available in an Excel file here, for anyone who wants to fool around with it.

The post I made in the mathematics community about this data has some interesting comments.

