lots and lots of technical details about the move \o/

Dec 28, 2009 00:39

Before I get into the promised technical details, I wanted to mention that the single biggest change that has happened in this move is that I am no longer the only one working on the Yuletide code -- the credit for making this happen belongs to an enormous group of volunteer fans who have all pitched in: it takes a village, and wow you guys, do we have a freaking awesome village. ♥

While there is a more detailed post of kudos coming so you all get to know the individual people involved instead of it just being the mysterious wizards behind the curtain, I will here say a huge and general thank you to the entire Accessibility, Design & Technology and Systems committees and the OTW Coders and Support and Tag Wranglers teams, who didn't so much go above and beyond the call of duty as pass it, achieve escape velocity, and leave it several light-years back in the rear-view mirror.

I will also take this moment to throw in a plug -- we are always looking for more people to pitch in at the OTW, with all our projects, and we have an infrastructure that makes it easy to pitch in. If you love Yuletide, or fanfic in general, and would like to help make the AO3 even more awesome to use, please drop a line to volunteers@transformativeworks.org or fill out our contact form!

In particular, bringing Yuletide in is a dry run for one of our major goals, which is to make the AO3 a shelter for archives that are otherwise going to fall down and go boom. The Open Doors project is all about saving these archives and helping archivists back them up into the AO3. So if there is an archive you would like to help save, please come on board!

And now here is the nice big helping of geekery, which I will cut:


The old AutomatedArchive code was written originally in 1998 in Perl by yours truly when I didn't know what the hell I was doing. *g* It does one very smart thing: the story files are static HTML, which makes them really fast to load and hard to break (so that even when the archive database got messed up, direct links to the stories would still work). The downside to this is that there was no easy way to include dynamic information, meaning information about the story that would change over time -- for instance, comments on the story -- and it made it harder to let people edit the stories, their contact information, etc etc.

The main BAD thing it did was use a plain text file as the database. This was a fine solution for the halcyon days when an archive of a thousand stories seemed HUGE. However, as the db got bigger, two big problems arose: searching got horribly slower (which is why the quicksearches had to be turned into static html pages themselves instead of being search engine links), and even worse, in order to add a new story, the code had to load up the whole file, add the new story, and write it all out again.
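
To make the "load it all, add one, write it all back" problem concrete, here is a minimal sketch. It is in Python rather than the original Perl, and the record format, file name, and function names are all made up for illustration; it is not the AutomatedArchive code, just the same general pattern.

```python
import os
import tempfile


def add_story_flat_file(path, record):
    """Add one record by reading the whole file and rewriting the whole file.

    Returns how many records had to be written to disk for this one insert.
    (Hypothetical helper; the one-record-per-line format is an assumption.)
    """
    records = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            records = [line.rstrip("\n") for line in f if line.strip()]
    records.append(record)
    # The entire database is written out again, not just the new record.
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(r + "\n")
    return len(records)


def demo_flat_file_growth(n):
    """Insert n stories and report the write cost of each insert."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "archive.db.txt")
        return [add_story_flat_file(path, f"story-{i}|author-{i}") for i in range(n)]
```

Each insert has to write every record that already exists plus the new one, so the cost of adding a story keeps climbing as the archive grows.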

What was crashing Yuletide so much the last couple of years was basically this: two or more people would try to add stories at the same time, the second entry would blammo the db, and I would have to go in and fix it by hand. The slowness of adding records is directly related to the size of the db, meaning that adding the first story this year would have been as slow as adding the last story was last year, and every story after that would have been slower still. By the time the rush hit, it would have been unmanageable, crashing every few minutes.
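
Here is a small sketch of one way two overlapping uploads can clobber each other when every write rewrites the whole database. It is a simplified stand-in, assuming an in-memory dict instead of a real file and hypothetical `load`/`save` helpers; the real failures on the old archive may well have looked different (for example, a half-written file), but the root cause is the same read-everything-then-write-everything pattern with no locking.

```python
def load(db):
    """Read a full snapshot of the 'file' (hypothetical stand-in for reading the flat file)."""
    return list(db["records"])


def save(db, records):
    """Overwrite the whole 'file' with the given records."""
    db["records"] = list(records)


def simulate_overlapping_adds():
    """Interleave two uploads by hand so the outcome is deterministic."""
    db = {"records": ["story-1"]}

    # Both uploaders read the database before either one has written.
    snapshot_a = load(db)
    snapshot_b = load(db)

    # Uploader A adds a story and writes everything back.
    snapshot_a.append("story-from-A")
    save(db, snapshot_a)

    # Uploader B does the same from a stale snapshot, wiping out A's story.
    snapshot_b.append("story-from-B")
    save(db, snapshot_b)

    return db["records"]
```

After both uploads "succeed", the database holds `story-1` and `story-from-B` only; A's story is gone. The longer each rewrite takes, the wider the window for this kind of overlap.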

Similarly, generating the quicksearch pages was dying constantly because the searches ran so slowly. They had to be hand-regenerated over and over and often ended up out of date. Those searches were going to all be even slower this year due to the larger db, if they would run at all.

This is the number one reason why Yuletide could not happen on the old code this year -- we just were too big to use a flat-file database anymore.

The new code is written in Ruby on Rails, with a MySQL database behind it. What all those things mean:
  • MySQL is a relational database management system, which among other important things can insert a new item in roughly the same small amount of time regardless of how big the db is (so it barely matters whether there are ten stories or ten thousand already in the db). It can also index the data in a bunch of different ways for really fast searching even when there are huge numbers of stories.

  • Ruby is a really nice new programming language that is cleaner and more human-readable than Perl, with lots less reliance on punctuation -- this may sound kind of trivial, but it's of huge importance for maintainability -- and which is object-oriented from the ground up.

  • Ruby on Rails (RoR) is a development system built in Ruby that's intended for building web applications using a database back end. It enforces a lot of good habits, as well as just doing a lot of boring repetitive work for the programmer. It relies on the model-view-controller software design pattern, which basically means it splits the front end (the "view") away from the database/back end (the "model", which represents the data in the db and the stuff you can do with it), with the controller as the glue in between. This is a very good idea because it makes your applications a lot more robust and easier to maintain.
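
For the curious, the model-view-controller split with a database behind it can be sketched in a few lines. This is only an illustration under loud assumptions: it is Python rather than Ruby, it uses the built-in sqlite3 module as a stand-in for MySQL, and the table, column, class, and function names are all invented here. It is not the AO3 code.

```python
import sqlite3


class StoryModel:
    """Model: owns the data and the things you can do with it."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stories ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, fandom TEXT NOT NULL)"
        )
        # An index lets the database find a fandom's stories without scanning every row.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_stories_fandom ON stories (fandom)")

    def add(self, title, fandom):
        """Insert one row; nothing else in the table is rewritten."""
        cur = self.conn.execute(
            "INSERT INTO stories (title, fandom) VALUES (?, ?)", (title, fandom)
        )
        self.conn.commit()
        return cur.lastrowid

    def by_fandom(self, fandom):
        rows = self.conn.execute(
            "SELECT title FROM stories WHERE fandom = ? ORDER BY title", (fandom,)
        )
        return [row[0] for row in rows]


def render_story_list(fandom, titles):
    """View: turns data into something a person reads; knows nothing about the db."""
    lines = [f"Stories in {fandom}:"]
    lines += [f"  - {title}" for title in titles]
    return "\n".join(lines)


def show_fandom(model, fandom):
    """Controller: the glue that asks the model for data and hands it to the view."""
    return render_story_list(fandom, model.by_fandom(fandom))
```

A quick usage example: create a model with `StoryModel(sqlite3.connect(":memory:"))`, call `add(...)` a few times, then `show_fandom(model, "Some Fandom")` returns the formatted list. Because the view never touches the database, you can change how pages look without risking the data, and vice versa.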

To give you a sense of the advantages that we get from using RoR, the old Yuletide software is about 20,000 lines of code. The new code that I wrote to make Yuletide run on the AO3 was about 1000 lines of code. The entire AO3 code (which has PILES more functionality than the old AutomatedArchive software) is 10,000 lines of code.

As you can imagine, this makes a huge difference in terms of how much time it takes to create and edit applications -- it's like the difference between writing or betaing a story that's 1000 words long vs one that's 20,000 words long.

What this means to you the end user as a practical matter is, we can code up some complex things really fast. As an example, just in the last few days after we first opened the archive for reading and performance got really slow, sidra could look at the logs and say "hey, we are really slow over here," and then elz was able to go directly to the slowest bits and make MASSIVE performance improvements with a relatively small amount of code. ♥

It also helps make RoR apps easy to contribute to. It doesn't take a long time or a lot of training, and it doesn't require prior programming experience, to understand the basics of how our code works or to make fixes. This means that there is a low barrier to entry for new coders, and you can have a big team supporting one another and passing the code along, instead of one solitary coder working in a corner, which means more stability. (YES THIS IS A HINT COME JOIN USSSS.)

The other fundamental technical change in the move to the AO3 is that all of the stories are now served up dynamically. That makes individual stories slower to serve than the old static files, but the upside in terms of new features is huge: stories can display differently from moment to moment, depending on new information that has come in (eg, new comments, new bookmarks, edits) or depending on who is looking at them (eg, if you have a preference set that you don't want to see warnings, if you want to lock a story only to people who are registered users of the archive, if you are the author of a story that is currently hidden/anonymous, etc).
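
As a rough illustration of what "served up dynamically" buys you, here is a sketch of a page being built fresh for each viewer. It is Python, not the real AO3 Ruby code, and every class, field, and function name in it is an assumption made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Viewer:
    name: Optional[str] = None      # None means not logged in
    hide_warnings: bool = False     # hypothetical preference flag


@dataclass
class Story:
    title: str
    author: str
    body: str
    warnings: List[str] = field(default_factory=list)
    registered_only: bool = False   # locked to registered users
    anonymous: bool = False         # author currently hidden
    comments: List[str] = field(default_factory=list)


def render_story(story, viewer):
    """Build the page at request time, so it reflects current data and this viewer."""
    if story.registered_only and viewer.name is None:
        return "This story is only available to registered users."

    # The author can always see their own name; everyone else sees "Anonymous".
    is_author = viewer.name == story.author
    byline = story.author if (is_author or not story.anonymous) else "Anonymous"

    lines = [f"{story.title} by {byline}"]
    if story.warnings and not viewer.hide_warnings:
        lines.append("Warnings: " + ", ".join(story.warnings))
    lines.append(story.body)
    # The comment count is read at render time, so new comments show up right away.
    lines.append(f"Comments: {len(story.comments)}")
    return "\n".join(lines)
```

The same story produces different pages for a logged-out reader, a registered reader who hides warnings, and the author. A static HTML file can't do any of that, which is the trade being made for the extra work per request.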

This makes maintaining good performance a challenge, but as I noted above, this problem is also much easier to attack in the new code. It is also going to be a lot easier to grow, because RoR applications are designed to be spread across multiple servers. By next Yuletide, we'll be running the AO3 off at least two machines, if not more as our funds allow. (YES THIS IS ANOTHER HINT, SUPPORT THE OTW!)

And that is basically the big summary of why the move, and what we've gained. If you have questions, hit me! Though I may be a bit slow answering as I am busy reading piles and piles of stories \o/

tech, yuletide 2009, archive
