For several months now I have been at work on a new version of the Mystery Hunt index web site which I have written about previously. This site is finally live at
http:/devjoe.appspot.com/huntindex/ .
For those who have used the old web site, it will seem familiar. The biggest change is that links in lists of puzzles no longer go directly to the MIT web page associated with the puzzle. Now they go to a separate puzzle page in my web site, which includes the link for the MIT puzzle page and other information about the puzzle. Right now there is not a lot of that other information, but I will eventually be adding more information here. I've made allowances for listing puzzle authors and also including a link to the solution page for each puzzle, as well as a free-form area where I might write any other info about a puzzle. Those two-part puzzles that needed a link to a second puzzle page now have it in this area.
I had planned on doing this for some time, but I had no set timeline for doing it. It was only when Verizon announced they were doing away with FTP access to their personal web space (where my old site was hosted, destroying my ability to update the site since the only access now is through their lame and horribly broken sitebuilder tool) that I started to act on this.
There were two main reasons I wanted to do this. First, the new site has editing features built in (they shouldn't be visible to ordinary users, but I see them), making it much easier for me to update the site. I can update the live site and have changes visible instantly, instead of editing two enormous spreadsheets and then running a script which rebuilt and uploaded 900 files. And second, I wanted to learn how to use Google App Engine with a real project.
The first thing I did was just set up a static copy of the things from my old web site that I needed to keep. There were lots of one-off pages and files and such that just didn't need to stay on the web, and those were dropped. It's perfectly possible to make a Google App Engine site that serves nothing but static files, the way I used my old site, and in that way, even the busiest personal site would not run over their free quotas. Once I was sure I could make this work, I put up modified versions of the pages on my old site most likely to be visited by others, with links pointing people to the corresponding page on my new site. As a result, some of you may have already realized I was moving to Google App Engine. Later I put up pages with meta tags that forward you to the new site.
Meanwhile, I learned how to use the database (Google calls it a datastore) to create a dynamically generated website in Python. I have programmed in Python for a long time. I learned it originally for a challenge issued by another player in the nomic game Ackanomic ca. 1996, and it quickly became my language of choice for all manner of programming. Naturally, with Python offered here, I chose it over the Java and Go options.
Once I figured out the basic ideas, I realized that the total job was complicated enough that I needed to treat it like a real programming project with development plans and everything. I ended up with 15 main items, which I elaborated on as I got to working on each one of them. Items 1-3 were related to the old static site, mostly on pages outside the Hunt index site. Items 4-15 were all to develop the new dynamic Mystery Hunt index web site. Currently, I have completed this list through item 11, reproducing all functionality of the old web site and some new features as mentioned above. You're probably curious about those last 4 items. They are long-term plans.
- 12 is to add pages within my site for whole hunts, so I can replace the list of Hunts on the home page with a link to a page which lists and links to a page for each Hunt.
- 13 is to add author data to my web site. All the functionality is there for it, but I just need to collect the data. An optional part of this item is to add author pages to my site for individual puzzle authors, which would link to all their puzzles.
- 14 is to add solution links to the puzzle pages. Again, the functionality is there and I just need to collect the data.
- 15 is to add other Hunts. Of course, some time after the 2012 Hunt it will get added. The 1999 and 1997 Hunts are mostly complete on the MIT web site, and a few puzzles and solutions from the 1994 and 1996 hunts are online; these could all be added. The hunts before 1994 are also mostly available on the web site, but they are of the variety of the early hunts rather than the more modern ones, and most of their solutions are not posted. But besides this, there are other puzzle hunts that have online archives of their puzzles and solutions that would be suitable for including here. If I include those, I'll have to change the name of the site, but it will be a reasonable expansion of the site.
I had a lot of challenges along the way.
- Security: This was the easiest one. Google App Engine lets you log in using Google accounts, and it was easy to use the built-in functionality. This is effectively non-public now; either you are logged in with an admin account, and can see all the lovely editing features on the site, or you are not logged in, and see none of them. There's a whole other section of the site which requires an admin account to even access, as well as forms built into some of the pages (for admins) which allow direct editing, and the form handlers are all in that admin-only section.
- Backup and Restore: I wrote a standalone script to turn my old spreadsheet files into an XML file which contains all the data in a format suitable for quick loading into my site. I added a way to load such a file into the new web site. And I added a way to download the data again, for backup. One of the checks I performed during development was to verify that I could upload my data into an empty site and then download and reproduce the exact same XML file I started with. Along the way, for the first time since I learned XML in an employer-provided training course back around 2000 or 2001, I actually used CDATA sections in XML. (Some of the fields in my database contain HTML tags, and the way to handle these is not to try to allow for all possible HTML tags as elements of the XML, nor to turn all those s into &-thingies the way HTML does, but instead to wrap it in a CDATA section.)
- Efficiency, Part 1: Ensure that the pages display quickly. The very first version of my site did a datastore query to get the puzzles for each keyword and the name, etc. for each puzzle, which meant that displaying the full index did several thousand datastore lookups and was quite slow. I cached a lot of this data, so the slow query-puzzles-by-keyword queries were replaced by a list of associated puzzles in each keyword which gets updated when I add more keyword-puzzle links.
- Efficiency, Part 2: Even with that, I found that I was going to run into the datastore operation quotas (especially as, during the course of this project, Google App Engine came out of beta and the practically unlimited free quotas were replaced by smaller ones). The solution was to use memcache, and memcache everything. My entire database is only a few megabytes in memcache, and memcache is perfectly designed to handle loads. During busy times, the cache will stay active and the entire site should run out of the cache, limiting datastore operations to only those times when the site goes idle and things expire from cache. And I have to be careful to wipe old stuff out of the cache when I update things. I had some difficult bugs along the way which I ultimately discovered were caused by stale cache.
- Efficiency, Part 3: I wanted to be able to wipe out and re-upload the entire database when necessary, and with the new limit of 50,000 database writes per day, I was failing on that ability. I needed to declare my objects correctly so that Google App Engine didn't create datastore indexes I would not use, but were nevertheless updated with each datastore write. This got a complete write of the database down to 12,000 writes. I think it should be only 9000, so there may be something unfinished here, but the current system is acceptable and allows the database to grow much larger while still satisfying this goal.
- Sorting: After all of the above changes, all lists are stored in their corresponding objects, and the datastore is only used to retrieve data for a particular object when it fails to be found in the cache. These lists are even kept in order, using a binary search to find where to insert a new item in log time. I also wanted the puzzles to all sort naturally, and with this goal in mind, I gave every puzzle a combination round and puzzle number. Where there were numbers provided by the hunt I used them; where there weren't, I had to make something up. The unique IDs used to sort puzzles are the same ones in the URLs for puzzle pages, and are composed of the Hunt year and round and puzzle numbers, with punctuation omitted. And by the time I was through with this, I noticed that the round/puzzle numbers looked a bit ugly in the lists of puzzles, with them being in different formats for some hunts. So I removed them from display in the lists of puzzles. They're still there as the title attribute of each link that shows up as a tooltip in all modern browsers when you move the mouse over a puzzle link, and of course they appear in the puzzle pages.
Along the way, I found and corrected a number of errors in my site. Some puzzles were listed in the wrong round, or in the wrong hunt, or with the wrong title. There were some other weird errors, too. About 1% of the approximately 3000 keyword-puzzle links had some sort of problem.
In the end, I used the W3C's validator to help me catch missing tags that my browser was nevertheless handling OK but might break for some people. I changed a couple things that are deprecated in HTML 4 and got my pages to pass.
The whole thing was done on a different URL while the copy of the static site remained at /huntindex, so at the very end, I moved my new site to /huntindex in place of the static version, set up some redirects for pages from the old site that are at different URLs now, and finally, right around the moment the clock turned to 2012 eastern time, I uploaded this version of the site.