Apache and hardcore content negotiation: justben

Oct 02, 2006 23:00

Today at work, near the end of the day, I was spending a bit of brainpower figuring out how sehellenes.org should serve up its content. The site news lives in XML files. Because of the nature of the data, it's sensible to view a particular article either as a raw XML Atom entry or transformed into XHTML. I may even want to add a way to transform it into RSS later, but I haven't decided on that for sure.

The way I serve it up right now is wrong. I have two different URIs: one for the raw Atom entries, and one for the XHTML format. I use a really grody combination of symlinks and Apache Location directives to make both URIs serve up the same data in different formats. That is all kinds of wrong. The W3C's Architecture of the World Wide Web document and TimBL's Cool URIs Don't Change (and a handful of other documents I don't have bookmarked) offer lucid explanations of why the data should have only a single URI. In short, HTTP content negotiation was engineered to address exactly this situation: one piece of information (a site news article for me) needs to be accessible in two or more different formats (Atom XML and XHTML for me), and the client says which formats it prefers in its Accept request header.
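To make the idea concrete, here is a minimal sketch of server-driven negotiation in Python. It is not how Apache implements it; the function names and the `AVAILABLE` list are made up for illustration, and it only looks at media types and q-values in the Accept header (no charsets, languages, or media type parameters).

```python
# Hypothetical list of representations this server can produce,
# in the server's own order of preference.
AVAILABLE = ["application/xhtml+xml", "application/atom+xml"]


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        q = 1.0  # q defaults to 1 when the client does not give one
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Rank */* below type/* below type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, media_type):
    """True if a range from the Accept header covers a concrete type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def choose(header, available):
    """Return the acceptable type with the highest q, or None if none is."""
    # A missing or empty Accept header means the client accepts anything.
    ranges = parse_accept(header) if header else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:
        # The most specific matching range decides this type's q-value.
        candidates = [(specificity(r), q) for r, q in ranges
                      if matches(r, media_type)]
        if not candidates:
            continue
        q = max(candidates)[1]
        if q > best_q:  # strict: on ties, the earlier server preference wins
            best, best_q = media_type, q
    return best
```

A feed reader that sends `Accept: application/atom+xml` gets the Atom representation, a browser that lists `application/xhtml+xml` with a higher q than `*/*` gets XHTML, and a client that only accepts `image/png` gets `None`, which a real server would turn into a 406 Not Acceptable response.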

And Apache httpd supports content negotiation. Unfortunately, the straightforward case that everybody (apparently) cares about is letting Apache select between, for instance, two different image formats. Or maybe between HTML and CGI. The key is that in the standard setup, each representation exists as a separate file on the filesystem, and Apache easily negotiates which one to return to the user. The problem that I've got is that I have one representation on the filesystem. I want Apache to make two different versions available, I want it to magically handle the negotiation for me, and I want at least one of those presented versions to be generated by a CGI script using the source XML as input.
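For illustration only, here is a rough Python sketch of the "one file on disk, two representations" idea done entirely inside a script, rather than by Apache's own negotiation. Everything in it is an assumption: the function names are invented, the Atom entry is assumed to have `title` and `summary` elements, and the Accept check is a deliberately crude substring test, not real q-value negotiation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace, ElementTree style


def atom_to_xhtml(atom_xml):
    """Generate a bare-bones XHTML page from a single Atom entry."""
    entry = ET.fromstring(atom_xml)
    title = entry.findtext(ATOM + "title", default="")
    summary = entry.findtext(ATOM + "summary", default="")
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>%s</title></head>"
        "<body><h1>%s</h1><p>%s</p></body></html>"
    ) % (escape(title), escape(title), escape(summary))


def wants_atom(accept):
    """Crude stand-in for negotiation: Atom is named and no HTML type is."""
    accept = accept.lower()
    return "application/atom+xml" in accept and "html" not in accept


def respond(accept, atom_xml):
    """Build a CGI-style response (headers, blank line, body) for one URI."""
    if wants_atom(accept):
        content_type, body = "application/atom+xml", atom_xml
    else:
        content_type, body = "application/xhtml+xml", atom_to_xhtml(atom_xml)
    # Vary tells caches that the response depends on the Accept header.
    headers = "Content-Type: %s\r\nVary: Accept\r\n\r\n" % content_type
    return headers + body
```

Run as an actual CGI script, this would read the client's header from the `HTTP_ACCEPT` environment variable, load the source XML from disk, and write the result of `respond(...)` to standard output. The trade-off is that the script, not the server, ends up owning the negotiation logic.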

Now, if I just wanted to generate XHTML from raw XML using a CGI script, I could do it with Apache's URL rewriting support (mod_rewrite). This appears to be a fairly common, straightforward thing to do. Unfortunately, no, I want to be difficult. I need content negotiation and CGI-generated data to work together, and ideally I'd like the resulting solution to be more elegant than the nasty symlinks-and-Location-directives hack that I've got wedged together right now.

I haven't finished exploring how to solve this problem yet. I'm not at the end of my rope. Honestly I haven't yet dug too deeply into it. Still, if anyone reading this happens to know someone who's already done it and might be able to offer some pointers, I'd greatly appreciate the shortcut to knowledge.

And if not? Well, I'll post my findings here as I'm able.

(LJ Spellchecker Genius of the Day: symlinks -> Somalians)

tech, geek, web