elfs

Grief, I've bitten off more than I can chew now...

Apr 03, 2012 19:09

For reasons I'm not going to go into, I need to write two different HTML parsers. One needs to accept almost any arbitrary HTML5 input without the concomitant JavaScript processing, and then spit out a stripped-down, whitelist-tags-and-attributes-only version for storage; the other needs to recognize the full suite, plus a completely alien set of tags into which I'll be throwing some, er, extra functionality.

I need this all written in CoffeeScript.
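
To make the first of those two parsers a bit more concrete, here is a rough sketch of the whitelist idea. It is in Python rather than CoffeeScript, purely for illustration, and it leans on the standard library's lenient `html.parser` tokenizer instead of anything resembling the real HTML5 parsing algorithm. The `ALLOWED` table, the `WhitelistStripper` class, and `strip_html` are all made-up names, and the choice of which tags and attributes to keep is an assumption, not a recommendation.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
ALLOWED = {
    "a": {"href"},
    "b": set(),
    "i": set(),
    "p": set(),
}

# Elements whose text content is dropped along with the tag itself.
SKIP_CONTENT = {"script", "style"}


class WhitelistStripper(HTMLParser):
    """Re-emits only whitelisted tags and attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag, keep the text it wraps
        # Note: attribute values are escaped but not vetted, so a
        # "javascript:" href would still pass through this sketch.
        kept = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_html(source):
    parser = WhitelistStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Fed `<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>`, `strip_html` returns `<p>Hi <b>there</b></p>`. A storage-grade version would also need to balance unclosed tags and check URL schemes, which this sketch does not attempt.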

Nobody's done anything like this before, at least not in CoffeeScript. My brain is spinning; I haven't worked with real parsers since my days at F5. Nothing like this was necessary for Isilon or IndieFlix. And, oh my gods, the HTML5 parsing standard is explicit, easy to implement, and huge.

I can use some of the existing JavaScript or Python parsers as starting points, but they're not terribly easy to extend. I'd also like to try to use a parser combinator (PC) library, because my experience has been that PC grammars are easier to understand. But try as I might, my head explodes when trying to grasp whatever it is I'm trying to do. Still, we'll see. After fridgemagnets, I need a bigger project.
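
For anyone who hasn't met parser combinators, here is a minimal sketch of the style, again in Python for illustration only. It assumes a parser is just a function from `(text, pos)` to either `(value, new_pos)` or `None`; the helper names (`regex`, `seq`, `many`, `fmap`) and the toy `open_tag` grammar are invented for this example and are nowhere near the HTML5 tokenizer rules.

```python
import re

# A parser is any function (text, pos) -> (value, new_pos), or None on failure.


def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""

    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def many(parser):
    """Apply a parser zero or more times, collecting the results."""

    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            # Stop on failure, or if the parser matched without consuming input.
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)

    return parse


def fmap(parser, fn):
    """Transform the value a parser produces."""

    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos

    return parse


# Toy grammar: <name attr="value" ...>
name = regex(r"[A-Za-z][A-Za-z0-9-]*")
ws = regex(r"\s+")
quoted = fmap(regex(r'"[^"]*"'), lambda s: s[1:-1])
attribute = fmap(seq(ws, name, regex(r"="), quoted), lambda v: (v[1], v[3]))
open_tag = fmap(
    seq(regex(r"<"), name, many(attribute), regex(r"\s*>")),
    lambda v: {"tag": v[1], "attrs": dict(v[2])},
)
```

Calling `open_tag('<a href="x" title="y">', 0)` gives `({'tag': 'a', 'attrs': {'href': 'x', 'title': 'y'}}, 22)`. The grammar reads almost like the rule it describes, which is the appeal; whether that style scales to the full HTML5 state machine is another matter.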

geek