I misspoke, apparently: the Unicode problem in the first tool stemmed from something weird going on in the xmlrpclib.Binary.decode() function. Extracting the raw UTF-8 data and decoding that myself gets me the data I expect. New problem: some of my entries are not fully HTML. The paragraphs are not wrapped in <p>
tags, resulting in a massive blob of
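For the record, here is roughly what the decoding workaround from the top of this post looks like. It is a minimal sketch, not the tool's actual code: the helper name binary_to_text and the sample value are made up for illustration, and it assumes the payload really is UTF-8. It imports from xmlrpc.client, which is the Python 3 name for xmlrpclib; on Python 2 the import would be `from xmlrpclib import Binary`.

```python
from xmlrpc.client import Binary  # Python 2: from xmlrpclib import Binary


def binary_to_text(value, encoding="utf-8"):
    """Return the text held in an XML-RPC Binary value.

    Hypothetical helper: reads the raw bytes from the .data attribute
    and decodes them directly, instead of calling Binary.decode().
    """
    # Binary objects keep their payload as raw bytes in .data.
    raw = value.data if isinstance(value, Binary) else value
    return raw.decode(encoding)


# Made-up sample standing in for a value returned by the server.
sample = Binary(u"caf\u00e9".encode("utf-8"))
text = binary_to_text(sample)
```

As far as I can tell, Binary.decode() expects base64-encoded input and stores the result on the object rather than returning text, which may be why calling it directly gave odd results.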