Does anyone have an idea of how to best convert CCITT image data (extracted from a PDF file) to something usable?
My initial plan was to just extract them, open them in GIMP, and save them as B/W PNGs, but GIMP can't read the raw CCITT data (shame, shame). However, CCITT is one of the compression schemes used in TIFF files, so why not just take the raw data and stuff it into a TIFF container?
The most natural tool for that is likely fax2tiff from
libtiff's tools. Unfortunately, it inexplicably fails with: input.ccitt: Not enough memory. Great.
Web-searching then found
this blog post talking about the same matters and sharing a snippet of Java code (cue Indiana Jones: why did it have to be Java?) to do the conversion. But somehow code like this:
for(int i=0;i 0)
does not inspire confidence. (In fact I dare say that's badly-formatted rubbish, and I have no real desire to untangle it.)
But the basic idea is sound, and I have a copy of the
Encyclopedia of Graphics File Formats (the first edition) lying around, so why not just cast that crufty Java code aside and write my own? Not a bad idea, but TIFF is a baroque and complex format, and you'd still need to look up the PDF's various image parameters. It'd be possible, but it'd also be more work than I'm willing to do.
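For the record, the "stuff it into a TIFF container" idea might look roughly like the Python sketch below. It is a minimal sketch under several assumptions, not a tested converter: the image is stored as a single strip, the width and height come from the /Width and /Height entries of the PDF image dictionary, and the data is Group 4 (a negative /K in the PDF's decode parameters). The function name wrap_ccitt_in_tiff and its parameters are made up for illustration. Group 3 data with two-dimensional coding would probably need an extra T4Options tag that is not written here.

```python
import struct

def wrap_ccitt_in_tiff(data, width, height, group=4, photometric=0):
    """Prepend a minimal little-endian TIFF header and IFD to raw CCITT data.

    group:       TIFF Compression value (3 = CCITT Group 3, 4 = CCITT Group 4)
    photometric: 0 = WhiteIsZero, 1 = BlackIsZero (assumed default; flip it
                 if the result comes out inverted)
    """
    n_tags = 8
    # 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    data_offset = 8 + 2 + 12 * n_tags + 4

    # (tag, field type, count, value); type 3 = SHORT, type 4 = LONG.
    # Entries must be sorted by tag number.
    entries = [
        (256, 4, 1, width),        # ImageWidth
        (257, 4, 1, height),       # ImageLength
        (258, 3, 1, 1),            # BitsPerSample
        (259, 3, 1, group),        # Compression
        (262, 3, 1, photometric),  # PhotometricInterpretation
        (273, 4, 1, data_offset),  # StripOffsets
        (278, 4, 1, height),       # RowsPerStrip (whole image in one strip)
        (279, 4, 1, len(data)),    # StripByteCounts
    ]

    out = struct.pack("<2sHL", b"II", 42, 8)  # byte order, magic, offset of first IFD
    out += struct.pack("<H", n_tags)
    for tag, ftype, count, value in entries:
        # In little-endian files a SHORT value packed as a 32-bit field
        # ends up left-justified, as TIFF requires.
        out += struct.pack("<HHLL", tag, ftype, count, value)
    out += struct.pack("<L", 0)               # no further IFDs
    return out + data
```

Writing the returned bytes to a .tif file should give something an image editor can at least attempt to open. If black and white come out swapped, passing photometric=1 is the first thing to try.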
There are other options still. Although GIMP can't read raw CCITT data, it can import PDFs; unfortunately, even without antialiasing you end up with a grayscale image that doesn't match the original B/W image. Why? I wish I knew.
The last thing I can think of would be screenshotting the (correctly-rendered) PDF and working with the screenshot in GIMP. Doable, but it feels like a rather impure solution. So, any ideas?