Comments | ghewgill: kindle 3 and chinese text

ghewgill

kindle 3 and chinese text

Nov 09, 2010 22:50

A few weeks ago I started a beginner level Chinese (Mandarin) evening class at the local community college. We're about four weeks in and I'm really enjoying it, will definitely do more.

I thought it would be an interesting project to load a Chinese-English dictionary onto my Kindle for reference. I've already played with kindlegen, which takes a ...

goulo November 10 2010, 17:58:58 UTC

Thanks for the explanation! That is helpful, but also leaves me confused. I always thought (perhaps quite erroneously) that at its core, Unicode was not so much about mere appearance as about the logical function/purpose of the characters. The zillions of different possible appearances of the letters "a" (depending on which typeface one uses) are all still the same character, surely. I cannot imagine the disaster if every distinct "a" from every typeface (new ones being created daily) should be considered a distinct character...

I agree it's blurry though, e.g. there are Unicode symbols for "arrow pointing right" or whatever which have some inherent definitional restrictions on how they could reasonably look.

But thinking about this Han unification thing (about which I know virtually nothing but what I've read here), if it's analogous to the different appearances of "a" in different typefaces, then again, I don't see what the problem was in Greg's Kindle. It would be as if "a" showed up in Times Roman but not in Arial - we'd say "WTF, that font is broken, it doesn't have all the characters it should have!" :) Sure, you can't expect every font to have every character, but you would expect it when the characters are kind of related in that way. E.g. a Polish font still has "v" and "q" even though the Polish language doesn't use those letters. Ah well, I can see it's one of those annoying "the real world language situation is too chaotic and messy to be captured in an elegant mathematical model" type deals. :)

mskala November 10 2010, 18:34:28 UTC

"Unicode was not so much about mere appearance as about the logical function/purpose of the characters." Yes, and that's why they did Han unification: to give the same numbers to characters they decided were "the same" in some important way even if they looked different in different languages. The consequence is that if you use a font designed for one language to write another (within CJK), it ends up looking wrong even if it has glyphs for all the necessary code points; then there's the separate issue of whether it DOES have glyphs for all the necessary code points.
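To make the "same numbers" point concrete, here is a minimal Python sketch. The choice of U+76F4 as the example is mine, not something from the thread; it is a unified ideograph that is commonly drawn differently in Chinese and Japanese typefaces. The sketch only shows that the text stores a code point and nothing about which language's style to draw it in.

```python
import unicodedata

# U+76F4 is an illustrative choice (an assumption, not taken from the thread).
ch = "\u76f4"

# The string carries only the code point; no language is attached to it.
code_point = ord(ch)
name = unicodedata.name(ch)

print(f"U+{code_point:04X} {name}")
```

Which glyph shape actually appears for that code point is decided entirely by the font the renderer picks, which is why a Japanese font makes Chinese text look wrong even when nothing is missing.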

It sounds like Greg's Kindle, for whatever reason, was looking for a Japanese font first, and then when that font didn't have glyphs for some of the code points in the Chinese text, it didn't substitute them from some other font and just showed missing characters. I don't know why; he proposes it's to avoid mixing styles. Another possibility might be that it (or that particular piece of software) simply doesn't have font substitution implemented at all, as a matter of corner-cutting rather than intelligent design.
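As a rough illustration of the difference between having font substitution and not having it, here is a small Python sketch. Everything in it is hypothetical: the font names, the tiny coverage sets, and the `pick_font` and `render_plan` helpers are made up for the example and say nothing about how the Kindle actually works.

```python
from typing import List, Optional, Tuple

# Hypothetical coverage tables: font name -> code points it has glyphs for.
# Real fonts cover thousands of characters; these sets are invented.
FONTS = {
    "japanese": {0x65E5, 0x672C},
    "chinese": {0x65E5, 0x672C, 0x4F60},
}

def pick_font(ch: str, preference: List[str]) -> Optional[str]:
    """Return the first font in preference order that has a glyph for ch."""
    for font_name in preference:
        if ord(ch) in FONTS[font_name]:
            return font_name
    return None  # nothing covers it, so it would show as a missing character

def render_plan(text: str, preference: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each character with the font that would draw it (or None)."""
    return [(ch, pick_font(ch, preference)) for ch in text]

text = "\u4f60\u65e5"

# One font only, no substitution: the uncovered character is simply missing.
no_fallback = render_plan(text, ["japanese"])

# With substitution: the gap is filled, but from a font in a different style.
with_fallback = render_plan(text, ["japanese", "chinese"])

print(no_fallback)
print(with_fallback)
```

In the second plan every character gets drawn, but the two characters come from different fonts, which is the style mixing that a renderer might be trying to avoid by not substituting at all.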

The number of characters used by Chinese and not Japanese is in the thousands, so we can't really expect a font intended for use with Japanese (which will contain the Japanese styles of the shared characters, and thus look wrong for Chinese whether it has full character coverage or not) to also include all the Chinese characters just for completeness; it's a much taller order than hoping for a Polish font to contain "v" and "q."
