I understand just now the deepest suck

Jul 11, 2005 11:58

Now I see why people say "Perl's Unicode support blows."

It treats a Unicode string as a sequence of potentially multibyte characters, so stuff like substr() should see something like 'é' as one character.

Except that its list of characters includes modifiers like the one for "previous character has an accent," so an 'é' stored in decomposed form is still two characters: 'e' and a combining acute accent (U+0301).
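Here is a quick way to see the two-character 'é' for yourself. It is a sketch in Python rather than Perl, using only the standard unicodedata module. The assumption is that the counting works on code points, which is the behavior described above.

```python
import unicodedata

precomposed = "\u00e9"   # 'é' stored as a single code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Both render as 'é', but they do not have the same length.
print(len(precomposed))
print(len(decomposed))

# combining() returns 0 for base characters and non-zero for combining marks.
print([unicodedata.combining(ch) for ch in decomposed])
```

The second length is the one that bites: anything that counts code points sees the accent as its own character.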

It looks like getting substr() to work correctly involves an iterative process for both numeric arguments, offset and length: start with your 'base' offset, count the combining characters in the span it covers, bump your offset by that count, count the combining characters in the span you just added, bump your offset again... until a pass turns up nothing new. Then repeat the whole thing for the length of the substring.
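For what it's worth, the bumping can be sidestepped by grouping each base character with its trailing combining marks first and then slicing the groups. Below is a rough sketch of that idea, again in Python rather than Perl. The names clusters and visual_substr are made up for illustration. It assumes a non-negative offset, and it only recognizes marks with a non-zero combining class, so it is not full grapheme-cluster segmentation.

```python
import unicodedata

def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch    # attach the mark to the preceding base character
        else:
            out.append(ch)   # start a new user-visible character
    return out

def visual_substr(text, offset, length):
    """substr() counted in base-plus-marks groups (hypothetical helper)."""
    return "".join(clusters(text)[offset:offset + length])

word = "cafe\u0301s"   # 'cafés' with a decomposed 'é'

# Plain slicing by code point returns the bare 'e' and drops the accent.
print(word[3:4] == "e")

# Slicing by group keeps the accent attached to its 'e'.
print(visual_substr(word, 3, 1) == "e\u0301")
```

This trades the repeated offset adjustment for one pass over the whole string, which is simpler to reason about but does more work than necessary when you only need a short prefix.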

They say that because it is really, really bad.

work, web development, programming
