This post on Krebs on Security describes an unusual and potentially
very dangerous attack technique that can be used to sneak evil code past
code reviews and into the supply chain. Briefly, it allows evildoers to
write code that looks very different to a human reviewer than it does to
a compiler. It should probably come as no surprise that it involves
Unicode, the same character encoding standard that lets you make blog
posts that include inline emoji, or mix text in English and Arabic.
In particular, it's the latter ability that the vulnerability targets,
specifically Unicode's
"Bidi" algorithm for presenting a mix of left-to-right and
right-to-left text. (Read the Bidi article for details and examples --
I'm not going to try plopping random text in languages I don't know into
the middle of a blog post.)
Now go read the "Trojan Source Attacks" website, and the associated
paper [PDF] and GitHub repo. Observe, in particular, the warning about
bidirectional Unicode text that GitHub now attaches to files like this
one in C++. Observe also that GitHub does not flag files that, for
example, mix homoglyphs like "H" (the usual ASCII version) and "Н" (the
similar-looking Cyrillic letter that sounds like "N"; how similar it looks
depends on what font your browser is using). If you're unlucky, you might
have clicked on a URL containing one or more of these, which took you
someplace unexpected and almost certainly malicious.
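Since homoglyphs don't get flagged for you, here is a rough sketch of how
you might look for them yourself. It guesses each letter's script from the
first word of its Unicode character name (Python's standard library has no
real script property, so this is an approximation), and reports any word
that mixes scripts. The names `scripts_in` and `mixed_script_words` are
made up for this example.

```python
import re
import unicodedata


def scripts_in(word):
    """Approximate the set of scripts used by the letters in `word`.

    Heuristic (an assumption of this sketch): the first word of a letter's
    Unicode name, e.g. "LATIN" or "CYRILLIC", stands in for its script.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def mixed_script_words(text):
    """Return the words in `text` whose letters come from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]


# "\u041d" is CYRILLIC CAPITAL LETTER EN, which looks like a Latin "H".
suspicious = mixed_script_words("HTTP \u041dTTP hello")
print(suspicious)
```

This only catches words that mix scripts. A word written entirely in
lookalike letters from a single script would sail right through, so treat
it as a first-pass filter and not a real defense.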
The Trojan Source attack works by making use of control characters such as
U+202B RIGHT-TO-LEFT EMBEDDING (RLE) and U+202A LEFT-TO-RIGHT EMBEDDING
(LRE), which explicitly change the direction in which the text that
follows them is displayed.
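A minimal sketch of a scanner for these characters might look like the
following. Besides the two embedding characters named above, it also checks
for the other explicit directional formatting characters defined by the
Unicode Bidi algorithm (overrides, isolates, and their terminators). The
names `BIDI_CONTROLS` and `find_bidi_controls` are my own invention, and
this is not necessarily how GitHub or any compiler does its check.

```python
import unicodedata

# Explicit directional formatting characters from the Unicode Bidi algorithm.
# Written as escapes so that this file contains none of them itself.
BIDI_CONTROLS = {
    "\u202A",  # LEFT-TO-RIGHT EMBEDDING (LRE)
    "\u202B",  # RIGHT-TO-LEFT EMBEDDING (RLE)
    "\u202C",  # POP DIRECTIONAL FORMATTING (PDF)
    "\u202D",  # LEFT-TO-RIGHT OVERRIDE (LRO)
    "\u202E",  # RIGHT-TO-LEFT OVERRIDE (RLO)
    "\u2066",  # LEFT-TO-RIGHT ISOLATE (LRI)
    "\u2067",  # RIGHT-TO-LEFT ISOLATE (RLI)
    "\u2068",  # FIRST STRONG ISOLATE (FSI)
    "\u2069",  # POP DIRECTIONAL ISOLATE (PDI)
}


def find_bidi_controls(source):
    """Return a (line, column, character name) tuple for each bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


# A two-line sample with an RLE hidden inside a string literal.
sample = 'x = 1\ny = "a\u202Bb"  # note\n'
for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

You could run something like this over a source tree before a code review;
any hit in a file that isn't supposed to contain right-to-left text
deserves a closer look.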
And remember: ШYSINAШYG - What You See Is Not Always What You've Got!
Resources
Another fine post from
The Computer Curmudgeon (also at
computer-curmudgeon.com).
[Crossposted from mdlbear.dreamwidth.org.]