Today we are releasing Trojan Source: Invisible Vulnerabilities, a paper describing cool new tricks for crafting targeted vulnerabilities that are invisible to human code reviewers.
Until now, an adversary wanting to smuggle a vulnerability into software could try inserting an unobtrusive bug in an obscure piece of code. Critical open-source projects such as operating systems depend on human review of all new code to detect malicious contributions by volunteers. So how might wicked code evade human eyes?
We have discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One particularly pernicious method uses Unicode directionality override characters to display code as an anagram of its true logic. We’ve verified that this attack works against C, C++, C#, JavaScript, Java, Rust, Go, and Python, and suspect that it will work against most other modern languages.
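As a rough illustration of the kind of defense a reviewer or repository could apply, the sketch below scans source text for Unicode bidirectional control characters. It is not taken from the paper or from any compiler's fix. The function name find_bidi_controls and the choice to flag every embedding, override, and isolate character are assumptions made for this example; a real tool would likely need to allow legitimate uses in right-to-left text.

```python
import unicodedata

# Explicit directional formatting characters (assumed list for this sketch).
BIDI_CONTROLS = {
    "\u202A",  # LRE: left-to-right embedding
    "\u202B",  # RLE: right-to-left embedding
    "\u202C",  # PDF: pop directional formatting
    "\u202D",  # LRO: left-to-right override
    "\u202E",  # RLO: right-to-left override
    "\u2066",  # LRI: left-to-right isolate
    "\u2067",  # RLI: right-to-left isolate
    "\u2068",  # FSI: first strong isolate
    "\u2069",  # PDI: pop directional isolate
}


def find_bidi_controls(source):
    """Return a list of (line, column, character name) for each bidi control.

    Hypothetical helper: line and column are 1-based positions in `source`.
    """
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # A comment that hides control characters, written with escapes so the
    # example itself stays visible to the reader.
    sample = "access = 'user'\n/* \u202E } \u2066if (is_admin)\u2069 */\n"
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A check like this could run as a pre-commit hook or a CI step, failing the build whenever the returned list is non-empty.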
This potentially devastating attack is tracked as CVE-2021-42574, while a related attack that uses homoglyphs – visually similar characters – is tracked as CVE-2021-42694. This work has been under embargo for a 99-day period, giving time for a major coordinated disclosure effort in which many compilers, interpreters, code editors, and repositories have implemented defenses.
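For the homoglyph variant, one simple heuristic is to flag identifiers that mix letters from more than one script, such as a Latin name with a single Cyrillic letter swapped in. The sketch below is an assumption-laden illustration, not the method used by any of the patched tools: the helper names are hypothetical, and it approximates a character's script by the first word of its Unicode name, which is crude. It will also miss an identifier written entirely in look-alike characters from a single script.

```python
import re
import unicodedata


def scripts_in(identifier):
    """Approximate the set of scripts used by the letters of an identifier.

    Assumption: the first word of the Unicode character name (for example
    LATIN, CYRILLIC, GREEK) is a good enough stand-in for the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_identifiers(source):
    """Map each identifier-like word that mixes scripts to its script list."""
    found = {}
    # A word character that is not a digit, followed by any word characters.
    for word in re.findall(r"[^\W\d]\w*", source):
        scripts = scripts_in(word)
        if len(scripts) > 1:
            found[word] = sorted(scripts)
    return found


if __name__ == "__main__":
    # The second name uses CYRILLIC CAPITAL LETTER EN (U+041D) instead of "H".
    sample = "def sayHello(): pass\ndef say\u041dello(): pass\n"
    for name, scripts in mixed_script_identifiers(sample).items():
        print(ascii(name), scripts)
```

Printing the name with ascii() exposes the escaped code point, which is what a reviewer would not see in a normal rendering.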
This attack was inspired by our recent work on Imperceptible Perturbations, where we use directionality overrides, homoglyphs, and other Unicode features to break the text-based machine learning systems used for toxic content filtering, machine translation, and many other NLP tasks.
More information about the Trojan Source attack is available at trojansource.codes, and proofs of concept are on GitHub. The full paper can be found here.
There’s now a really good blog post on this by Brian Krebs.
There are more stories at The Register, threatpost, SC Magazine, ZDNet, Computer Weekly, Gizmodo, Dark Reading and Hacker News, with its discussion thread.
Here’s the bidi CVE and the homoglyph CVE; and advisories from GitHub, Red Hat and Rust.
Is this really novel? I recall reading about the homoglyph attack many years ago, in the context of malicious URLs. I thought it was by your very own Markus Kuhn, but I can’t find the reference now…
Malicious URLs are one thing. Source code is a different thing, no? And using Unicode RTL overrides to make the displayed code differ from what is actually interpreted is something new to me.
(= ʎɐp ʎddɐɥ puɐ ollǝɥ
Hi
The paper is fifteen pages. Good solid text, no pics.
Old rule of thumb: one typed page of text equals one kilobyte.
The paper is 3.3 megabytes! Has something been added?
I just love security!
Paul Harrison has done some neat work on homoglyph detection.
Here’s a video of a talk that Nicholas just gave on Trojan Source.