In this reboot of the Three Paper Thursdays, back after a hiatus of almost eight years, I consider the many different ways in which programs can be sanitised to detect, or mitigated to prevent the exploitation of, the many programmer errors that can introduce security vulnerabilities in low-level languages such as C and C++. We first look at a new binary translation technique, before covering the many compiler techniques in the literature, and finally finishing off with my own hardware analysis architecture.
RetroWrite, published at S&P this year, uses rewriting techniques on closed-source binaries to instrument them to detect various types of vulnerabilities, sanitising the programs against the many types of programmer error that can appear in low-level languages such as C and C++. It argues that this adds significant power over implementing the instrumentation within the compiler, because it allows analysis of any program running on your system, which may be untrusted and for which source code may not be available.
The previous state of the art is to use dynamic binary translators and emulators such as QEMU and Valgrind, but these are very slow for large applications. RetroWrite instead inserts its checks through static translation, ahead of execution. The instrumented code still pays for the checks themselves, but it no longer pays at run time for the work of inserting them.
However, there is a challenge here. In most architectures, pointers and integers are indistinguishable in binary code, so static rewriting can break correctness. This is particularly the case for application “reflowing”: the newly instrumented program may change significantly in size and shape, so every code and data pointer must be retargeted, while every integer that merely looks like a pointer must be left alone. Still, the paper argues that this is tractable for 64-bit position-independent code, which is often used for shared libraries used by many different applications, because such code carries relocation information that identifies pointer targets within the application without heuristics.
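To make the role of relocation information concrete, here is a toy Python model of reflowing. It is only a sketch under my own assumptions: the list of data words, the relocation list and the helper names (make_address_map, reflow) are invented for illustration and are not RetroWrite's implementation or any real binary format. The point it shows is that two words with the same bit pattern are treated differently, because only one of them is listed as a pointer.

```python
from bisect import bisect_right


def make_address_map(insertions):
    """Build an old-to-new address translator.

    `insertions` maps an old address to the number of instrumentation
    bytes inserted immediately before it (a hypothetical input format).
    """
    points = sorted(insertions.items())
    addrs = [addr for addr, _ in points]
    cumulative = []
    total = 0
    for _, size in points:
        total += size
        cumulative.append(total)

    def translate(old_addr):
        # Every insertion at or before old_addr pushes it further along.
        idx = bisect_right(addrs, old_addr)
        return old_addr + (cumulative[idx - 1] if idx else 0)

    return translate


def reflow(words, relocations, translate):
    """Retarget only the words that the relocation list marks as pointers."""
    pointer_slots = set(relocations)
    return [translate(w) if i in pointer_slots else w
            for i, w in enumerate(words)]


# Slots 0 and 1 hold the same bit pattern, but only slot 0 is a pointer.
data = [0x1010, 0x1010, 42]
relocs = [0]
# Eight bytes of checks inserted before 0x1000 and again before 0x1008.
translate = make_address_map({0x1000: 8, 0x1008: 8})
new_data = reflow(data, relocs, translate)
```

Without the relocation list, a rewriter would have to guess whether slot 1 is an address or an ordinary integer, and either guess can corrupt the program.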
As a proof of concept, RetroWrite implements a fuzzer based on AFL, designed to execute with random inputs in an attempt to generate application crashes, and a memory sanitiser inspired by AddressSanitizer, designed to detect memory accesses to forbidden locations such as over the end of arrays. These achieve similar levels of coverage to their compile-time counterparts, and are significantly faster than the dynamic alternatives based on QEMU or Valgrind.
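For readers who have not met this style of memory sanitiser, the idea can be sketched with a toy heap in Python. This is a simplified model under my own assumptions, not the real AddressSanitizer or RetroWrite's version of it: it keeps one shadow flag per byte (real tools use a more compact encoding), and the class name, exception name and redzone size are made up for illustration.

```python
REDZONE = 16  # poisoned padding around each allocation (assumed size)


class SanitizerError(Exception):
    """Raised when an access touches poisoned memory."""


class ToySanitizedHeap:
    def __init__(self, size):
        self.memory = bytearray(size)
        # One shadow flag per byte: True means "poisoned, must not be accessed".
        self.shadow = [True] * size
        self.next_free = 0

    def malloc(self, n):
        base = self.next_free + REDZONE
        end = base + n
        if end + REDZONE > len(self.memory):
            raise MemoryError("toy heap exhausted")
        for addr in range(base, end):
            self.shadow[addr] = False  # unpoison only the usable bytes
        self.next_free = end  # the next allocation starts after a redzone
        return base

    def check(self, addr):
        # This is the test that instrumentation inserts before each access.
        if addr < 0 or addr >= len(self.shadow) or self.shadow[addr]:
            raise SanitizerError(f"invalid access at address {addr}")

    def store(self, addr, value):
        self.check(addr)
        self.memory[addr] = value

    def load(self, addr):
        self.check(addr)
        return self.memory[addr]
```

Writing to the last byte of an 8-byte buffer succeeds, while writing one byte further lands in the redzone and raises SanitizerError, which is the "over the end of arrays" case mentioned above.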
But what analysis techniques do sanitisers typically implement? This systematisation-of-knowledge paper looks at the wide variety of sanitisers in the literature, which, unlike RetroWrite’s binary modification, are often built into the compiler. It precisely identifies the difference between two classes of tool. Exploit mitigations prevent attacks from succeeding without necessarily detecting them, and tend not to be adopted unless their overhead is below 5%. Sanitisers are used to detect the presence of security bugs; since they are not always used in production settings, and have value for debugging alone, they can exhibit much higher overheads.
The paper takes a look at the wide variety of attacks on low-level languages such as C and C++, as well as the tools that address them, both production-deployable tools such as LLVM-CFI and debug-environment sanitisers such as AddressSanitizer. It looks at the tradeoffs in coverage, false positive rate and performance of these many different techniques, re-evaluating many of them on the authors’ own machines (often with higher overheads than reported in the original paper!), before discussing directions for future research. It identifies the following open questions:
1) How do we compose sanitisers such that they can be run together at the same time without interfering with each other’s analysis?
2) How can we improve the performance of sanitisers using hardware features?
3) Can we design sanitisers that can deal with low-level systems code, such as the OS kernel?
Our recent ASPLOS paper focuses on the second of these issues. Instead of running extra instructions inside a program, inserted by either a binary translator or a compiler, The Guardian Council offloads instrumentation onto
1) a hardware debug channel able to observe the behaviour of instructions committed by a program and
2) a programmable analysis engine able to run sanitisation policies written in high-level C/C++.
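The split between an observation channel and a separate analysis engine can be illustrated with a small Python model. This is a sketch under stated assumptions: the event format (a kind and an address per committed call or return) and the function name are hypothetical, and the shadow stack shown is simply one well-known control-flow check chosen as an example, not a claim about how the paper's policies are written.

```python
def shadow_stack_policy(events):
    """Consume committed call/return events and report mismatched returns.

    Each event is a (kind, addr) pair in a made-up format:
      ("call", addr) - addr is the address the call should return to
      ("ret", addr)  - addr is the address the return actually jumped to
    Returns a list of (expected, actual) pairs, one per violation.
    """
    shadow_stack = []
    violations = []
    for kind, addr in events:
        if kind == "call":
            shadow_stack.append(addr)
        elif kind == "ret":
            expected = shadow_stack.pop() if shadow_stack else None
            if addr != expected:
                violations.append((expected, addr))
    return violations


# The second return goes somewhere the matching call never recorded.
trace = [("call", 0x400), ("call", 0x500), ("ret", 0x500), ("ret", 0x666)]
report = shadow_stack_policy(trace)
```

The monitored program runs no extra instructions here: the policy only reads the stream of events, which is what allows the checking to happen elsewhere.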
The key to having a high-performance analysis engine is thread-level parallelism. We offload our security analysis compute onto many small Guardian Processing Elements, each around a thousandth the size of a regular programmable core and highly efficient. That means that we can run many different analyses at the same time in parallel, as well as parallelise analysis tasks across several of these Guardian Processing Elements at once.
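As a rough software analogy for splitting one analysis across several workers, the Python sketch below shards a stream of memory-access events between threads. Everything in it is an assumption made for illustration: the round-robin sharding, the simple bounds check and the function names are mine and do not describe how the Guardian Council actually distributes work. Python threads also will not speed up this CPU-bound loop, so the sketch shows only the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def count_out_of_bounds(accesses, lower, upper):
    """One worker's job: count accesses outside the range [lower, upper)."""
    return sum(1 for addr in accesses if not (lower <= addr < upper))


def parallel_check(accesses, lower, upper, workers=4):
    """Shard the events round-robin and check each shard independently."""
    shards = [accesses[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(count_out_of_bounds, shards,
                           [lower] * workers, [upper] * workers)
        return sum(results)


accesses = [0x10, 0x20, 0x200, 0x5, 0x30]
bad = parallel_check(accesses, 0x10, 0x100)
```

This works because each bounds check depends only on its own event. A stateful policy, such as one that must see calls and returns in order, cannot be sharded this naively.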
We offload a variety of different mitigators and sanitisers onto the Guardian Council, including Control-Flow Integrity techniques, an AddressSanitizer-like policy, and Rowhammer mitigation, all achieving much lower overheads than in software. A full video on the technique is available as part of the proceedings of ASPLOS, which this year was held virtually.
So then, what of future work? The Guardian Council’s hardware architecture hasn’t yet been deployed on real systems. There is significant value in being able to deploy low-overhead pure-software strategies, especially to existing code as in RetroWrite. Can the overheads, coverage and analysis challenges of RetroWrite be solved by a hybrid static-dynamic scheme, such as Janus? More generally, the algorithms used for debug sanitisation are typically plagued by false positives and false negatives, and reducing either would be of significant benefit to real systems. Runtime mitigators need overheads as low as possible if they are to achieve widespread deployment. New techniques that reduce the cost of preventing the many different bugs we care about, without introducing any false positives, will benefit every application written in a low-level language.