There has been a lot of ‘fog of war’ regarding the alleged implantation of Trojan hardware into Supermicro servers at manufacturing time. Other analyses have cast doubt on the story. But do all the pieces pass the sniff test?
In brief, the allegation is that an implant was added at manufacturing time, attached to the Baseboard Management Controller (BMC). When a desktop computer has a problem, common approaches are to reboot it or to reinstall the operating system. However, in a datacenter it isn’t practical to physically walk up to each machine to do these things, so the BMC allows administrators to do them over the network.
Crucially, because the BMC has the ability to install the operating system, it can disrupt the process that boots the operating system – and fetch potentially malicious implant code, maybe even over the Internet.
The Bloomberg Businessweek reports are low on technical details, but they do show two interesting things. The first is a picture of the alleged implant. This shows a 6-pin silicon chip inside a roughly 1mm x 2mm ceramic package – as often used for capacitors and other so-called ‘passive’ components, which are typically overlooked.
The other is an animation highlighting this implant chip on a motherboard. Extracting the images from this animation shows the base image is of a Supermicro B1DRi board. As others have noted, the implant is shown mounted in a spare footprint between the BMC chip and a Serial Peripheral Interface (SPI) flash chip that likely contains the BMC’s firmware. Perhaps the animation is an artist’s concept only, but this is just the right place to compromise the BMC.
SPI is a popular format for firmware flash memories – it’s a relatively simple, relatively slow interface, using only four signal wires. Quad SPI (QSPI) is a faster version that uses six wires. The Supermicro board here appears to have a QSPI chip, but also a space for an SPI chip as a manufacturing-time option. The alleged implant is mounted in part of the space where the SPI chip would go. Limited interception or modification of SPI communication is something that a medium-complexity digital chip (a basic custom chip, or an off-the-shelf programmable CPLD) could do – but not to a great extent. Six pins is enough to intercept the four SPI wires, plus two for power. The packaging of this implant would, however, be completely custom.
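To make the idea of ‘limited interception’ concrete, here is a toy model in Python of an interposer that watches for a flash read and overrides a handful of bytes in the reply. It is only a sketch under stated assumptions: it assumes the classic SPI NOR read opcode `0x03` with a 3-byte address (which only reaches the first 16MiB of a flash chip), and the names `read_frame`, `decode_read`, `served_data` and `patches` are invented for illustration. Nothing here is taken from the alleged implant or from the Supermicro firmware.

```python
from typing import Dict, Optional

READ_CMD = 0x03  # classic SPI NOR "read data" opcode, followed by a 3-byte address


def read_frame(addr: int) -> bytes:
    """Bytes the BMC would clock out to request data starting at addr."""
    return bytes([READ_CMD]) + addr.to_bytes(3, "big")


def decode_read(frame: bytes) -> Optional[int]:
    """Return the requested address, or None if this is not a plain read."""
    if len(frame) != 4 or frame[0] != READ_CMD:
        return None
    return int.from_bytes(frame[1:], "big")


def served_data(flash: bytes, frame: bytes, length: int,
                patches: Dict[int, int]) -> bytes:
    """Data the BMC sees when an interposer overrides the patched addresses.

    `patches` maps absolute flash addresses to replacement byte values
    (a hypothetical, very small patch table).
    """
    addr = decode_read(frame)
    if addr is None:
        return b""  # other commands are ignored in this toy model
    data = bytearray(flash[addr:addr + length])
    for i in range(len(data)):
        data[i] = patches.get(addr + i, data[i])
    return bytes(data)
```

The point of the model is that the interposer needs only an address comparator and a few bytes of storage, which is the kind of job a small CPLD could plausibly do. Replacing large parts of the image would need far more storage than fits in such a package.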
What can an implant attached to the SPI wires do? The BMC itself is a computer, running an operating system which is stored in the SPI flash chip. The manual for an MBI-6128R-T2 server containing the B1DRi shows it has an AST2400 BMC chip.
The AST2400 uses a relatively old technology – a single-core 400MHz ARM9 CPU, broadly equivalent to a cellphone from the mid 2000s. Its firmware can come via SPI.
I downloaded the B1DRi BMC firmware from the Supermicro website and did some preliminary disassembly. The AST2400 in this firmware appears to run Linux, which is plausible given it supports complicated peripherals such as PCI Express graphics and USB. (It is not news to many of us working in this field that every system already has a Linux operating system running on an ARM CPU, before power is even applied to the main Intel CPUs — but many others may find this surprising).
It is possible that the implant simply replaces the entire BMC firmware, but there is another way.
In order to start its own Linux, the AST2400 boots using the U-Boot bootloader. I noticed that one of the U-Boot options is for the AST2400 to pick up its Linux OS over the network (via TFTP or NFS). If (and it’s a substantial if) this is enabled in the AST2400 bootloader, it would not take a huge amount of modification to the SPI contents to divert the boot path so that the BMC fetched its firmware over the network (and potentially the Internet, subject to outbound firewalls).
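To give a feel for how small such a diversion could be, the sketch below models the common U-Boot environment layout in Python: a CRC32 followed by NUL-terminated `name=value` strings. Whether this particular firmware stores its environment this way, and what its variables actually contain, are assumptions I have not verified. The helper names and the example values (`bootm 0x20080000`, the `192.0.2.x` documentation addresses) are hypothetical.

```python
import zlib
from typing import Dict


def build_env(variables: Dict[str, str], size: int) -> bytes:
    """Pack variables into a U-Boot style environment block of `size` bytes.

    Assumed layout: 4-byte little-endian CRC32 of the data area, then
    'name=value' strings separated by NUL, ended by an extra NUL, then padding.
    """
    body = b"".join(f"{k}={v}".encode() + b"\0" for k, v in variables.items())
    body = (body + b"\0").ljust(size - 4, b"\xff")
    return zlib.crc32(body).to_bytes(4, "little") + body


def parse_env(blob: bytes) -> Dict[str, str]:
    """Unpack an environment block, rejecting it if the CRC does not match."""
    body = blob[4:]
    if zlib.crc32(body) != int.from_bytes(blob[:4], "little"):
        raise ValueError("environment CRC mismatch")
    result = {}
    for entry in body.split(b"\0"):
        if not entry:  # empty string marks the end of the variable list
            break
        name, _, value = entry.partition(b"=")
        result[name.decode()] = value.decode()
    return result


def changed_bytes(a: bytes, b: bytes) -> int:
    """Number of byte positions at which two equal-length blocks differ."""
    return sum(x != y for x, y in zip(a, b))
```

For example, changing a hypothetical `serverip` from `192.0.2.1` to `192.0.2.9` alters one byte of text plus at most the four CRC bytes. In this model, redirecting a network boot is a handful of bytes rather than a new firmware image.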
Once the BMC operating system is compromised, it can then tamper with the main operating system. An obvious path would be to insert malicious code at boot time, via PCI Option ROMs. However, since such vulnerabilities came to light, defenses in this area have been strengthened.
But there’s another trick a bad BMC can do — it can simply read and write main memory once the machine is booted. The BMC is well-placed to do this, sitting on the PCI Express interconnect since it implements a basic graphics card. This means it potentially has access to large parts of system memory, and so all the data that might be stored on the server. Since the BMC also has access to the network, it’s feasible to exfiltrate that data over the Internet.
So this raises a critical question: how well is the BMC firmware defended? The BMC firmware download contains raw ARM code, and is exactly 32MiB in size. 32MiB is a common size of an SPI flash chip, and suggests this firmware image is written directly to the SPI flash at manufacture without further processing. Additionally, there’s the OpenBMC open source project which supports the AST2400. From what I can find, installing OpenBMC on the AST2400 does not require any code signing or validation process, and so modifying the firmware (for good or ill) looks quite feasible.
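The kind of preliminary check described above can be sketched in a few lines of Python. This is not the procedure I actually followed, just an illustration: it tests whether an image is exactly 32MiB, looks for the legacy U-Boot image magic number `0x27051956`, and checks whether the string `tftp` appears anywhere. The function names are invented, and whether any of these markers are present in the Supermicro image is not something this sketch asserts.

```python
from typing import Dict, List, Union

EXPECTED_SIZE = 32 * 1024 * 1024           # 32MiB, a common SPI flash size
UIMAGE_MAGIC = bytes.fromhex("27051956")   # legacy U-Boot image header magic


def find_all(blob: bytes, needle: bytes) -> List[int]:
    """Return every offset at which `needle` occurs in `blob`."""
    offsets = []
    pos = blob.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(needle, pos + 1)
    return offsets


def summarise(blob: bytes) -> Dict[str, Union[bool, List[int]]]:
    """Collect a few quick facts about a raw firmware image."""
    return {
        "is_32MiB": len(blob) == EXPECTED_SIZE,
        "uimage_headers": find_all(blob, UIMAGE_MAGIC),
        "mentions_tftp": b"tftp" in blob,
    }
```

Usage would be along the lines of `summarise(open("firmware.bin", "rb").read())`, where the filename is a placeholder. A scan like this can suggest that an image is a raw flash dump, but it cannot prove the absence of a signature check elsewhere in the boot chain.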
Where does this leave us? There are few facts, and much supposition. However, the following scenario does seem to make sense. Let’s assume an implant was added to the motherboard at manufacture time. This would have needed modification of both the board design and the robotic component installation process. The implant intercepts the SPI lines between the flash and the BMC. Unless the implant was built with very advanced technology, it may be enough for it to simply divert the boot process to fetch firmware over the network (either the Internet or a compromised server in the organisation), and all the complex attacks build from there – possibly using PCI Express and/or the BMC for exfiltration.
If the implant is less sophisticated than others have assumed, it may be feasible to block it by firewalling traffic from the BMC — but I can’t see many current owners of such a board wanting to take that risk.
So, finally, what do we learn? In essence, this story seems to pass the sniff test. But it is likely news to many people that their systems are a lot more complex than they thought, and in that complexity can lurk surprising vulnerabilities.
Dr A. Theodore Markettos is a Senior Research Associate in hardware and platform security at the University of Cambridge, Department of Computer Science and Technology.
15 thoughts on “Making sense of the Supermicro motherboard attack”
If the BMC was indeed compromised during manufacture time, why wouldn’t they simply flash a malicious firmware? Seems a lot easier than investing all the time necessary to fabricate a custom six pin passive ceramic lookalike CPLD to twiddle with SPI lines.
I think Charles Elegans is right. Most people never update the firmware on their BMC, whether iLo or IPMI or DRAC, and I’m sure it would be possible to have a nefarious version of the firmware lurking there indefinitely. An attacker could probably ensure any updates download new firmware from an alternative server with compromised code.
Very simple. What if the machine you eventually want to compromise is a chip fab machine? The firmware you might want to compromise is a new design.
Sooner or later those first-gen modded boards are put into next-gen chip fabrication machines….
E.g. machines hacking and modding future machines. One robot hardware-hacking a future-gen robot.
They could flash malicious firmware in the factory, but typically purchasers update the firmware regularly. Indeed Bloomberg mentioned there was a separate incident where the network card firmware update mechanism had been compromised. In this case, a malicious factory firmware would only survive on un-updated servers. An implant would persist over firmware update, subject to whatever software it modifies not changing too drastically.
Certainly the attack vector can be plausible in and of itself, but the story contains more than this. The story also relies exclusively on anonymous sources, and what’s more, is being vehemently denied by the involved SMC customers such as Apple: https://www.apple.com/newsroom/2018/10/what-businessweek-got-wrong-about-apple/
The story is all of this, not just the supposed attack (vector). And the story vs. Apple’s refutation of it comes nowhere near passing the sniff test, in my opinion. It requires a pretty sizeable conspiracy to work, and that’s a very bad sign for the veracity of this story in its entirety.
Don’t companies like Amazon cycle through servers all the time? Wouldn’t there be 1000s of these boards on eBay at any given moment? If this story is real I’d expect to see physical evidence soon. I’m surprised Bloomberg didn’t show any. I wonder if they tried.
Step aside. I’m a HPC cluster administrator. 🙂
I’ll try to answer some points as concisely as I can.
First of all, no, these servers are not continuously cycled in the data centers. Even in HPC, you can use your resources for ~5 years. You just move your slowest tier down, buy a new generation of servers and make it top tier, and move all other hardware down a tier. So, under normal circumstances, they should leave Amazon’s warehouses in three to five years.
While I’m not certain, Amazon’s data centers are extremely dense. They don’t use 1U servers. At least they are using blades, or custom built open computing hardware, which are extremely barebones.
On the BMC firmware front, it depends. We generally update our BMC firmware to the latest release alongside the system BIOS before getting our servers online. So the systems are completely flashed before we start to use them. Also, the OS is installed by us directly over ethernet. No BMC is involved here.
Most importantly, in a sane setup, your BMC connection cannot access the internet. You should build an isolated intranet for it (including VLAN or hardware isolation, not just subnet/IP), and put a VPN in the front gate. As a result, you log in to your data center, or go there if you like metaphors. If nobody’s there via VPN, the BMC network is a silent and dark place. No connection to outside, no unknown traffic, just silence. The only exception may be the discovery packets of some BMCs, which can find similar servers and form federations for easier management. Even this needs some setup beforehand.
If you have any more questions I’ll try to help.
I agree with this analysis – this is the piece that isn’t a perfect fit. Common practice is to firewall inbound traffic to the BMC VLAN, but it may be that some sites don’t firewall outbound traffic: embedded devices fetching their own firmware updates from the internet is a common design pattern (though not as much for BMCs). Additionally, protocols such as NFS are fragile over the Internet.
One option the BMC has is it can simply send using the main network controller, which it can typically access via PCIe. That won’t be limited to the BMC VLAN. Additionally (and I haven’t checked for this board), many boards share the same ethernet port between BMC and main network controller – so the BMC can simply emit packets with the ‘wrong’ VLAN. Both of those would likely need more code modification than a simple boot-time divert. It would need more delving into the firmware to find out what would be feasible.
In a BMC management scenario, or a remote deployment scenario, the OS is not installed over the internet. You deploy the management & deployment services on premises. A server and some storage is generally enough. So, you don’t run NFS over the internet.
If you isolate the BMC network properly, there’s nowhere called “outside”. Internet just doesn’t route / reach there. So your BMCs cannot see the internet with or without you. I don’t think Amazon / Google / Apple will do something different. Because isolation is very cheap even from administration standpoint.
The shared BMC option is something short of a marvel, because there’s a multiplexer in place. I don’t think you can emit via the wrong VLAN, because you don’t see the BMC’s ethernet card on the PCIe bus. It just isn’t there. The port acts as two cards. VLAN use is completely optional even in the shared scenario because you see two MACs when both the OS and the BMC request IPs via DHCP. So they use the same port, without knowing about each other. At least the servers we have act like it.
However the BMC also sits on PCIe as a VGA card, so it can access the ‘main’ NIC via that route. There was also a paper recently where the DIMM and ethernet PHY happened to be on the same I2C bus and the DIMM could thus send ethernet packets.
All of this needs more complexity than simply rewriting the address of the TFTP server – it just depends what capability you have.
Yeah, you might be right, but it also depends IMHO.
The BMCs we use get the same output from the normal VGA cards of the boards, which are generally built by Matrox. I think they got the digital output before the DAC for VGA conversion. I just checked a server and it was the case. Similarly the BMC was not present on the PCI bus as a device.
I think the capability of the exploit is highly dependent on how BMC is wired, and how deep its access is. While a BMC can run amok on the board, and can see very fine details of it, it’s somewhat air gapped from making great damage.
Honestly I didn’t read any technical specs and pinouts of common BMCs. Some of the latest ones can change BIOS setups and whatnot. So, this is a very delicate matter and needs much more detailed inspection IMHO. But it seems the attack is NOT that easy to accomplish. This is my opinion when I look with my experience.
Whether the attacks are real or not, we will discuss this for a very long time, and it’ll inevitably lead to some changes.
Thanks for this article, nice read! –at least, given the little and vague information presented by Bloomberg.
I can understand the story passes the sniff test from an electronic engineering stand point. What I find extremely hard to believe is that no one ever noticed a single sign of strange incoming/outgoing traffic at the firewall level.
And although, yeah, some companies are negligent security-wise and have an abandoned firewall at the most, it’s critical to keep in mind that the kind of hardware carrying the payload is commonly used as well in companies that are extremely rigorous when it comes to security. And no one ever noticed a single bit of unusual traffic? I don’t buy it.
Agreed, that’s a very good question.
hardware design engineer here.
two problems with the idea of spoofing the unpopulated SPI flash.
1) Just placing the “device” in the vicinity of the unpopulated flash chip wouldn’t accomplish anything. according to bloomberg’s mockup it would just be sitting on top of some soldermask. you would need to rework with tiny wires connecting the device to either the unused pads, or the traces after scraping the mask off. Either way it would be obvious.
2) SPI devices share the same data bus (MISO), and are activated by a chip select signal (CS). Since the unpopulated part is an option for different densities or package sizes, likely the CS line is shared. The real Flash, and the “device” would both be driving the data bus at the same time, resulting in bus contention. It wouldn’t work.
1) yes, you would need to modify the board layout files (Gerbers) – you couldn’t do it just at solder-time, unless you added some conspicuous wiring. The photos in the Twitter thread I linked to are just a few pixels around pins 6/7/8 of the SPI flash footprint, so it’s hard to verify at this point.
2) for that, you just need to drive harder than the genuine chip. The datasheet of a 32MiB flash chip used on other server BMCs doesn’t state a drive current, but I suspect the pins can drive a few milliamps. If you can drive it at a large current – tens of mA – you can force signals high or low, even if the other side is driving to the opposite.
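As a rough illustration of ‘driving harder’, the Python sketch below treats the two fighting outputs as a simple resistor divider: one driver pulls the line up through its output resistance, the other pulls it down through its own. Real output stages are not linear resistors, and the resistance values used here are hypothetical, not taken from any datasheet.

```python
from typing import Tuple


def contention(vdd: float, r_high: float, r_low: float) -> Tuple[float, float]:
    """Model two drivers fighting over one wire as a resistor divider.

    r_high: output resistance of the driver pulling the line up to vdd.
    r_low:  output resistance of the driver pulling the line down to 0V.
    Returns (voltage on the wire, current flowing between the drivers).
    """
    current = vdd / (r_high + r_low)
    voltage = vdd * r_low / (r_high + r_low)
    return voltage, current
```

With a 3.3V supply, a strong driver of 10 ohms pulling high and a weaker one of 100 ohms pulling low, this model gives about 3.0V on the wire and about 30mA of contention current. The stronger driver wins, at the cost of tens of milliamps flowing for as long as the two disagree.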