Protecting software distribution with a cryptographic build process

At the rump session of PET 2006 I presented a simple idea on how to defend against targeted attacks on software distribution. There were some misunderstandings after my 5 minute description, so I thought it would help to put the idea down in writing, and I also hope to attract more discussion and a larger audience.

Consider a security-critical open source application; here I will use the example of Tor. The source code is signed with the developer’s private key and users have the ability to verify the signature and build the application with a trustworthy compiler. I will also assume that if a backdoor is introduced in a deployed version, someone will notice, following from Linus’s law — “given enough eyeballs, all bugs are shallow”. These assumptions are debatable; for example, the threat of compiler backdoors has been known for some time, and subtle security vulnerabilities are hard to find. However, a backdoor in the Linux kernel was discovered, and the anonymous reporter of a flaw in Tor’s Diffie-Hellman implementation probably found it through examining the source, so I think my assumptions are at least partially valid.

The developer’s signature protects against an attacker mounting a man-in-the-middle attack and modifying a particular user’s download. If the developer’s key (or the developer) is compromised then a backdoor could be inserted, but from the above assumptions, if this version is widely distributed, someone will discover the flaw and raise the alarm. However, there is no mechanism that protects against an attacker with access to the developer’s key singling out a user and adding a backdoor to only the version they download. Even if that user is diligent, the signature will check out fine. As the backdoor is only present in one version, the “many eyeballs” do not help. To defend against this attack, a user needs to find out if the version they download is the same as what other people receive and have the opportunity to verify.

My proposal is that the application build process should first calculate the hash of the source code, embed it in the binary and make it remotely accessible. Tor already has a mechanism for the last step, because each server publishes a directory descriptor which could include this hash. Multiple directory servers collect these and allow them to be downloaded by a web browser. Then when a user downloads the Tor source code, he can use a hashing utility provided by the operating system to check that the package he has matches a commonly deployed one.
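
As a rough illustration only, the build-time step might look something like the following Python sketch (hypothetical file and directory names; Tor’s actual build system works differently): hash the source tree in a deterministic order and emit the digest so it can be compiled into the binary and later published.

```python
import hashlib
import os

def hash_source_tree(root):
    """Hash every file under `root` in a deterministic order."""
    digest = hashlib.sha256()
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(path.encode())        # bind file names as well as contents
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

if __name__ == "__main__":
    source_hash = hash_source_tree("src")       # "src" is a placeholder source directory
    # Write the digest into a generated header so the compiled binary can
    # report it, e.g. in the directory descriptor it publishes.
    with open("generated_source_hash.h", "w") as header:
        header.write('#define SOURCE_HASH "%s"\n' % source_hash)
```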

If a particular version claims to have been deployed for some time, but no server displays a matching hash, then the user knows that there is a problem. The verification must be performed manually for now, but an operating system provider could produce a trusted tool for automating this. Note that server operators need to perform no extra work (the build process is automated) and only users who believe they may be targeted need perform the extra verification steps.
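
Such a tool could look roughly like the sketch below, assuming a hypothetical directory URL and a made-up one-hash-per-line listing format (the real Tor directory protocol is different): hash the downloaded package and compare it against what the directory servers report.

```python
import hashlib
import urllib.request

def local_hash(package_path):
    with open(package_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def published_hashes(directory_url):
    # Assume each line of the (hypothetical) listing is "<version> <hex digest>".
    with urllib.request.urlopen(directory_url) as response:
        lines = response.read().decode().splitlines()
    return {line.split()[1] for line in lines if line.strip()}

if __name__ == "__main__":
    mine = local_hash("tor-0.1.1.23.tar.gz")                           # example package name
    seen = published_hashes("https://directory.example.org/hashes")    # hypothetical URL
    if mine in seen:
        print("Package hash matches a commonly deployed version.")
    else:
        print("WARNING: no deployed server reports this hash.")
```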

This might seem similar to the remote-attestation feature of Trusted Computing. Here, computers are fitted with special hardware, a Trusted Platform Module (TPM), which can produce a hard-to-forge proof of the software currently running. Because it is implemented in tamper-resistant hardware, even the owner of the computer cannot produce a fake statement without breaking the TPM’s defences. This feature is needed for applications including DRM, but comes with added risks.

The use of TPMs to protect anonymity networks has been suggested, but the important difference between my proposal and TPM remote-attestation is that I assume most servers are honest, so will not lie about the software they are running. They have no incentive to do so, unless they want to harm the anonymity of the users, and if enough servers are malicious then there is no need to modify the client users are running, as the network is broken already. So there is no need for special hardware to implement my proposal, although if it is present, it could be used.

I hope this makes my scheme clearer and I am happy to receive comments and suggestions. I am particularly interested in whether there are any flaws in the design, whether the threat model is reasonable and if the effort in deployment and use is worth the increased resistance to attack.

13 thoughts on “Protecting software distribution with a cryptographic build process”

  1. The ability to automatically and remotely version a system is of great value to an attacker or worm. As the system grows, we should expect less-dedicated admins. Having them be reliably identified may be a substantial downside.

  2. I am no security expert but I am a little puzzled by some aspects of this. As I see it, the source is hashed by the developer (and probably signed, although it is assumed that this signature may be compromised).

    The problem, as I see it, seems to be embedding the source hash into the binary. It requires you to trust the compiler (both as a program and a person) to actually embed the correct hash. The only way to verify whether it is the correct hash is to compile the code yourself – at which point it would seem simpler just to check the hash yourself.
    It would be inevitable that people compiling in malicious code would forge the hash.

    Most software – even open source – is distributed as binaries.

    In short, the concept of publicly available hashes appears to work for source code management, but I see problems in applying this to binaries.

    But I could very well be missing something!

  3. Steven,

    Couple of points that popped up on reading the scheme:

    adding a backdoor to only the version they download

    How do you think this is going to play out? Will the attacker perform an active MITM attack, or do we assume that he has complete control over the download server of the software (and performs targeted poisoning based on something like IP address)? This is important, as in the latter case the scheme you propose can be exploited to perform an effective DoS if all software installations report different hashes.

    Does your threat model assume that except for the downloaded software, all the other system parts (compiler, OS etc.) of the user are trusted?

    You are also implicitly assuming a collision-free hash function. Most reports of collisions in existing hash functions are not taken seriously because they cannot (yet) work on a predefined preimage. However, in this case one doesn’t need a predefined preimage. The attacker could in theory add arbitrary files into the package to obtain the same hash.

    For the verification would you assume a fixed set of trusted directory servers? So the threat model by extension would assume that these servers are not compromised. If the system doesn’t use a fixed set of trusted servers, it needs to provide a secure way to authenticate the directory servers so that the attacker cannot perform a MITM during the verification process.

  4. @Adam S

    You are quite correct that this scheme could be useful to attackers, so there is a trade-off to be made here. I am not particularly worried about this side-effect at the moment for a few reasons.

    — Tor already publishes the version number, so this scheme is no worse than the current situation. Server admins concerned with the information leak could just as easily fake the hash and version. However, if the scheme were enforced using a TPM, then there would be a problem.

    — Fingerprinting networked applications is generally pretty easy. With access to the CVS repository, I think a good distinguisher for security relevant changes in Tor could be found.

    — For worms there is generally little need to fingerprint: if a worm knows about multiple attacks, it just runs them all. Where it does make a difference is if attacks are time-consuming or risk crashing the application. This might be the case when protection mechanisms prevent a NOP “landing-pad” being created and force accurate address guessing (e.g. return to libc).

    Attacks against an individual are more of a problem because here the attacker can fingerprint the application, then search for the right exploits. Also, it should be noted that the hash is over the source-code, not the binary or memory image, so compiler and OS added security features (e.g. stack canaries and address space randomisation) are not revealed. TPM based solutions might be more of a problem in this respect.

  5. I agree with points 1 and 2, but not 3. (“Worms can run multiple attacks.”)

    Sometimes, that’s true. Other times, attack variants will crash the target, making it more important to get the right attack the first time. If there’s an attack ordering that gets around the DoS problem, then the worm still needs to fire off N times as many attack packets, which slows down propagation by a factor of N. From the perspective of a worm that wants to be fast spreading, that’s bad. From the defender’s perspective, it’s great.

  6. @Gavin Jamie

    As I see it, the source is hashed by the developer (and probably signed, although it is assumed that this signature may be compromised)

    The source code is signed, but the signature key might be compromised in my threat model. I assume that a significant number of server operators will compile from source, and the build process hashes the source for them. I also assume that some operators will check for backdoors and verify that the application publishes the true hash. These assumptions may be erroneous, but there doesn’t have to be a 100% guarantee of foul-play being detected. If the attacker has to pay a high cost for being caught, then only a small probability is needed for an effective defence.

    Most software – even open source – is distributed as binaries.

    Yes, this is one of the problems with the scheme. It only works when some server operators compile from source. The protection my proposal gives is through safety in numbers, but those who run pre-compiled binaries do not help, because they have no opportunity to check the source code for vulnerabilities. So I think that pre-compiled binaries should not publish their hash, and users of such packages should be concerned if they do.

    In short the concept of publically available hashes appears to work for source code managment but I see problems in applying this to binaries.

    I agree. If everyone uses pre-compiled binaries, then backdoors will not be detected through source-code inspection and my proposal does not improve the situation. Hashing the binary is undesirable because it will vary depending on compiler, it leaks more information than needed (see my reply to Adam S) and if it was not compiled by the user, does not offer protection.

  7. @Srijith

    How do you think this is going to play out?

    Probably compromise of the distribution server, along with the signing key. If the attacker controls the downloading client then there is the additional problem of how to get hashes securely, but I discuss this more below.

    This is important as in the latter case, the scheme you propose can be exploited to perform an effective DoS if all software installations report different hashes

    Correct, but I am treating a DoS as better than compromise. In this case I would hope someone would raise an alarm that the developer key is compromised. Because OpenPGP signatures have non-repudiation, anyone could publish two source packages, with the same version number and valid signatures, but with different hashes. Unless the developer has a very good excuse, the compromise would be revealed.

    Does your threat model assume that except for the downloaded software, all the other system parts (compiler, OS etc.) of the user are trusted?

    Yes, because if the OS is compromised then the attacker has much easier options. Perhaps curtained-memory could restrict the trusted computing base and allow some components of the OS to be compromised while still allowing my proposal to be implemented securely. I don’t expect this to be the case in the short term.

    You are also implicitly assuming a collision-free hash function.

    Yes, I do require strong collision resistance (incidentally, even though it is used often, I don’t like the term collision-free, since message digest functions have collisions; they just should resist discovery). I eagerly await a hash function which is considered to meet this requirement, but the current ones are either not analysed enough or too similar to known broken hash functions to make me confident.

    Actually running the attack could be tricky, as the widely published version of the source code would need to have a magic BLOB somewhere, which might be hard to justify. Still, I would not rely on this for security as there have been demonstrations of plausible collisions in certificates and PostScript documents.

    If I were deploying this scheme today, I think I would use two different hash functions, in the hope that making two packages which collide in both would be infeasible. Perhaps SHA-256 and RIPEMD-160 combined would be adequate.
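
    A minimal sketch of that combination, assuming Python’s hashlib (whose RIPEMD-160 support depends on the OpenSSL build it is linked against):

    ```python
    import hashlib

    def double_digest(data):
        """Return SHA-256 and RIPEMD-160 digests side by side."""
        sha = hashlib.sha256(data).hexdigest()
        ripemd = hashlib.new("ripemd160", data).hexdigest()  # may be unavailable on some OpenSSL builds
        return sha + " " + ripemd

    with open("tor-0.1.1.23.tar.gz", "rb") as f:              # example package name
        print(double_digest(f.read()))
    ```

    An attacker would then need a single package that collides under both functions at once.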

    So the threat model by extension would assume that these servers are not compromised.

    Also true. My hope is that it is easier to have multiple directory servers than software distribution points. Also, I think that getting an authentic list of hashes is easier than anonymously getting a source code package. In principle, the hash of a daily table of version hashes could be published on paper and verified by potential users and server operators. The timestamping literature has a number of other schemes involving hash chains and trees, for example the PGP Digital Timestamping Service.
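
    As a toy illustration of the hash-chain idea (a sketch only, not the format used by the PGP Digital Timestamping Service), each day’s published value could commit to the previous value and that day’s table of version hashes:

    ```python
    import hashlib

    def next_link(previous_link_hex, daily_table):
        """Each day's link commits to the previous link and that day's table."""
        data = bytes.fromhex(previous_link_hex) + daily_table.encode()
        return hashlib.sha256(data).hexdigest()

    genesis = "00" * 32
    day1 = next_link(genesis, "tor-0.1.1.23 <hash>\ntor-0.1.1.22 <hash>\n")  # placeholder entries
    day2 = next_link(day1, "tor-0.1.1.24 <hash>\n")
    # Publishing (even on paper) and widely witnessing the latest link
    # authenticates the entire history of earlier tables.
    print(day2)
    ```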

  8. Also, I think that getting an authentic list of hashes is easier than anonymously getting a source code package.

    I do need to go through the literature you pointed out, but my position is that if you want the OS to do the verification automatically, the process has to be real-time and online. Using a predefined list of servers does have the problem of a DoS on the process. The crash of one of the core Tor servers (moria?) that left all Tor server nodes hanging for a couple of hours comes to mind.

  9. @Srijith

    my position is that if you want the OS to do the verification automatically, the process has to be real-time and online.

    I don’t think the verification process can be automatic, at least until operating system vendors integrate a tool which will perform it. When the user downloads the source code, the build tools in the package are untrusted, so cannot be used. All the user has is what came with the operating system, of which I only require a hashing tool.

    He can then manually compare the hash of the source package with the online list. It is tedious, but I cannot think of a work-around. However, it only needs to be performed by those who worry about being a target of the attack I describe.

    Using a predefined list of servers does have the problem of a DoS on the process.

    Yes, but a fairly minor one in my opinion. It only affects users who have just downloaded the source code. On discovering that the servers are down, potentially due to attack, they then have a choice of whether to just hope everything is OK, or to wait. The timestamping service I mentioned distributes hashes through Usenet, which provides some DoS resistance.

  10. You seem to have covered all the bases I can think of 🙂

    As far as I can see, the only possible weak link is the process by which the user is informed where to check for the correct hash. If there is a centralised repository for all open source software (like SourceForge etc.) this is a clear-cut process. However, if every software developer uses his/her own server to publish the hash, there is no secure way for the user to be directed to the uncompromised directory server.

    Have you started work on a paper already? 🙂

  11. @Srijith

    Have you started work on a paper already?

    No, not yet. I am considering it, but am still unsure whether it is worthwhile. The idea is quite obvious and the threat fairly obscure so I don’t think it would make a fantastic paper. There might still be some value in writing it up properly though. I felt the blog post was on the long side, but I did have to miss out many details. A short paper would still give a lot more space for expanding on these points.

  12. Why all this hash checking?

    It sounds like it’s all very muddy water and fraught with dangers, and still you won’t be sure.

    Why not just test the WHOLE executable code itself? That’s definitive for ALL Tors, and especially the ones who download the exe and don’t build their own (that’s 95% – I’ve tried following Tor build instructions for many days and failed – it’s obviously a black art – “build your own version” is a Torland dev’s euphemism for “go to hell”).

    If each Tor also had a rarely changing – no more than every 2 years – and simple – so they could all compile it – binary comparator service running on port 9002 (say), then these services could contact each other and request a segment (size to be determined – a reasonable chunk) of a fellow Tor’s exe, to check against their own exe code.

    If this check was done by every Tor on the network with one other Tor every 10 minutes, then gradually discrepancies could be found and offending servers isolated and informed.

    A kind of voting system could be used.
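
    The comparison step might look something like this sketch (hypothetical peers, request format and paths; purely illustrative, nothing like this exists in Tor):

    ```python
    import hashlib
    import random
    import socket

    PEERS = [("peer1.example.org", 9002), ("peer2.example.org", 9002)]  # placeholder peers
    CHUNK = 4096
    BINARY = "/usr/local/bin/tor"                                       # placeholder path to own executable

    def my_chunk(offset):
        with open(BINARY, "rb") as f:
            f.seek(offset)
            return f.read(CHUNK)

    def peer_chunk(host, port, offset):
        # Made-up one-line request protocol; a real design would need framing,
        # authentication and version negotiation.
        with socket.create_connection((host, port), timeout=10) as s:
            s.sendall(b"GETCHUNK %d %d\n" % (offset, CHUNK))
            data = b""
            while len(data) < CHUNK:
                part = s.recv(CHUNK - len(data))
                if not part:
                    break
                data += part
        return data

    def compare_once():
        host, port = random.choice(PEERS)
        offset = random.randrange(0, 1 << 20, CHUNK)     # assumes a ~1 MiB binary for the sketch
        same = (hashlib.sha256(my_chunk(offset)).digest()
                == hashlib.sha256(peer_chunk(host, port, offset)).digest())
        print(("match with" if same else "DISCREPANCY with"), host)

    compare_once()   # e.g. run every 10 minutes from a scheduler
    ```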

    Of course, Tors could only check like with like, if they ran the same version (and the same OS version).

    But that wouldn’t stop the Tor devs from putting all versions up for checking from the system’s servers themselves, and by removing such support for old versions they could push those Tors into upgrading.

    2 birds with one stone as it were!

    The only problem, perhaps, with this system is those persons who have “custom built” their executables.

    But they know who they are anyway, don’t they, so no problem there then.

    And it’s only fair that other Tor servers should also know who has "custom built" executables, so they may take action to protect themselves, if they feel the need!
