The role of software engineering in electronic elections

Many designs for trustworthy electronic elections use cryptography to assure participants that the result is accurate. However, it is a system’s software engineering that ensures a result is declared at all. Both good software engineering and cryptography are thus necessary, but so far cryptography has drawn more attention. In fact, the software engineering aspects could be just as challenging, because election systems have a number of properties which make them almost a pathological case for robust design, implementation, testing and deployment.

Currently deployed systems are lacking in both software robustness and cryptographic assurance — as evidenced by the English electronic election fiasco. Here, in some cases the result was late and in others the electronic count was abandoned due to system failures resulting from poor software engineering. However, even where a result was returned, the black-box nature of auditless electronic elections brought the accuracy of the count into doubt. In the few cases where cryptography was used it was poorly explained and didn’t help verify the result either.

End-to-end cryptographically assured elections have generated considerable research interest and the resulting systems, such as Punchscan and Prêt à Voter, allow voters to verify the result while maintaining their privacy (provided they understand the maths, that is; the rest of us will have to trust the cryptographers). These systems will permit an erroneous result to be detected after the election, whether caused by maliciousness or more mundane software flaws. However, should this occur, or if no result is returned at all, the election may need to fall back on paper backups or even be re-run, a highly disruptive and expensive failure.

Good software engineering is necessary but, in the case of voting systems, may be especially difficult to achieve. In fact, such systems have more in common with the software behind rocket launches than with conventional business productivity software. We should thus expect correspondingly high costs and, despite all this extra effort, the occasional unavoidable catastrophe. The remainder of this post will discuss why I think this is the case, and how manually-counted paper ballots circumvent many of these difficulties.

I think the most significant challenges in electronic elections come from the nature of deployment. The election date is immovable and, ready or not, the software must be deployed then. The 1995 Standish Report found that only 16.2% of IT projects were delivered on-time and on-budget, which is representative of the situation both before and since. Re-use of election software can help, but different regions and countries have different requirements and they change over time. The US has write-in votes; ballot papers in the UK must be retained after the election, linked to the voter's name; and Scotland introduced STV this year. These customizations need to be implemented and tested in time for the election.

Another factor is that in the long gap between elections, staff with experience of previous elections will move on and know-how will be lost. The resulting unfamiliarity increases the risk of mistakes, and there may be nobody left who remembers the previous problems and how they could be worked around or prevented. In the Bedford e-counting trial, a significant source of problems was the production of ballot papers (wrong size, wrong ink and a tendency to tear). No doubt someone at the contractor got into trouble for that, but when the next election comes in three years, will there be anyone who remembers?

Furthermore, hardware, operating systems and middleware will evolve between elections so the vote counting software will need to be adapted. The cost of this should not be underestimated — one survey reported that adaptation to new platforms accounted for 18% of software maintenance. All these changes, as well as ones due to changing requirements, must be tested, but the cost of performing a full system test, with a realistic number of votes, voters and staff, would be prohibitive. Instead, only unit tests and small integration tests are feasible, which risk missing feature interactions, race conditions and scaling problems. The last two appeared to be behind the delayed Bedford elections.
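To make the scaling point concrete, here is a minimal sketch of the kind of code that sails through a small test and falls over on election night. It is purely illustrative: the function names and the duplicate check are my own invention, and I am not suggesting this is what went wrong in Bedford. Each incoming ballot is compared against a plain list of ballots already seen, so the work grows with the square of the number of ballots.

```python
def count_with_duplicate_check(ballot_ids):
    """Accept each ballot once; return (ballots accepted, comparisons made).

    Hypothetical example: 'seen' is a list, so every new ballot is
    compared against all earlier ones, one at a time.
    """
    seen = []
    comparisons = 0
    for ballot_id in ballot_ids:
        duplicate = False
        for other in seen:
            comparisons += 1
            if other == ballot_id:
                duplicate = True
                break
        if not duplicate:
            seen.append(ballot_id)
    return len(seen), comparisons


def worst_case_comparisons(n):
    """Comparisons needed for n distinct ballots: 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # A small integration test: 100 ballots, finishes instantly.
    print(count_with_duplicate_check(range(100)))
    # The same code on a realistic count, computed rather than run.
    print(worst_case_comparisons(50_000))
```

With 100 test ballots the loop makes 4,950 comparisons and nobody notices. With 50,000 distinct ballots it would make 1,249,975,000, so a test that passed in a blink says very little about how the system behaves at full scale.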

Another case where full testing is costly, deployments infrequent and failure expensive is rocket-launch control software. Such software is developed using expensive, high-integrity development methods, which involve robust programming techniques, extensive testing and the use of reliable hardware components (which also typically come with extended manufacturer support, to reduce the maintenance costs discussed above). Despite these measures, failures do occur. One well-known example is Ariane 5 Flight 501. The details of the failure are not relevant here, but testing did not catch the problem, and the reasons behind this also apply to voting systems.

Every component in Ariane was tested individually, but the failure occurred because of an interaction between two components and high g-forces which could not be reproduced outside of a test flight. Even simulating the input from the accelerometer would have been costly, so the decision was made to rely on the test results from Ariane 4, which had a lower acceleration. When exposed to the Ariane 5 flight profile, a software component failed. The component was non-critical in itself, but the knock-on effect caused the destruction of the rocket. This closely matches the Bedford experience, where the voting system passed a small-scale test but, when faced with a high number of manually adjudicated votes (due partially to paper problems), first slowed down and then exhibited failures.

Where operators are under stress from dangerous events occurring, they make errors in judgement around 20–30% of the time. Elections are also stressful, and this increases the probability of mistakes. Moreover, in the case of e-counting, the processing often takes place during the night following the election and runs without breaks, causing operator fatigue and further increasing the error rate. Operators are also inexperienced, because elections are infrequent. If exceptional events occur they have no experience to draw on, and so are more likely to make the wrong decision. Usability is thus even more critical, yet electronic elections are more complex: in Punchscan the poll staff must follow around 16 steps per ballot, rather than the 3 or 4 for UK paper elections. In one demonstration I saw, even a designer of the system performed two critical actions in the wrong order.

Finally, all of this assumes only accidents, but elections are subject to attack. Murphy has proved more than capable of disrupting the English election trials, but what happens if someone is malicious? The cryptography will prevent them from altering the result undetected, but if they can hack into the computers, disrupt the communications or destroy critical infrastructure, the entire election could be halted. Backups can help, but as any experienced sysadmin can tell you, good backups are expensive and even then failures do occur.

These factors will result in electronic voting systems being unreliable, and the cryptographic solutions will only make it seem worse because wrong results will be detected. For example, the Breckland e-counting system seemed to be working until a manual re-count discovered the computer had lost 368 ballots. Expensive high-integrity software development practices will reduce, but not eliminate, these problems. One alternative is to remain with paper ballots and manual counting. These come with problems too, but I argue that they have advantages when considered from a “software” engineering perspective.

It’s hard to perform non-reversible actions with paper by accident, and it gets harder with scale. In contrast, accidentally deleting or corrupting every file on a network filesystem, rather than just one, could be a matter of a single extra space character, whether the result of a slip when entering commands manually, or hidden in the depths of an unexercised, unexamined code path. Accidentally damaging a room full of paper is harder than damaging the equivalent number of electronic records.
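As a rough illustration of the extra-space slip, the sketch below mimics how a shell would expand the arguments of a delete command in a throwaway directory. Nothing is actually deleted. The file names and the expand and demo helpers are made up for the example, and the expansion is a simplified stand-in for real shell globbing rather than a faithful reimplementation.

```python
import fnmatch
import os
import shlex
import tempfile


def expand(command, directory):
    """Return the file names a shell-like glob expansion would hand to the command."""
    names = sorted(os.listdir(directory))
    targets = []
    for arg in shlex.split(command)[1:]:      # drop the program name
        if arg.startswith("-"):
            continue                          # option flag, not a file
        matches = fnmatch.filter(names, arg)
        # Like a POSIX shell, pass the pattern through literally if nothing matches.
        targets.extend(matches if matches else [arg])
    return targets


def demo():
    """Compare the intended command with the same command plus one stray space."""
    with tempfile.TemporaryDirectory() as directory:
        # Hypothetical contents of a counting machine's working directory.
        for name in ("scratch1", "scratch2", "ballots.db", "results.csv"):
            open(os.path.join(directory, name), "w").close()
        intended = expand("rm -f scratch*", directory)
        slipped = expand("rm -f scratch *", directory)
    return intended, slipped


if __name__ == "__main__":
    intended, slipped = demo()
    print("intended:", intended)
    print("slipped: ", slipped)
```

The intended command targets only scratch1 and scratch2. With one stray space before the asterisk, the target list becomes a non-existent file called scratch followed by everything in the directory, ballots.db and results.csv included.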

Paper wins when it comes to the principle of least astonishment: your average poll worker understands how paper behaves, but even experts are regularly caught out by unexpected computer behaviour. This factor, coupled with the fact that humans are adaptable, makes it far easier to change procedures in a manual count than in an electronic one. In response to unexpected circumstances, for example voters filling in the ballot incorrectly, an announcement can be made on how to treat this case. In contrast, making an equivalent change to the software, without the opportunity for even cursory testing, risks introducing new bugs and could harm the integrity of the election.

In summary, cryptographically verifiable electronic elections have advantages: they have the potential to run more complex voting systems, such as Condorcet, to speed up counting and to give voters better assurances that their vote has been counted. However, the involvement of computers introduces complexity and a consequently higher risk of failure. Spending more on development can mitigate this problem, but paper votes and manual counting side-step many of the risk factors and are transparent and robust, so they are an option that should not be discarded solely in the interest of apparent modernization.