Encoding integers in the EMV protocol

On the 1st of January 2010, many German bank customers found that their banking smart cards had stopped working. Details of why are still unclear, but indications are that the cards believed that the date was 2016, rather than 2010, and so refused to process a transaction supposedly after their expiry dates. This problem could turn out to be quite expensive for the cards’ manufacturer, Gemalto: their shares dropped almost 4%, and they have booked a €10 m charge to handle the consequences.

These cards implement the EMV protocol (the same one used for Chip and PIN in the UK). Here, the card is sent the current date in 3-byte YYMMDD binary-coded decimal (BCD) format, i.e. “100101” on 1 January 2010. If, however, this were interpreted as hexadecimal, the card would think the year is 2016, since 0x10 is 16 (in hexadecimal, 1 January 2010 should actually have been “0a0101”). Since the values 0–9 are encoded identically in BCD and hexadecimal, we can see why this problem only occurred in 2010*.
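To make the mix-up concrete, here is a minimal Python sketch that decodes the same three date bytes in both ways. The function names are invented for illustration, the century is simply assumed to be 2000, and nothing here is taken from the actual card firmware, whose details are not public.

```python
def decode_bcd_date(data: bytes) -> tuple[int, int, int]:
    """Decode a 3-byte YYMMDD field as binary-coded decimal."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    fields = []
    for byte in data:
        high, low = byte >> 4, byte & 0x0F   # one decimal digit per nibble
        if high > 9 or low > 9:
            raise ValueError(f"invalid BCD byte: {byte:#04x}")
        fields.append(high * 10 + low)
    # Assumption for this sketch: two-digit years are in the 2000s.
    return 2000 + fields[0], fields[1], fields[2]


def decode_binary_date(data: bytes) -> tuple[int, int, int]:
    """Hypothetical faulty decoder: treats each byte as a plain binary integer."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    return 2000 + data[0], data[1], data[2]


new_year_2010 = bytes.fromhex("100101")
print(decode_bcd_date(new_year_2010))     # (2010, 1, 1)
print(decode_binary_date(new_year_2010))  # (2016, 1, 1)

# Before 2010 the year byte is below 0x10, so both readings agree.
new_year_2009 = bytes.fromhex("090101")
print(decode_bcd_date(new_year_2009) == decode_binary_date(new_year_2009))  # True
```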

In one sense, this looks like a foolish error, and should have been caught in testing. However, before criticizing too harshly, one should remember that EMV is almost impossible to implement perfectly. I have written a fairly complete implementation of the protocol and frequently find edge cases which are insufficiently documented, making dealing with them error-prone. Not only is the specification vague, but it is also long — the first public version in 1996 was 201 pages, and it grew to 765 pages by 2008. Moreover, much of the complexity is unnecessary. In this article I will give just one example of this — the fact that there are nine different ways to encode integers.

Compliant implementations must be able to handle all of these encodings; which one is used in any given place depends on context.

TLV field lengths (< 128)
When data is sent from the card to the terminal, it is encoded in tag-length-value format (TLV), inherited from ASN.1. The tag specifies the type, and the length specifies how many bytes follow in the value. When the length is less than 128 bytes, the single length-byte has the most-significant bit cleared, and bits 7–1 encode the length.
TLV field lengths (≥ 128)
When a TLV field value is 128 bytes or longer, the length is encoded using multiple bytes. The first byte has the most-significant bit set, and bits 7–1 encode the number of length bytes that follow. These are then interpreted as a big-endian integer.
TLV tags (< 31)
EMV tags are also variable length. For tags which are less than 31, a single byte is used with bits 5–1 encoding the tag number (bits 8–7 specify the namespace of the tag, and bit 6 specifies whether the value is itself encoded in TLV).
TLV tags (≥ 31)
When the tag is 31 or greater, bits 5–1 of the first byte are set to “11111” and the actual tag number is encoded in bits 7–1 of subsequent bytes. The last of these bytes has the most-significant bit cleared, whereas preceding ones have it set.
Compact TLV
Alternatively, data can be sent from the card to the terminal in compact TLV format. Here, both tag and length are combined into a single byte: bits 8–5 (after adding 0x40) become the tag, and bits 4–1 specify the length. This is used for encoding data in the historical bytes of the answer-to-reset.
Numeric
Many data items are encoded as binary-coded decimal, with two digits per byte and left-padded with zeros (e.g. used for dates, amounts of currency, some protocol flags).
Compressed numeric
Some data items are encoded in “compressed” binary-coded decimal, with two digits per byte and right-padded with ‘F’s (e.g. used for the account number and the copy of track two magstripe data).
Binary
Other data items are big-endian integer encoded (e.g. used for encoding the transaction counter, public key parameters, and some protocol flags). Fields are fixed-length, but not necessarily on byte-boundaries.
ASCII
Some integers are encoded as their corresponding ASCII representation, with one digit per byte (e.g. used for encoding the copy of track one magstripe data).
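As an illustration of how several of these encodings meet in a single field, here is a minimal Python sketch of a TLV reader covering both tag forms, both length forms, and the two BCD variants. All function names are invented for this example. It returns only the tag number from bits 5–1 (or the continuation bytes), ignores the namespace and constructed bits, and rejects a long-form length byte that announces zero length bytes, which is a choice made for this sketch rather than something stated above. The bounds checks are there because a length field that points past the end of the buffer is exactly the kind of input that causes trouble.

```python
def _byte_at(buf: bytes, pos: int) -> int:
    """Read one byte, failing cleanly instead of running off the end."""
    if pos >= len(buf):
        raise ValueError("truncated TLV data")
    return buf[pos]


def parse_tag(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (tag_number, next_pos) for the tag starting at buf[pos]."""
    first = _byte_at(buf, pos)
    pos += 1
    number = first & 0x1F                  # bits 5-1 of the first byte
    if number == 0x1F:                     # "11111": number is in following bytes
        number = 0
        while True:
            nxt = _byte_at(buf, pos)
            pos += 1
            number = (number << 7) | (nxt & 0x7F)   # bits 7-1 carry the number
            if not nxt & 0x80:             # MSB cleared marks the last tag byte
                break
    return number, pos


def parse_length(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (length, next_pos) for the length field starting at buf[pos]."""
    first = _byte_at(buf, pos)
    pos += 1
    if not first & 0x80:                   # short form: lengths below 128
        return first, pos
    count = first & 0x7F                   # long form: number of length bytes
    if count == 0 or pos + count > len(buf):
        raise ValueError("bad or truncated length field")
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def parse_tlv(buf: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (tag_number, value, next_pos) for one TLV object."""
    number, pos = parse_tag(buf, pos)
    length, pos = parse_length(buf, pos)
    if pos + length > len(buf):
        raise ValueError("value runs past end of buffer")
    return number, buf[pos:pos + length], pos + length


def decode_numeric(data: bytes) -> int:
    """Numeric: BCD, two digits per byte, left-padded with zeros."""
    digits = data.hex()
    if not digits.isdigit():
        raise ValueError("not valid BCD")
    return int(digits)


def decode_compressed_numeric(data: bytes) -> str:
    """Compressed numeric: BCD digits, right-padded with 'F' nibbles."""
    digits = data.hex().rstrip("f")
    if not digits.isdigit():
        raise ValueError("not valid compressed numeric")
    return digits


# Example bytes chosen for illustration: a two-byte tag (9F 02), a short-form
# length of 6, and a numeric value of 1234 left-padded to twelve digits.
number, value, end = parse_tlv(bytes.fromhex("9f0206000000001234"))
print(number, decode_numeric(value))       # 2 1234
print(decode_compressed_numeric(bytes.fromhex("1234567890123fff")))  # 1234567890123
```

Even this small reader has to switch between four integer encodings to get from the raw bytes to the amount.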

Given this wide variety of subtly different encodings (and I have probably missed some), it is not that surprising that a smart card developer could slip up, as appears to have occurred with the German EMV cards. Handling variable-length integers is a frequent source of bugs (ASN.1 is a particularly bad offender in this regard). This also makes me wonder whether there are security vulnerabilities in the cards or terminals.

Update (2012-07-30): Indeed, it looks like there are vulnerabilities in terminals. Nils and Rafael Dominguez Vega from MWR Labs successfully took over Chip & PIN terminals using a specially designed smart card, and made them play a racing game (as well as capture card details and PINs). There aren’t any details yet on what vulnerability they exploited.

* Actually, while this scenario explains the wide-scale problems occurring in 2010, if the month and day were interpreted as hexadecimal too, then cards which expired in September–December would have had problems in their last year of validity. The wrap-around for the day would cause a problem for cards expiring in September, during 20–30 September, because the BCD for 20 September 2009 is “090920”, which is greater than 30 September 2009 in hexadecimal, “09091e”. Similarly, the month would become a problem for cards expiring in October–December, during 1 October to 31 December, because the BCD for 1 October 2009 is “091001”, which is greater than the last day of October, November or December 2009 in hexadecimal (“090a1f”, “090b1e” and “090c1f” respectively). So either these glitches occurred but were attributed to random failure, or there is something special about the handling of the year in the failing Gemalto implementation (perhaps related to a conversion between two- and four-digit years).
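The footnote's arithmetic can be checked with a short Python sketch. It assumes a purely hypothetical card that reads all three BCD date bytes as plain binary and compares the result, field by field, against an expiry date taken to be the last day of the expiry month. The function names are invented for illustration, and this is not a model of the real Gemalto firmware.

```python
from datetime import date, timedelta
from typing import Optional


def to_bcd(n: int) -> int:
    """Pack a two-digit decimal number into one BCD byte (e.g. 20 -> 0x20)."""
    return (n // 10) << 4 | (n % 10)


def misread(d: date) -> tuple[int, int, int]:
    """Encode d as YYMMDD in BCD, then read each byte back as plain binary."""
    return to_bcd(d.year % 100), to_bcd(d.month), to_bcd(d.day)


def first_wrongly_rejected(expiry: date) -> Optional[date]:
    """First day of the expiry year, up to and including the expiry date,
    on which the misread current date compares greater than the expiry."""
    limit = (expiry.year % 100, expiry.month, expiry.day)
    day = date(expiry.year, 1, 1)
    while day <= expiry:
        if misread(day) > limit:
            return day
        day += timedelta(days=1)
    return None


print(misread(date(2009, 9, 20)))                  # (9, 9, 32)
print(first_wrongly_rejected(date(2009, 9, 30)))   # 2009-09-20
print(first_wrongly_rejected(date(2009, 10, 31)))  # 2009-10-01
print(first_wrongly_rejected(date(2009, 12, 31)))  # 2009-10-01
```

Under these assumptions the day wrap-around is not peculiar to September: the same sketch reports the 20th of the expiry month for earlier months as well, which makes it even less likely that a card misreading all three fields would have gone unnoticed until 2010.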

9 thoughts on “Encoding integers in the EMV protocol”

  1. Interesting article, but your mention of ASN.1 as a “particularly bad offender” is totally inappropriate. None of the vulnerability reports related to ASN.1 encoding/decoding implementations that have appeared over the years have uncovered any defects in the ASN.1 standards themselves. In each case, the source of reported vulnerability was a defect in this or that implementation. Perhaps the most frequent defect found in those faulty implementations is buffer overflow. You will agree that this is a very basic programming mistake, and not something attributable to the ASN.1 standards.

  2. The first several formats you list are all straight out of ASN.1 BER encoding, and as such ought to be part of the repertoire of anyone coding for this kind of thing. None of them is in the least difficult, either to encode or to decode.
    That still doesn’t excuse interpreting “101010” as a date in 2016, of course. But what cretin specified 2-digit years in the first place?

  3. It’s interesting to know what the developer did wrong, but the most important question is why the entire German EFT industry didn’t detect this problem before 2010. Typically, any project involving the deployment or development of hardware or software for ATMs, POS systems or cards goes through pretty rigorous formal testing and certification. Did EVERY SINGLE German test plan omit a simple “future date” test case? Did someone pick it up and not realise the impact? Did someone pick it up and suppress it??

  4. @Alessandro

    One of the most important characteristics of a good specification is that it should be implementable. The fact that otherwise competent programmers seem to be consistently unable to create secure ASN.1 implementations does point to a problem with the specification, in my opinion. Blaming programmers doesn’t make the problems go away.

    This lesson has already been learned in the field of safety critical systems. Previously, when an accident happened, a person would be identified as responsible for the “human error”, and fired or perhaps re-trained. The problem with this “blame and train” attitude is that people kept on dying.

    The more modern approach to accident analysis is to accept that to err is human, and design systems around this fact. By looking at the system as a whole, we can help people make fewer mistakes, and design processes to reduce the damage when they do. This is the motivation behind root cause analysis: finding all the ways in which an accident could have been prevented, not just finding a person to blame.

    With respect to software, I think that lessons can be learned from this field. If the correctness of a system depends on a super-human ability to not make mistakes, then the design of the system is broken. Correcting this requires changes of many different types, including use of static analysis, safer programming techniques, as well as procedural improvements such as in testing. However, I think an important addition would be to design specifications which do not lend themselves to implementation blunders.

  5. @Fred

    Indeed, the first few are from ASN.1 BER. However, I would argue that implementing the decoder/encoder is not so trivial. Firstly, the EMV specifications are incomplete (e.g. they do not specify the endianness of multi-byte TLV field lengths). Secondly, unless great care is taken, an implementation will be vulnerable to buffer and/or integer overflows (as the various vulnerabilities in commercial ASN.1 decoders prove).

    However, my main point was not so much about ASN.1, but about the huge variety of different encodings. Why is track one encoded differently from track two? Why is it possible to encode the amount of a transaction in either BCD or an unsigned binary integer? I think this complexity could be part of the reason we see implementation problems in EMV.

    As for the two-digit years, that also surprised me given that the first version of EMV came out in 1996 when the Y2K bug was fairly well known. I’m not sure if there was any practical impact though, because as far as I know the first large-scale deployments of EMV were post-2000.

  6. @Ian

    I agree. Having a programmer make a mistake like this is probably unavoidable, so processes which stop the bug hitting production code are essential. I think software engineering quality would be significantly improved if companies felt able to talk about such failures openly, so that others could learn from their mistakes.

  7. “If the Lord Almighty had consulted me before embarking on creation thus, I should have recommended something simpler.”
    (Alfonso X The Wise).

    ASN.1, Track 2 and Track 1 are legacy. EMV just adopted them as they are.

    Compressed numeric items are not really numbers, such as track 2 that contains ‘D’.

    So we are left with two numeric formats, BCD and binary. While two is twice as much as the theoretical minimum, it is a far cry from the nine presented above.

  8. At the risk of making a “throwaway comment”, which some finextra commenter already chided Steven for, this is my opinion:

    The real problem is not over-complexity in the means of EMV, but over-complexity in the goals. It simply tries to support far too many options and variations. To give an example: a lot of the on-card risk management is just a waste of time imho. A large proportion of our customers simply don’t bother with that stuff, they look to use the default values, and I think they’d prefer if it wasn’t there at all. And Mastercard and Visa busy themselves making little changes and deltas to make their own version of the spec, because they are too proud to agree. Hence we have a raft of super-specifications on top of EMV, and EMV has been explicitly designed to support this diversity. Diversity is bad (well, that throwaway comment is a whole argument for another day). So my main gripe is with the aims, not with the means.

    Now, it may be that the way EMV has turned out is the best one could possibly hope for a protocol designed and agreed between multiple interested parties with no clear dictator to lay down the way things are. But whilst that can explain problems, it can’t annul them.

    We are left with a situation where EMV sucks, I’m afraid. It’s true… BUT… it sucks in the same way that TCP/IP sucks. And just like TCP/IP, EMV is very successful, well at least in that it has achieved wide deployment and we rely upon it daily. Perfect, no? Good, debatable? Successful — yes!

    End “throwaway comment”.

  9. Author should consider withdrawal of “ASN.1 is a particularly bad offender in this regard” comment.

    ASN.1 is just a schema description language.

    There are multiple mature BER/DER/PER encoders/decoders available for different languages. The real issue with the industry is NIH (Not Invented Here Syndrome).
