AOL was recently embarrassed after releasing data on the searches performed by 658,000 subscribers. Their names had been replaced by numbers, but this was not enough to stop personal information from leaking, because the search queries themselves could reveal who had typed them. The AOL folks just didn’t understand that protecting data using de-identification is hard.
They are not alone. An NHS document obtained under the Freedom of Information Act describes how officials are building a “Secondary Uses Service” which will contain large amounts of personal health information harvested from hospital and other records. The proposal is that ever-larger numbers of people will get access to this information as it is progressively de-identified. It seems that officials are only just beginning to realise how difficult it will be to protect patient privacy, especially as your de-identified medical record will generally still carry your postcode. There are only a few houses at each postcode; knowing that, plus a patient’s age, usually tells you whose record it is. The NHS proposes to set up an “Information Governance Board” to think about the problem. Meanwhile, system development steams ahead.
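The postcode-plus-age point is easy to check on any dataset by counting how many records share each combination of those fields. What follows is only a minimal sketch: the records, the `ZZ1` postcodes and the helper names `group_sizes` and `unique_records` are all invented for illustration and have nothing to do with the actual Secondary Uses Service data.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, postcode and age kept.
# All values below are made up for illustration.
records = [
    {"postcode": "ZZ1 1AA", "age": 34, "diagnosis": "asthma"},
    {"postcode": "ZZ1 1AA", "age": 71, "diagnosis": "diabetes"},
    {"postcode": "ZZ1 1AB", "age": 34, "diagnosis": "fracture"},
    {"postcode": "ZZ1 1AB", "age": 34, "diagnosis": "migraine"},
    {"postcode": "ZZ1 1AC", "age": 52, "diagnosis": "angina"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return the rows whose combination of the given fields no other row shares."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# A record that is unique on postcode + age can be picked out by anyone
# who knows those two facts about the patient.
exposed = unique_records(records, ["postcode", "age"])
print(f"{len(exposed)} of {len(records)} records are unique on postcode + age")
```

In this toy data three of the five records are unique on postcode and age, so a neighbour who knows roughly how old you are could read off your diagnosis. Real data would need the same count run over every combination of fields an outsider might plausibly know, not just these two.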
Clearly, the uses and limitations of anonymisation ought to be more widely understood. There’s more on the subject at the American Statistical Association website, on my web page and in chapter 8 of my book.