The rush to 'anonymised' data

The Guardian has published an op-ed I wrote on the risks of anonymised medical records along with a news article on CPRD, a system that will make our medical records available for researchers from next month, albeit with the names and addresses removed.

The government has been pushing for this since last year, having appointed medical datamining enthusiast Tim Kelsey as its “transparency tsar”. There have been two consultations on how records should be anonymised, and how effective it could be; you can read our responses here and here (see also FIPR blog here). Anonymisation has long been known to be harder than it looks (and the Royal Society recently issued an authoritative report which said so). But getting civil servants to listen to X when the Prime Minister has declared for Not-X is harder still!

Despite promises that the anonymity mechanisms would be open for public scrutiny, CPRD refused a Freedom of Information request to disclose them, apparently fearing that disclosure would damage security. Yet research papers written using CPRD data will surely have to disclose how the data were manipulated. So the security mechanisms will become known, and yet researchers will become careless. I fear we can expect a lot more incidents like this one.

6 thoughts on “The rush to 'anonymised' data”

The only true security is NOT to allow your data to be used.

I have instructed my GP not to upload my data; if this is breached I will sue them for at least £1BN.

Professor Black claims in the news article accompanying my op-ed that “There has yet to be a single incident of anybody being harmed through the use of any healthcare databases.” This claim is parroted again and again by researchers who find medical privacy inconvenient, and it is simply not true. See for example the well-known case of the Cambridge medical student whose career was ruined because she had had cancer as a child and it showed up on the cancer registry.

This was an important case for medical schools, so it is astonishing that Professor Black appears to have forgotten about it completely. Perhaps if Robert Trivers ever publishes a second edition of his wonderful book on Deception and Self-Deception, he could invoke it as an example of selective recall.

You can opt out, but it is a pretty selfish move. Glad the cholera victims in London had not opted out in the 1850s, etc etc.

I read that Cambridge story and it doesn’t sound as if that person was actually harmed by the healthcare database. The article doesn’t explain why the existence of her details on the cancer registry led to her being put on special leave for 4.5 years. Perhaps it is written up better elsewhere.

Even still, the balance of proven benefits (millions of QALYs saved) to costs (one employment case) suggests the public good is better served by making better use of databases, and we have a responsibility to future patients to consent to the use of our data.

On the Gordon Brown example – shouldn’t it just be made illegal to nefariously de-anonymise data in this way (as it is illegal for his doctor to sell his non-anonymised records), rather than allow the possibility of mis-use to prevent beneficial scientific research?

I’m happy for my medical records to be used for a wide range of research purposes, even in some circumstances in an identifiable form but only if:

I’m asked and my decision is respected

All that can reasonably be done in terms of privacy enhancing technologies is done, and appropriate policies and procedures are in place to minimise privacy risks

The remaining privacy risks are honestly acknowledged

I’ll do all I can to deny my data to researchers who don’t recognise my absolute right to make the decision or who fail to recognise the risks and what they can do to mitigate them.

Hi Ross,

The basic truth is that researchers & data managers do not know how to manage and share clinical data in ways that are both secure and accessible. There is a tension between clinical researchers who are not particularly computer savvy and see all of the security as getting in the way of their research, and data managers who are charged with ensuring that access to and use of data is only within the bounds set out in the associated research proposal. You can add to this the politicians who like to have a say about everything, the individuals whose data make up the records, and the journalists who smell potential for scandal. There is also a tension from the owners of the datasets who are pulled in two directions, firstly by the need to secure their data to avoid possible criminal sanctions, and secondly by the desire to commercially exploit their data.

That said, many of the data managers that I have worked with are very concerned with providing appropriate levels of access to clinical data so that research can be performed efficiently and safely. These managers are keen to ensure that (pseudo-)anonymised data cannot easily be de-anonymised, notwithstanding the difficulties of doing real anonymisation.

They currently tackle this using a number of approaches but mainly by restricting both volume and type of data so as to minimise opportunities for trivial deanonymisation. This restriction on volume and type of data also extends to the amount of data that is returned to a researcher when they perform queries upon a dataset. For example, if a query will return results for fewer than, e.g., ten individuals, then those results will have to be ok’d by a data manager. Secure research environments are being built so that data is kept within controlled environments. These range from Citrix-based systems through to what I call the “full Tom Cruise”, Mission Impossible style, isolated terminals in monitored research environments in which entry, exit and use are all monitored via cameras and keyboard logging. Logging and auditing are increasingly being used to ensure that data is only used in the ways that were defined in the research proposal, and sanctions are available that can remove access to all clinical data sets for entire research establishments and not just individual researchers. Unfortunately none of these approaches will stop a determined attacker, but they do reduce the opportunity for either the lost laptop scenario or the idle misuse of research data to identify individuals (without intent).
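
The small-result rule described above can be sketched roughly as follows. This is only an illustration in Python, not how any real research environment implements it: the `QueryGate` class, the `patient_id` field and the threshold of ten (borrowed from the "e.g. ten individuals" figure) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed threshold, taken from the "e.g. ten individuals" figure in the text.
MIN_COHORT_SIZE = 10


@dataclass
class QueryGate:
    """Hypothetical gate between a researcher's query and the released results."""

    min_cohort_size: int = MIN_COHORT_SIZE
    audit_log: list = field(default_factory=list)
    pending_review: list = field(default_factory=list)

    def submit(self, researcher, description, rows):
        # Count distinct individuals, not rows: one patient may have many records.
        individuals = {row["patient_id"] for row in rows}
        released = len(individuals) >= self.min_cohort_size

        # Every query is logged, whether or not its results are released.
        self.audit_log.append((researcher, description, len(individuals), released))

        if released:
            return rows

        # Small result sets are held back until a data manager approves them.
        self.pending_review.append((researcher, description, rows))
        return None
```

A query covering only three patients would return `None` and sit in `pending_review` for a data manager, while a query covering ten or more would be returned directly; both would appear in `audit_log`. Counting distinct individuals rather than rows is a deliberate choice in this sketch, since a handful of patients can generate many records.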

Whilst there are a lot of “important” people with a lot of big talk about these issues and the need for laws and legal frameworks, there are also lots of people further down who are trying to come up with practical protocols, tools, and practices for ensuring efficient and safe use of clinical data.

Unfortunately this is a complex human-socio-technical system with many competing requirements and stakeholders. The stakeholders are all trying to establish an area of acceptable use within the points defined by the two continuums of promiscuity and control. About the only points of agreement so far are that the extremes of either continuum are unacceptable because the absence of clinical data research would be a tragedy for humanity whereas unrestricted access to clinical data would be a tragedy for individuals.

Full Disclosure: I worked for Professor Andrew Morris looking at the use of auditing and logging within research and data storage environments in the last year.

Here’s a report of my talk at the Amsterdam Privacy Conference

Light Blue Touchpaper

Security Research, Computer Laboratory, University of Cambridge
