In early December 2011, prime minister David Cameron announced a plan to boost the UK’s biomedical research industry – under threat as pharmaceutical giants look abroad for cheaper R&D resources – by granting it access to patient data from the NHS.
"We’re going to consult on actually changing the NHS constitution so that the default setting is for patients’ data to be used for research, unless, of course, they want to opt out," Cameron said.
To pre-empt the privacy backlash, Cameron reassured voters that their data would be anonymised. "This does not threaten privacy, it does not mean that anyone can look at your health records, but it does mean using anonymous data to make new medical breakthroughs."
But critics have questioned whether data that is detailed enough to be useful can ever be truly anonymised. De-anonymisation techniques have been developed that can piece together an individual’s identity by correlating seemingly innocuous details.
Earlier this year, in a report for the Cabinet Office, University of Southampton computer science researcher Kieron O’Hara explained just how difficult it is to completely obscure an individual’s identity in their data. He described how almost any demographic data can function as quasi-identifiers: even if they do not identify an individual on their own, they can be combined with other datasets in a technique known as jigsaw identification.
In 1997, for example, a US researcher named Latanya Sweeney showed that data points from the census – such as age, gender and zip codes – could be combined to identify unnamed members of the public.
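To make the idea concrete, here is a minimal sketch of a jigsaw (linkage) attack in Python. Everything in it is invented for illustration: the records, the names, the choice of birth year, gender and postcode as quasi-identifiers, and the helper names `key` and `reidentify` are assumptions, not anything taken from O’Hara’s report or Sweeney’s study. It shows only the general principle of matching a dataset stripped of names against a public one that still carries them.

```python
# Toy linkage ("jigsaw") attack. All records below are invented for illustration.

# "Anonymised" health records: names removed, quasi-identifiers kept.
health_records = [
    {"birth_year": 1975, "gender": "F", "postcode": "SO17", "diagnosis": "asthma"},
    {"birth_year": 1975, "gender": "M", "postcode": "SO17", "diagnosis": "diabetes"},
    {"birth_year": 1982, "gender": "F", "postcode": "CB3", "diagnosis": "migraine"},
]

# A separate, hypothetical public dataset that still carries names.
public_register = [
    {"name": "Alice Example", "birth_year": 1975, "gender": "F", "postcode": "SO17"},
    {"name": "Bob Example", "birth_year": 1975, "gender": "M", "postcode": "SO17"},
    {"name": "Carol Example", "birth_year": 1982, "gender": "F", "postcode": "CB3"},
    {"name": "Dana Example", "birth_year": 1982, "gender": "F", "postcode": "CB3"},
]

# Assumed quasi-identifiers: harmless alone, identifying in combination.
QUASI_IDENTIFIERS = ("birth_year", "gender", "postcode")


def key(record):
    """Build the combined quasi-identifier tuple for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, register):
    """Return {name: diagnosis} where the quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for record in anonymised:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches[candidates[0]] = record["diagnosis"]
    return matches


reidentified = reidentify(health_records, public_register)
print(reidentified)
```

In this toy data, two of the three "anonymous" records match exactly one named person and are re-identified. The third matches two people and stays ambiguous, which is why coarser or less detailed data is harder to link, and also less useful to researchers.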
Even an individual’s writing style can be used to identify them. O’Hara’s paper pointed to the work of two computer scientists, Narayanan and Shmatikov from the University of Texas, who found that individuals could be pinpointed from anonymous film reviews on rental site Netflix.
David Cameron has correctly recognised that sharing information can spur innovation and create economic opportunity. However, it will be very difficult, if not impossible, to protect citizens’ privacy when detailed datasets are shared with the private sector.
Ross Anderson, professor of security engineering at Cambridge Computer Laboratory, says that anonymisation techniques will not work as well as the government hopes:
The effectiveness of anonymisation is something piously hoped for all over Whitehall, but I’m afraid there’s bad news. There’s no conceivable way that the kind of things that [the government] wants to do with medical records can be done by just using anonymity as a shield. A small amount of contextual information can ruin anonymisation.
The problems are much worse than some people in Whitehall are prepared to contemplate – we’ve been arguing about this for fifteen years, but it’s extraordinarily difficult to get someone to understand something, when his continued employment depends on his not understanding it.
Jim Killock, executive director of the Open Rights Group, says that sharing personal data about citizens with private industry goes beyond the aims of the open data movement:
When we talk about open data, we think about government data sets which do not contain personal information. We had assumed anonymised datasets were a no-go area, and we weren’t expecting the government to be including those sorts of datasets in the open data agenda.
What we’re seeing now is something very, very different. The government is talking about personal data sets, made anonymous or pseudonymous, that are licensed for commercial or research purposes and that are supplied privately, may not be distributed, and may include conditions to hold it and process it securely. This is not open data, but it’s likely to damage the open data cause.