Privacy by design 

When web giant Google launched Buzz, its now-defunct attempt at a social network, back in 2010, the company stumbled into an unexpected privacy scandal. The site automatically enrolled users of its web mail service Gmail and published a list of their most emailed contacts without asking.

Ann Cavoukian, information and privacy commissioner for the Canadian province of Ontario, visited Google’s offices right after Buzz’s privacy breaches became public, and met with one of the lead developers on the project. She asked him what he had been thinking when he wrote code that automatically exposed users’ most emailed contacts.

“He looked at me and, honest to God, tears came into his eyes,” Cavoukian says. “He said, ‘That just never occurred to us.’” Cavoukian says the problem with Buzz was that privacy simply had not been considered during the design process. “The developers’ instructions were to make this thing connect widely, and make it proliferate,” she explains.

“Someone with a privacy background would have known not to use data collected for one purpose for other, unrelated processes without the consent of the data subject.”

In an age when businesses collect personal data at every turn, this kind of oversight is no longer acceptable, Cavoukian argues. She advocates a strategy that she calls ‘privacy by design’, in which safeguards for the privacy of data subjects are baked into software systems, not added as an afterthought to tick a compliance box.

Achieving ‘privacy by design’ is on one hand a matter of educating software developers and putting in place governance processes that ensure privacy is considered at every step of the development lifecycle. After its initial investigation of the Google Street View Wi-Fi snooping scandal (it has since launched a new investigation), the UK Information Commissioner’s Office recommended that Google introduce ‘privacy design documents’ for every new project, assessing the privacy implications before any code is written.

But technologies are also in development that promise to instil ‘privacy by design’ at a deeper level, by embedding the requirement for an individual’s consent to the processing of their information into the guts of data processing systems.

One example is OAuth, an open standard for authorisation – i.e. granting systems permission to access data – that was first developed for use with microblogging service Twitter. It allows users to grant applications access to their data without having to share usernames and passwords.

The OAuth project website compares the technology to the valet key that comes with some luxury cars, allowing a parking attendant to drive the car in a limited capacity, without accessing the onboard computer, opening the glovebox or going further than two miles.

In a similar way, OAuth generates tokens that allow web services to share specific personal data, for a specific amount of time. Web companies such as LinkedIn, Microsoft, Google and Yelp all use the protocol, but it could also be applied in conventional business-to-business data sharing contexts.
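To make the token idea concrete, the sketch below shows roughly what a token exchange looks like in Python using the later OAuth 2.0 style (the original Twitter work used OAuth 1.0a). The endpoints, credentials and ‘contacts.read’ scope are invented for illustration; each provider publishes its own.

```python
import requests

# Hypothetical endpoints and credentials, for illustration only.
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/v1/contacts"

# Step 1: exchange the application's credentials for a scoped,
# time-limited access token.
token_response = requests.post(TOKEN_URL, data={
    "grant_type": "client_credentials",
    "client_id": "my-app-id",
    "client_secret": "my-app-secret",
    "scope": "contacts.read",   # only the data the user has agreed to share
})
access_token = token_response.json()["access_token"]

# Step 2: call the API with the token. The user's password is never shared,
# and the token expires after a set period.
contacts = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {access_token}"},
)
print(contacts.json())
```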

More ambitious – but still in the conceptual stage – is Smartdata, a technology being developed by researchers at the University of Toronto. The idea is that any personal data published on the Internet would be guarded by ‘virtual agents’, software components that enforce rules, set by the data subject, on how companies are permitted to use that data.

“The data will only allow itself to be used in ways that the user wanted, and it would basically disappear or self-destruct if an unauthorised third party tried to use it for some secondary use,” Cavoukian explains.

“That’s opposed to the existing model, where the organisation that collects your data also seems to own it and control it.”
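Smartdata remains a concept rather than a working product, but the rough idea can be sketched in a few lines of Python: the data object carries its owner’s usage policy with it, and wipes itself if anyone attempts an unauthorised secondary use. The class and field names below are invented purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class GuardedData:
    """Illustrative stand-in for a Smartdata 'virtual agent'."""
    payload: str
    allowed_purposes: set = field(default_factory=set)

    def use(self, requester: str, purpose: str) -> str:
        if self.payload is None:
            raise PermissionError("data has self-destructed")
        if purpose not in self.allowed_purposes:
            self.payload = None          # unauthorised secondary use: wipe it
            raise PermissionError(
                f"{requester} may not use this data for '{purpose}'")
        return self.payload


record = GuardedData("jane@example.com", allowed_purposes={"order-fulfilment"})
print(record.use("retailer", "order-fulfilment"))      # permitted use

try:
    record.use("ad-network", "targeted-advertising")   # secondary use
except PermissionError as err:
    print(err)                                         # payload is now gone
```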

Pseudonymisation in the NHS and casinos

One established technique that is used by businesses to preserve the privacy of their customers is pseudonymisation. Severing personal identifiers from their data sets means that businesses can analyse their data for trends and correlations without the risk of finding out more about the subjects than they would be comfortable with.

The NHS, for example, uses pseudonymisation to analyse patient data without identifying the individuals involved. Besides clinical analysis, this also allows the health service to use the data to investigate operational issues such as performance management, capacity planning and service redesign.

The NHS Connecting for Health website explains exactly how it uses pseudonymisation. Names, birth dates, postcodes and other potentially identifying data are replaced with a randomly generated alphanumeric string, which becomes what is known as the root pseudonym for each patient.

This root pseudonym is never revealed to NHS staff, but it is used to generate public pseudonyms for use by individual departments. This allows departments to look for trends such as, for example, the distribution of diseases according to location, without being able to see the home address of any one data subject.
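In code, the scheme might look something like the sketch below: a random root pseudonym stands in for the identifying fields, and department-specific public pseudonyms are derived from it one way, so each department sees consistent identifiers without ever seeing the root. The key handling here is a simplifying assumption for illustration, not a description of the NHS’s actual implementation.

```python
import hashlib
import hmac
import secrets

def make_root_pseudonym() -> str:
    """Random alphanumeric string replacing name, date of birth, postcode etc."""
    return secrets.token_hex(16)

def make_public_pseudonym(root: str, department_key: bytes) -> str:
    """Derive a department-specific pseudonym; the root is never shown to staff."""
    return hmac.new(department_key, root.encode(), hashlib.sha256).hexdigest()

root = make_root_pseudonym()   # held only by the pseudonymisation service
epidemiology_id = make_public_pseudonym(root, b"epidemiology-department-key")
planning_id = make_public_pseudonym(root, b"capacity-planning-department-key")

# The same patient keeps a consistent pseudonym within a department, so trends
# (e.g. disease distribution by area) can be analysed, but the pseudonyms
# cannot be reversed to reveal the patient's identity or home address.
print(epidemiology_id, planning_id)
```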

Meanwhile, casinos in Canada’s Ontario province, where Cavoukian is information commissioner, have been using pseudonymisation for very different reasons.

“The Ontario Lottery and Gaming Commission operates 27 casinos in the province,” she explains. “It’s state owned, so it comes under my jurisdiction as a part of government.” The casinos run a programme that helps addicted gamblers to stay out of their casinos. Called the self-exclusion programme, it allows addicts to voluntarily register themselves with the casinos, and to be barred entry from that point on.

“The problem was that the file [containing banned gamblers] would live in a binder in a back office, and when people would try to sneak back in, the people at the entrance don’t know what’s in the binders,” Cavoukian says. “They would sneak back in, whittle away their life savings and lose their job and their family, and if that wasn’t bad enough, they would then sue the government. So the government was losing money.”

The Gaming Commission asked the Ontario IPC whether it could apply facial recognition software to its security cameras to spot the self-excluded gamblers as they approach the casino. Cavoukian was concerned, however, that this would involve the casinos building a database of customers’ faces that could be stolen or reused for other purposes.

“The only way I was going to let them do it was if they used biometric encryption,” she says. “It’s a completely different form of recognition programme that doesn’t retain an actual image of the face. Instead, it uses the facial biometric as the encryption key for some other data” – in this case a unique ID that is linked to each banned gambler’s file.

“If the police come knocking with a court order and you have to open the database, they get nothing, because the only way you can decrypt the information is if the individual shows up,” Cavoukian says. “The individual’s face is the tool of encryption.”
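Greatly simplified, the principle can be sketched as below: the facial template itself becomes the encryption key for the gambler’s record ID, so neither a face image nor a readable record is ever stored. A real biometric encryption scheme must tolerate the fact that no two face scans are bit-identical (for example by using fuzzy extractors); this sketch, which uses the third-party cryptography library, assumes a perfectly stable template.

```python
import base64
import hashlib
from cryptography.fernet import Fernet

def key_from_template(face_template: bytes) -> bytes:
    """Turn a facial feature vector into a Fernet-formatted encryption key."""
    digest = hashlib.sha256(face_template).digest()
    return base64.urlsafe_b64encode(digest)

# Enrolment in the self-exclusion programme: encrypt the record ID under a key
# derived from the face. Only the ciphertext is stored.
enrolment_scan = b"simulated facial feature vector"
stored_token = Fernet(key_from_template(enrolment_scan)).encrypt(
    b"self-exclusion-file-8271")

# At the casino entrance, a matching face regenerates the key and unlocks the
# record; without the individual present, the database yields nothing.
entrance_scan = enrolment_scan   # assumed identical for this simplified sketch
record_id = Fernet(key_from_template(entrance_scan)).decrypt(stored_token)
print(record_id)
```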

Not every business will need a system as sophisticated as this, but to derive maximum value from its data without fear of breaking the law or alienating customers, every business needs to consider privacy at the point of data collection, Cavoukian argues.

“Companies need to think about embedding privacy into their systems at the point of entrance, when the volumes of data are being generated,” Cavoukian says. “They need to develop new products that give the consumer control and understanding. They will then be far more likely to give the consent that companies need to use their data for secondary purposes that were never contemplated at the time of the initial collection.”

