Researchers flag privacy risks with de-identified health data


A growing number of hospitals are banding together with tech companies to create analytics businesses or develop predictive algorithms.

These efforts are fueled by de-identified data, which hospitals and other covered entities can share without specifically asking patients for consent. Patients’ names, addresses, and other potentially identifying information are removed from these datasets, which can then be shared freely under current regulations.

Even if the privacy risks to patients in sharing de-identified data might seem minute or distant, hospitals should carefully consider them when they strike data-sharing agreements, researchers wrote in an article recently published in the New England Journal of Medicine. They advocated for specific protections for patients, including seeking patients’ consent, stepping up security measures for de-identified data, and passing additional legislation that would protect patients in the event of a breach.

“I think the challenge in medicine is everything is benefit-risk. It’s really easy for people to imagine the benefits, and really difficult to imagine the risks,” said Eric Perakslis, chief science and digital officer at the Duke Clinical Research Institute, and co-author of the article. “Precisely what benefit is being returned to the patients from the centers that are selling their data? If the benefit is 0, then there needs to be 0 risk.”

Big data partnerships draw scrutiny
The use of de-identified data in healthcare is nothing new, but more hospitals are looking to tap into the vast troves of data stored in their electronic record systems. For-profit hospital giant HCA recently struck a partnership with Google to develop prescriptive algorithms based on de-identified data, and Mayo Clinic launched a joint venture with Massachusetts-based startup Nference to commercialize algorithms for the early detection of heart disease.

Recently, 14 large health systems, including Providence and CommonSpirit Health, began pooling together de-identified patient data for analytics, with plans to make some of those datasets available for purchase.

Growing analytics startups, including Komodo and Flatiron Health, have also made a business of analyzing de-identified patient data.

While these efforts could potentially lead to important discoveries, such as predicting who might benefit most from certain cancer treatments, they also haven’t been without controversy. Patients have filed lawsuits in the past based on the use of de-identified data, though so far, none of these attempts have been successful.

In 2012, a CVS pharmacy customer sued the company based on the alleged sale of de-identified information about prescription fillings, medical history and diagnoses. More recently, a patient filed a lawsuit against Google and the University of Chicago Medical Center over their data sharing partnership, though a judge dismissed it on the basis that the patient failed to demonstrate harm as a result of the partnership.

The plaintiffs’ failure in these two cases doesn’t mean that more lawsuits won’t appear in the future.

“Plaintiffs rarely give up easily, so I don’t think this is over yet,” said Patricia Carreiro, a cybersecurity and privacy litigator with Carlton Fields. “People are becoming more aware of their privacy and wanting to protect it.”

Currently, there are no known cases of re-identification. But it’s still a possibility when large datasets are combined, or when genomic data is compared with data from people who have taken consumer DNA tests.

On top of that, healthcare identity theft is becoming more prominent, as seen with the recent spate of ransomware attacks targeting hospitals and insurers. As more breaches happen, it becomes more difficult to identify the cause of any particular breach, Carreiro said.

According to the Center for Victim Research, it cost healthcare identity theft victims an average of $13,500 to resolve the crime.

“If you’re a hospital administrator thinking about doing one of these data deals, you should think about that,” Perakslis said.

Two methods for de-identification
Currently, under the Health Insurance Portability and Accountability Act (HIPAA), there are two methods for de-identifying patient data.

The first, the safe harbor method, involves removing 18 types of identifiers, including patients’ names, addresses, emails, birth dates and Social Security numbers. The list is broad because seemingly innocuous details add up: a person’s gender, birth date and ZIP code together are often enough to identify most Americans.
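To illustrate why stripping names alone is not enough, here is a minimal Python sketch. It is not a HIPAA-compliant implementation: the field names, the four-item identifier set and the patient records are all made up for illustration, and the coarsening step (birth year only, first three ZIP digits) is a simplified stand-in for how safe harbor treats dates and geography.

```python
from collections import Counter

# Hypothetical field names; the real safe harbor list covers 18 identifier types.
DIRECT_IDENTIFIERS = {"name", "address", "email", "ssn"}


def strip_direct_identifiers(record):
    """Drop fields that identify a person on their own."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def generalize(record):
    """Coarsen quasi-identifiers: keep only the birth year and first three ZIP digits."""
    out = dict(record)
    out["birth_date"] = record["birth_date"][:4]  # "1984-03-07" -> "1984"
    out["zip"] = record["zip"][:3]                # "27701" -> "277"
    return out


def unique_fraction(records, keys=("gender", "birth_date", "zip")):
    """Share of records whose combination of quasi-identifiers appears exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)


# Made-up toy records, not real patients.
patients = [
    {"name": "Ana", "gender": "F", "birth_date": "1984-03-07", "zip": "27701", "dx": "asthma"},
    {"name": "Bea", "gender": "F", "birth_date": "1984-11-20", "zip": "27705", "dx": "diabetes"},
    {"name": "Cal", "gender": "M", "birth_date": "1990-01-15", "zip": "27701", "dx": "asthma"},
    {"name": "Dev", "gender": "M", "birth_date": "1990-06-30", "zip": "27707", "dx": "hypertension"},
]

stripped = [strip_direct_identifiers(p) for p in patients]
coarse = [generalize(p) for p in stripped]

print(unique_fraction(stripped))  # every record is still one of a kind
print(unique_fraction(coarse))    # each record now shares its combination with another
```

In this toy dataset, every record remains unique on gender, birth date and ZIP code after the names are removed, while coarsening those three fields leaves no record unique. Real datasets are far larger, and the actual risk depends on the population and on what other data an attacker holds.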

The second, the expert determination method, involves working with a statistical expert to come up with a process that would have a very low risk of identification. In analytics partnerships, healthcare companies often look to the latter, said Adam Greene, a partner with Davis Wright Tremaine LLP.

“One of the challenges with de-identification is that under the safe harbor method is you can’t have a unique identifier with limited exception,” he said. “It becomes difficult to link an individual across different datasets, even if you can’t identify who that individual is. It also becomes more difficult to identify what may be important analytic information.”

One concern is that there’s no absolute standard for what methods should be used with expert determination. For example, if encryption is used to de-identify data, it may not be future proof, introducing a risk for data being linked or breached several years down the line, said Kenneth Mandl, co-author of the NEJM article and director of computational health informatics at Boston Children’s Hospital.
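As a rough illustration of the linking problem and the future-proofing concern, the Python sketch below derives a pseudonymous token from a patient identifier using a keyed hash (HMAC-SHA256). This is one possible technique chosen for illustration, not a method described in the NEJM article or prescribed by HIPAA; the key, the record ID format and the function name are hypothetical.

```python
import hashlib
import hmac


def link_token(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous token from an identifier using a keyed hash."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and record ID, for illustration only.
key = b"example-key-held-by-the-hospital"

token_in_lab_data = link_token("MRN-0012345", key)
token_in_claims_data = link_token("MRN-0012345", key)

# The same patient gets the same token in both datasets, so records can be linked
# without either dataset carrying the original ID.
print(token_in_lab_data == token_in_claims_data)

# A different key yields unrelated tokens, so linking depends on who holds the key.
print(link_token("MRN-0012345", b"another-key") == token_in_lab_data)
```

The protection here rests entirely on the key staying secret and the hash function staying strong. Anyone who later obtains the key can recompute tokens from known identifiers and re-link the records, which is one way a scheme that looks safe today could fail years down the line.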

Hospitals should also consider that while using de-identified data is legal, these uses don’t always align with patients’ expectations.

“How do you feel about the activity finding its way onto the front page of a prominent media outlet?” Greene said. “Just because it’s legally permissible doesn’t mean it couldn’t have reputational impact.”

Potential solutions
Some potential solutions include asking patients for consent on how their de-identified data is used, and better monitoring of where that data goes when it is shared. For example, the authors of the NEJM article said hospitals and other covered entities should treat de-identified health data similarly to how they would handle protected health information. Patients should be notified, using consent documents and privacy notices, that their data may be used to support a health system or be shared with commercial parties.

When patients’ de-identified data is shared, hospitals should implement contractual controls to ensure that data never passes beyond the users specified in the arrangement, and that those users cannot link it to other datasets or re-identify it without the permission of the provider. Federated systems, which allow organizations to run analytics on data without it ever leaving the original site, also serve as a potential solution here.
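The federated idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular product: the class and function names and the lab values are invented, and real federated systems add authentication, auditing and further privacy safeguards.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate computed inside one hospital; contains no row-level data."""
    n: int
    total: float


def summarize_locally(values):
    """Runs at the hospital: reduce raw values to a count and a sum."""
    return LocalSummary(n=len(values), total=float(sum(values)))


def combine(summaries):
    """Runs at the coordinator: pool the per-site aggregates into an overall mean."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no records contributed by any site")
    return sum(s.total for s in summaries) / n


# Made-up lab values held at two hypothetical sites.
site_a_values = [6.1, 7.4, 5.9]
site_b_values = [8.0, 6.6]

# Only the summaries cross institutional boundaries, never the raw records.
pooled_mean = combine([summarize_locally(site_a_values), summarize_locally(site_b_values)])
print(round(pooled_mean, 2))
```

Only a count and a sum leave each site, so the coordinator can compute the pooled mean without seeing any individual record. Aggregates are not risk-free on their own, since a summary drawn from a very small group can still reveal information about the people in it.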

“There are opportunities to maintain an understanding of how those data are going to be used subsequently so the deidentified datasets can continue to be monitored and uses can be audited if there is a breach or reidentification event,” Mandl said. “There is a way to have the originating institution aware that it happened.”

Finally, in the event of the worst-case scenario, privacy and anti-discrimination laws serve as an important backstop. Legislation should protect patients’ ability to get health insurance, life insurance and employment.

Currently, two states have provisions prohibiting unauthorized re-identification: California and Texas. The California Consumer Privacy Act also requires that entities include certain contractual restrictions if they sell or otherwise disclose HIPAA de-identified information.

“If you’re going to do this, give the patients recourse. Let the patients know you’re doing it. Provide liability protection,” Perakslis said. “There’s fascinating stuff in this data, but it has to be responsible.”

Photo credit: JuSun, Getty Images