Synthetic data – A new solution to historic healthcare privacy challenges

We have become all too accustomed, even desensitized, to reading stories of data breaches, data misuse and heated privacy debates. Malicious cyberattacks with the goal of harvesting information are now par for the course. According to the HIPAA Journal, 62 healthcare-focused data breaches occurred in April of this year alone, exposing the contents of 2.5 million medical records – a number that is historically high but unfortunately in line with prior months. Combined with the observable and frightening rise in ransomware attacks targeting the sector, the risk to patient data is at an all-time high and healthcare organizations are chronically ill-prepared to combat it. So, it is easy to appreciate the legitimate paranoia that data custodians have towards data management and why, on top of the restrictions imposed by legislation, they impose restrictive governance of their own that severely limits access to production data.

The reality of keeping sensitive data locked up so tightly is that the value it holds is locked away too. Organizations know that data facilitates a better understanding of patients and customers, supports informed business decisions and, most critically, can underpin patient care research. Furthermore, there is a growing trend towards the monetization of healthcare data, as pharmaceutical and medical device companies strive for diagnostic and treatment breakthroughs. The ultimate puzzle to solve, therefore, is finding a safe and secure way to free data for critical medical research while still meeting the high governance expectations of the data custodians.

Fortunately, groundbreaking AI technology is emerging that offers a viable solution for organizational leaders and data custodians to share and harvest insights from user data, while still maintaining robust security and granting patients their right to absolute privacy. Advanced synthetic data engines enable the ethical and risk-free analysis, sharing and even monetization of data. This gives healthcare organizations the autonomy to extract maximum value from the data they possess, opens the door to breakthrough research, enhances every facet of the patient experience and supports more efficient business operations.

What exactly is synthetic data?

Leading-edge synthetic data technology can now generate a “twin” dataset that is verifiably highly accurate and statistically equivalent to the production data, but devoid of any private information. The synthetic data produced is significantly more robust than traditional masking and anonymization techniques, as it creates new data values that maintain the relationships and distributions of the production data. Insights and analytics derived from it are equivalent to those that would have been observed from the original data. More advanced software will also provide analytical tools that measure both the accuracy and privacy of the “new” data, allowing forensic analysis of how the synthetic data was created, for added reassurance on security and governance.
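To make the idea concrete, here is a deliberately simplified sketch in Python. It is not how any particular commercial engine works; production tools use far richer models. It assumes a hypothetical two-column dataset (age and systolic blood pressure, simulated here as a stand-in for real records), fits a simple bivariate normal model to it, samples brand-new rows from that model, and then reports one accuracy check (is the correlation preserved?) and one crude privacy check (how close does any synthetic row come to a real one?). All function and variable names are illustrative.

```python
import math
import random


def fit(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in rows) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in rows) / (n - 1))
    cov = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample(params, n, rng):
    """Draw n new rows from a bivariate normal with the fitted parameters."""
    mx, my, sx, sy, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 and z2 this way reproduces the fitted correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


def closest_record_distance(synthetic, real):
    """Smallest Euclidean distance between any synthetic row and any real row."""
    return min(math.dist(s, r) for s in synthetic for r in real)


# Hypothetical stand-in for production data: (age, systolic blood pressure).
rng = random.Random(0)
real = []
for _ in range(500):
    age = rng.uniform(20, 80)
    real.append((age, 100 + 0.5 * age + rng.gauss(0, 8)))

real_params = fit(real)
synthetic = sample(real_params, 500, rng)
synth_params = fit(synthetic)

print("correlation (real, synthetic):",
      round(real_params[4], 3), round(synth_params[4], 3))
print("closest synthetic-to-real distance:",
      round(closest_record_distance(synthetic, real), 4))
```

The synthetic rows are sampled from the fitted model rather than copied or masked, which is the property the paragraph above describes. Note that a nonzero closest-record distance is only a rough indicator, not a privacy guarantee, and a real assessment would scale the columns and use more rigorous metrics.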

This revolutionary capability allows businesses to share data among internal teams, as well as with third parties (often located across multiple jurisdictions), in a manner that goes above and beyond what is required by data privacy legislation. As it never reveals any personally identifiable information, synthetic data allows highly sensitive and privileged medical information to be transformed into an unprecedented resource for analysis and processing.

Breaking down the barriers to using synthetic data

As with any new technology, educating potential users about the benefits of synthetic data can be a time-consuming nut to crack, and that should be especially expected when healthcare data is the focus. However, once the capabilities and potential are realized, and backed by measurable privacy and accuracy, it becomes clear that the use of synthetic data should quickly become mainstream.

Clear use cases that show how the synthetic data will be deployed are an essential component of the decision-making process, and although by nature synthetic data isn’t ‘real’, its existence within the data supply chain should still fall under data governance and security policies. Early adopters are largely using synthetic data for internal purposes to support a wider analytical capability or to support artificial intelligence and machine learning. That said, there is a growing trend to take advantage of the freedom synthetic data provides by distributing data to external technology partners to support onboarding and testing. Where organizations invite small and mid-sized businesses in to bring innovation, synthetic data is now being used to incorporate more realism and accuracy into the process. What’s clear is that the use and application of synthetic data will only increase as confidence grows in its robustness and capability, and as use cases proliferate.

Photo: LeoWolfert, Getty Images