Newswise — Macquarie University researchers have showcased a fresh method for connecting individual records while ensuring privacy. The initial use lies in recognizing instances of uncommon genetic ailments. Countless other potential applications span throughout society.

The findings will be unveiled at the 18th ACM ASIA Conference on Computer and Communications Security in Melbourne on 12 July.

A five-year-old boy in the United States has a mutation in the GPX4 gene that he shares with only ten other children known worldwide. The mutation causes abnormalities in the skeletal and central nervous systems. Many other children with the same condition probably exist, their records scattered across health and diagnostic databases around the world. However, legal and commercial restrictions mean their identities remain undisclosed to us.

Imagine if it were possible to identify and analyze records related to the condition while safeguarding privacy. Researchers from the Macquarie University Cyber Security Hub have developed a technique to do precisely that. The team comprises Dr. Dinusha Vatsalan and Professor Dali Kaafar from the University's School of Computing, along with Mr. Sanath Kumar Ramesh, a software engineer and CEO of the OpenTreatments Foundation in Seattle, Washington, who is also the father of the affected boy.

"I am tremendously thrilled about this research," says Mr. Ramesh, whose foundation played a pivotal role in initiating and backing the project. "Understanding the number of individuals affected by a particular condition forms the foundation of economic assumptions. If a condition was previously believed to have only 15 patients, and we now discover, with the inclusion of data from diagnostic testing companies, that there are actually 100 patients, the market size expands significantly."

"Such findings would undoubtedly yield a substantial economic impact. The valuation of companies involved in addressing the condition would experience an upsurge, while product costing would decrease. The way insurance companies handle medical costs would undergo transformation. Diagnostic companies would intensify their focus on the condition, and epidemiological studies could be conducted with greater precision."

According to Professor Kaafar, the process of linking and counting data records may appear straightforward, but in practice, it entails numerous challenges. One of the primary obstacles is the absence of a centralized database due to the rarity of the disease, resulting in records being scattered across various locations worldwide. "In this instance, they exist in hundreds of databases," he explains. "Moreover, from a business standpoint, data is valuable, and the companies that possess it may not be inclined to share it willingly."

There are also technical hurdles in aligning data that is recorded, encoded, and stored in diverse formats, and in accounting for individuals who may be counted more than once within and across databases. Privacy adds a further layer of complexity. "We are working with extremely sensitive health data," says Professor Kaafar.

A basic estimate of patient numbers, or an epidemiological study, does not in itself require personal data. Until now, however, personal data has been needed to link records and to ensure that each individual is counted only once.

Dr. Vatsalan and her team employed a technique called Bloom filter encoding with differential privacy. Their algorithms add enough deliberate noise to the data that specific details cannot be extracted from any individual record, yet records pertaining to the same disease condition can still be matched and clustered. This balance preserves privacy while still allowing the relevant data to be analyzed and correlated.
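For readers curious about how such an encoding can work in principle, here is a minimal Python sketch of the general idea. It is not the researchers' published algorithm. The function names, the bit-array size, the number of hash functions, the use of character bigrams, and the randomized-response noise step are all assumptions chosen for illustration. Each value is hashed into a Bloom filter, every bit is then flipped with probability 1/(1 + e^epsilon), a standard randomized-response mechanism that gives epsilon-differential privacy for each bit, and the noisy filters are compared with the Dice coefficient.

```python
import hashlib
import math
import random


def bigrams(value):
    """Split a string into its set of overlapping two-character substrings."""
    text = value.lower().strip()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bloom_encode(value, size=256, num_hashes=4):
    """Hash each bigram into num_hashes positions of a bit array (illustrative parameters)."""
    bits = [0] * size
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits[int.from_bytes(digest[:8], "big") % size] = 1
    return bits


def flip_probability(epsilon):
    """Per-bit flip probability for randomized response (an assumed noise mechanism)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def add_noise(bits, epsilon, rng):
    """Flip each bit independently; a smaller epsilon means more noise and more privacy."""
    p = flip_probability(epsilon)
    return [bit ^ 1 if rng.random() < p else bit for bit in bits]


def dice_similarity(a, b):
    """Dice coefficient of two bit arrays: 2 * shared set bits / total set bits."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * common / total if total else 0.0


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demonstration is repeatable
    # Hypothetical records: two spellings of one name, plus an unrelated name.
    first = add_noise(bloom_encode("jane citizen"), epsilon=3.0, rng=rng)
    variant = add_noise(bloom_encode("jane citizan"), epsilon=3.0, rng=rng)
    other = add_noise(bloom_encode("robert nguyen"), epsilon=3.0, rng=rng)
    print("same person, typo:", round(dice_similarity(first, variant), 3))
    print("different person: ", round(dice_similarity(first, other), 3))
```

In a sketch like this, records whose noisy filters score above a chosen similarity threshold would be treated as the same individual and counted once. Choosing that threshold, and the value of epsilon, involves the trade-off between accuracy and privacy described above.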

To assess the technique, the researchers evaluated it on North Carolina voter registration data. The method yielded a minimal error rate while ensuring an exceptionally high level of privacy, even on extensively corrupted datasets, and it outperformed existing methods by a significant margin.

Beyond detecting and counting cases of rare diseases, the research has potential applications in many other domains. For example, it could be used in marketing to gauge awareness of a new product, or in cybersecurity to count the unique views of specific social media posts.

Despite these broad applications, the Macquarie University researchers are most passionate about the technique's use for rare diseases. Professor Kaafar says there is no greater satisfaction for a researcher than seeing the technology they have developed make a genuine impact and improve the world, and that in the context of rare diseases the significance of the research becomes even more tangible.

The OpenTreatments Foundation partly funded the research.

“The Foundation wanted to make this project completely open source from the very beginning,” Dr Vatsalan adds. “So the algorithm we implemented is being published openly.”

The authors will present their research at the 18th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2023) in Melbourne on 12 July.

The paper, Privacy-preserving Record Linkage for Cardinality Counting, is published in the Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security: https://dl.acm.org/doi/10.1145/3579856.3590338

 
