Newswise — The use of mobile technologies to collect and analyze people's location information has produced vast quantities of consumer location data. This has given rise to a complex, multibillion-dollar ecosystem in which consumers can trade their personal data for financial benefits. Yet this exchange carries persistent privacy risks.

In a new study, researchers used machine learning to develop and evaluate a framework that quantifies each consumer's privacy risk, applies personalized data obfuscation, and adapts to different types of risk and utility and to different acceptable levels of the risk-utility tradeoff. The framework outperformed previous models, substantially reducing privacy risks for consumers while preserving the data's utility for advertisers.

The study was conducted by researchers at Carnegie Mellon University (CMU), the University of Virginia, and New York University. Its findings are published in the journal Information Systems Research.

"The global market for location analytics alone is projected to reach $25.5 billion by 2027," notes Beibei Li, an associate professor of IT and management at CMU's Heinz College, who coauthored the study. As industries continue to tap the potential of location big data, Li says, the study offers an important framework for balancing privacy risks against data utility, with the goal of sustaining a secure, self-regulating, multibillion-dollar location data ecosystem.

Every day, enormous quantities of mobile location data are generated through location-based services on smartphones, such as navigation, ride-sharing, and food delivery apps. These data trace consumers' movements and behavior, including where they eat and shop and what they like to buy. The data are collected to support commercially valuable applications such as restaurant recommendations, location-based advertising, and market research. Advertisers who obtain the data from data aggregators can predict a consumer's next location with a 25% success rate, and the consumer's next activity and its timing with a 26% success rate.

However, sharing location data poses significant risks to consumers, because it often reveals personally identifiable information such as names and home addresses. Some advertisers may misuse this sensitive information for malicious purposes, typically in pursuit of short-term financial gain. Data aggregators therefore need a personalized, adaptable framework that balances different types of risk and utility to meet the specific needs of different consumers and advertisers. Such a framework would support more secure and responsible handling of location data, protecting consumer privacy while still allowing the data to be used beneficially.

In the study, the researchers used machine learning to build a framework that quantifies the privacy risk for each individual consumer and the utility of the data for advertisers. A key feature of the framework is its personalized, adaptable obfuscation scheme. The scheme suppresses a subset of the locations a consumer has visited, based on a personalized suppression parameter set in proportion to that consumer's risk level. It also accommodates different types of risk and utility and different acceptable levels of each, giving it the flexibility to handle varying preferences and requirements.
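To make risk-proportional suppression concrete, here is a minimal Python sketch. It is not the authors' implementation: the linear mapping from a risk score in [0, 1] to a suppression rate, the parameters scale and max_rate, the functions suppression_rate and obfuscate, and the rule for choosing which visits to hide (uniformly at random) are all hypothetical stand-ins, since the study's actual risk measures and selection rule are not described here.

```python
import random
from typing import List, Tuple

# A visit is (location_id, timestamp); this representation is illustrative.
Visit = Tuple[str, int]


def suppression_rate(risk: float, scale: float = 1.0, max_rate: float = 0.9) -> float:
    """Hypothetical mapping: the share of visits to hide grows linearly
    with the consumer's risk score, capped at max_rate."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    return min(max_rate, scale * risk)


def obfuscate(trajectory: List[Visit], risk: float, seed: int = 0) -> List[Visit]:
    """Hide a risk-proportional subset of a consumer's visits.

    Which visits are hidden is chosen at random here (a simplifying
    assumption); the remaining visits keep their original order."""
    k = round(suppression_rate(risk) * len(trajectory))  # number of visits to hide
    rng = random.Random(seed)  # seeded so the output is reproducible
    hidden = set(rng.sample(range(len(trajectory)), k))
    return [visit for i, visit in enumerate(trajectory) if i not in hidden]


if __name__ == "__main__":
    trip = [(f"loc{i}", i) for i in range(10)]
    print(len(obfuscate(trip, risk=0.3)))  # a higher-risk consumer would lose more visits
```

Random selection keeps the sketch short; a scheme of this kind could instead target the visits that contribute most to a consumer's estimated risk, at the cost of needing a per-visit risk model.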

To test the framework, the researchers partnered with a leading data aggregator that compiles location data from more than 400 widely used mobile apps, including news, weather, maps, and fitness apps. The data cover roughly one-fourth of the U.S. population and were collected in compliance with privacy regulations. They span five weeks in September and October 2018 and form a representative sample of the U.S. population. The researchers focused their analysis on a major U.S. metropolitan area. To evaluate the framework, they analyzed one million trajectories (records of where and when consumers moved) from 40,000 individuals in that area.

According to the authors, the framework accounts for specific characteristics of individual-level location data and outperforms several benchmark methods used in recent studies.

With the proposed framework, the authors say, a data aggregator can reduce the risk of consumer privacy breaches through personalized data obfuscation while preserving the utility of the obfuscated data for advertisers. The aggregator can also meet the personalized and varied requirements of both consumers and advertisers by accommodating different types of risk and utility. The framework supports a broad range of acceptable levels for specific risks, utilities, and the tradeoff between them, so it can balance privacy and utility in a way tailored to diverse demands.
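The tradeoff described above can be illustrated with a small Python sketch, assuming each candidate obfuscation setting already comes with estimated risk and utility scores. The Candidate type, the function choose_level, max_risk (the acceptable risk level), and risk_weight (how heavily risk is penalized against utility) are hypothetical names introduced here, and the numbers are made up; the study's own formulation of the tradeoff may differ.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One hypothetical obfuscation setting and its estimated outcomes."""
    level: float    # e.g. a suppression rate
    risk: float     # estimated privacy risk after obfuscation (lower is better)
    utility: float  # estimated advertiser utility after obfuscation (higher is better)


def choose_level(candidates: List[Candidate], max_risk: float,
                 risk_weight: float) -> Optional[Candidate]:
    """Among settings whose risk stays within the acceptable level, return
    the one with the highest utility minus weighted risk, or None."""
    feasible = [c for c in candidates if c.risk <= max_risk]
    if not feasible:
        return None  # no setting meets the risk tolerance
    return max(feasible, key=lambda c: c.utility - risk_weight * c.risk)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the study.
    options = [
        Candidate(level=0.0, risk=0.8, utility=1.0),
        Candidate(level=0.3, risk=0.5, utility=0.85),
        Candidate(level=0.6, risk=0.2, utility=0.6),
    ]
    print(choose_level(options, max_risk=0.6, risk_weight=0.0))
    print(choose_level(options, max_risk=0.6, risk_weight=2.0))
```

With these toy numbers, a risk cap of 0.6 and no extra weight on risk selects level 0.3, while raising risk_weight to 2.0 shifts the choice to the more protective level 0.6. Separating a hard risk cap from a soft weight mirrors the idea that consumers and advertisers may set both an acceptable level of risk and a preferred balance between risk and utility.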

"Location-based marketing is rapidly emerging as a key platform for designing marketing campaigns and reaching out to consumers, enhancing both traditional and digital marketing strategies," says Meghanath Macha, who led the study and is a graduate of CMU's Heinz College. Macha adds that the framework fills an important gap in the field and provides a valuable tool for privacy-conscious practice in big-data location-based applications and services, enabling more responsible and effective use of location information in marketing.

The authors acknowledge several limitations. First, their data lacked consumers' demographic information, which prevented a fuller analysis of how demographics relate to privacy concerns. Second, the framework focused on one-time data sharing between consumers and advertisers and did not account for more complex scenarios involving multiple risks or utilities. Nor did it address what happens when an advertiser combines multiple batches or sources of shared data. These limitations point to directions for future research to extend the framework to more complex real-world settings.

Journal Link: Information Systems Research