Newswise — New York, NY—June 2, 2011—The Fourth Amendment of the U.S. Constitution protects citizens from unreasonable searches and seizures and, ever since the Supreme Court’s 1967 decision in Katz v. United States, the right to be free of unwanted government scrutiny has been tied to the concept of reasonable expectations of privacy. Researchers at Columbia Engineering and the University of Maryland Francis King Carey School of Law recently published a study in the New York University Journal of Law & Liberty that examines how advances in machine learning technology may change the way courts treat searches, warrants, and privacy issues.

One of the big questions under debate is whether police should be required to get a search warrant before tracking someone’s location. The traditional answer has been “no”: you have no expectation of privacy in public movements. Thirty years ago, the Supreme Court upheld use of a tracking device without a search warrant, but in that case, the device was used for only three days. Would a longer period be different?

To help answer this question, legal academics have developed the “mosaic theory” of the Fourth Amendment, which holds that a large enough collection of data is vastly more revealing than just the individual points. The Court of Appeals for the DC Circuit accepted this theory in United States v. Maynard (2010), holding that long-term surveillance (four weeks, in this case) was a search under the Fourth Amendment, that it exposed an “intimate picture of the subject’s life that he expects no one to have” (“short perhaps of his spouse”), and that it therefore should have required a warrant. One objection to this theory is very practical: how can police draw the line? Why is three days acceptable while four weeks is not?

“Computer science can provide an answer. The basic idea is very simple: when machine learning techniques—the same sorts of tools that let companies like Amazon and Netflix make accurate recommendations, based on your past history—can make accurate enough predictions, you have a mosaic,” says Steven M. Bellovin, Computer Science (CS) Professor at Columbia Engineering, who co-authored the paper with CS Associate Professor Tony Jebara and PhD candidate Sebastian Zimmeck, both at Columbia Engineering; and Renée M. Hutchins, professor of law at the University of Maryland Carey School of Law.

Their study focuses on how technology—in particular, understanding the data compilation and analysis revealed by machine learning—can provide important Fourth Amendment insights, especially when it comes to long-term surveillance and whether a search warrant is necessary.

“One of the things we discovered,” says Bellovin, who was also chief technology officer of the Federal Trade Commission from 2012 to 2013, “is that the threshold is at most a week, probably less. You can now get predictions of startling accuracy with remarkably few data points. The scientific literature shows that the intuitive answer is correct. Most people have reasonably consistent movements during the workweek. As it turns out, weekend movements are pretty regular, too. An observer can get a remarkably full picture of someone’s life in just seven days.”

Bellovin met Hutchins, an expert on the Fourth Amendment, when he gave a talk at Maryland Carey Law three years ago on online anonymity and privacy. They discussed location tracking and its ramifications in the legal system, and decided to work together to examine these issues in a collaboration that brings together expertise in computer science, policy, and the law. Bellovin enlisted his colleague, Jebara, who is an expert in machine learning and in making predictions based on location data, and his PhD student Zimmeck, who is also a lawyer.

The team took a close look at the “mosaic theory,” which posits that the whole is more than the sum of its parts, that a series of searches can yield a fuller picture when taken as a whole. “This has been a controversial subject,” says Hutchins, “especially in this new age of big data, and the courts don’t agree about what constitutes a search that requires a warrant. There still are a lot of open questions in the courts about all this. We wanted to combine the lessons of machine learning with the ‘mosaic theory’ and apply the pairing to the Fourth Amendment, and see what we came up with.”

Machine learning is the branch of computer science that studies systems that can draw inferences from collections of data, usually through mathematical algorithms. “We found that machine learning makes it clear that mosaics can be created, and that the duration of investigations is relevant to their substantive Fourth Amendment treatment because duration has a large impact on the accuracy of the predictions,” notes Jebara, who chairs the Institute for Data Sciences and Engineering’s Foundations of Data Sciences Center.
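
To make the link between duration and accuracy concrete, here is a minimal Python sketch. It is not the researchers' method: the routine in `toy_location`, the place names, and the simple "most frequent place per weekday and hour" predictor are all assumptions chosen for illustration, and the toy data is perfectly regular by construction, whereas real location traces are noisy.

```python
from collections import Counter, defaultdict

HOURS_PER_DAY = 24


def toy_location(day, hour):
    """Hypothetical, perfectly regular routine (an assumption, not real data)."""
    weekday = day % 7  # 0 = Monday ... 6 = Sunday
    if weekday < 5:
        if 8 <= hour <= 17:
            return "work"
        if hour == 18:
            return "gym"
        return "home"
    if weekday == 5 and 10 <= hour <= 12:
        return "park"
    if weekday == 6 and 10 <= hour <= 11:
        return "market"
    return "home"


def train(num_days):
    """Count observed places per (weekday, hour) slot and per hour of day."""
    by_slot = defaultdict(Counter)
    by_hour = defaultdict(Counter)
    for day in range(num_days):
        for hour in range(HOURS_PER_DAY):
            place = toy_location(day, hour)
            by_slot[(day % 7, hour)][place] += 1
            by_hour[hour][place] += 1
    return by_slot, by_hour


def predict(model, weekday, hour):
    """Most frequent place for the slot; fall back to the hour of day if unseen."""
    by_slot, by_hour = model
    counts = by_slot.get((weekday, hour)) or by_hour.get(hour)
    return counts.most_common(1)[0][0] if counts else None


def accuracy_after(num_days, test_start=28, test_days=7):
    """Train on the first num_days days, then score one later full week."""
    model = train(num_days)
    hits = total = 0
    for day in range(test_start, test_start + test_days):
        for hour in range(HOURS_PER_DAY):
            hits += predict(model, day % 7, hour) == toy_location(day, hour)
            total += 1
    return hits / total


if __name__ == "__main__":
    for days in (1, 3, 5, 6, 7):
        print(days, "days observed ->", round(accuracy_after(days), 3))
```

On this toy routine the predictor never gets worse as more days are observed, and it becomes exact once a full week has been seen. That outcome is built into the synthetic data and says nothing about real-world thresholds.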

“We now have a better understanding of the value of aggregated data when viewed through a machine learning lens,” adds Bellovin, who chairs the Institute’s Cybersecurity Center. “While reasonable minds may dispute the most suitable minimum accuracy threshold, it’s clear that the collection of data points allowing predictions that exceed selected thresholds should be generally deemed unreasonable searches in the absence of a warrant.”
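
The threshold idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `min_days_exceeding` is hypothetical, and the accuracy figures in `example_curve` are invented placeholders, not results from the study.

```python
def min_days_exceeding(accuracy_by_days, threshold):
    """Return the smallest observation period (in days) whose prediction
    accuracy exceeds the threshold, or None if no period does."""
    for days in sorted(accuracy_by_days):
        if accuracy_by_days[days] > threshold:
            return days
    return None


# Invented accuracy values, for illustration only (NOT from the study).
example_curve = {1: 0.55, 2: 0.68, 3: 0.74, 5: 0.83, 7: 0.91, 14: 0.94}

if __name__ == "__main__":
    for threshold in (0.80, 0.90, 0.99):
        print(threshold, "->", min_days_exceeding(example_curve, threshold))
```

Under this framing, the open legal question is which accuracy threshold to select. Once one is chosen, the corresponding duration follows mechanically from the measured accuracy curve.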

The researchers also note that any new rules should take into account not only the data being collected but also foreseeable improvements in machine learning technology that will ultimately be brought to bear on it. This includes using future algorithms on older data.

“Certainly in principle, machine learning can tell us if a mosaic exists,” Bellovin observes. “It can also help us draw lines beyond which a mosaic definitely exists, for instance, measuring the degree of intrusiveness, or loss of privacy, for any given set of location points.”

He and Zimmeck plan to conduct more research on developing a privacy metric, one that they say is “mathematically sound, technically useful, and legally relevant.”

“The development of a legal doctrine for location tracking is in its infancy,” adds Hutchins. “It’s essential that the legal and computer science communities work together, and that the law on location tracking continues to keep step with the current state of scientific discovery.”

###