Computer Program "Sees" beyond 3-D to Save Lives

ALBUQUERQUE, N.M. -- Data classification, often considered a humdrum task, really is no such thing when the stakes are high enough.

Quick, error-free identification is needed for chemicals released on a battlefield, explosives found in an airport, or substances collected on interplanetary explorations, where astronomers have no chance of a second look. Physicians analyzing complicated medical images, and researchers performing certain environmental analyses, also require quick, accurate answers.

That's why a sophisticated new data classification scheme is being incorporated into the design of Sandia National Laboratories' hand-held "lab-on-a-chip" chemical sensor system.

The classification method, based on human perception rather than mathematical equations, is so simple that it can be hard to grasp for those who expect complexity.

It is based upon the human ability to visually group real-world objects seen near each other, says the technique's principal developer Gordon Osbourn, a physicist at the Department of Energy's Sandia Labs.
"In the area of visual perception, no computer has ever matched a biological system, for example a dog's or a two-year-old's," says Osbourn.

The dumbbell factor

A person groups by subconsciously superimposing over any two points an invisible shape that resembles a dumbbell, Osbourn says. The subconscious mind sizes the dumbbell so that each bell centers on a point. If no other point intrudes in that space, one considers the two points a group. It's that simple.
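The pair test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the release does not give the measured dumbbell shape, so the stand-in here is two balls centered on the points plus a thin connecting bar, and the names in_dumbbell, is_grouped, ball_ratio, and bar_ratio are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def in_dumbbell(p, q, x, ball_ratio=0.5, bar_ratio=0.2):
    """True if x falls inside a stand-in dumbbell sized to the pair (p, q).

    ASSUMPTION: the real VERI template was measured empirically and is not
    given in the text. Here each "bell" is a ball of radius ball_ratio * d
    centered on p or q, and the "bar" is the set of points within
    bar_ratio * d of the segment between them, where d = |p - q|.
    """
    d = dist(p, q)
    if d == 0:
        return False
    if dist(x, p) < ball_ratio * d or dist(x, q) < ball_ratio * d:
        return True
    # Project x onto the line through p and q; t in [0, 1] means the
    # foot of the perpendicular lies on the segment itself.
    t = sum((xi - pi) * (qi - pi) for xi, pi, qi in zip(x, p, q)) / (d * d)
    if 0.0 <= t <= 1.0:
        foot = [pi + t * (qi - pi) for pi, qi in zip(p, q)]
        return dist(x, foot) < bar_ratio * d
    return False

def is_grouped(p, q, others, **shape):
    """Two points form a group only if no other point intrudes."""
    return not any(in_dumbbell(p, q, x, **shape) for x in others)

print(is_grouped((0, 0), (2, 0), [(5, 5), (1, 3)]))    # True
print(is_grouped((0, 0), (2, 0), [(5, 5), (1, 0.1)]))  # False
```

Because the shape is scaled by the distance between the two points, the same test applies unchanged whether the pair is close together or far apart.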

But while biological visual systems are limited to analyzing two-dimensional plots or 3-D patterns, Osbourn's system offers the opportunity to "see" in many "dimensions," in effect cross-analyzing patterns among many data sets. While this sounds complicated, all that is happening is that the same empirical judgments made by human eyes are made by a computer program to judge closeness among the points of many groups of data. The relations between data may be too complex for a human to see, but the same empirical process used in human decision-making is followed in the computer program.

"We discovered a way to capture in a software model the way human judgments empirically group patterns in two-dimensional plots or 3-D patterns, so that these judgments can be mathematically applied to high-dimensional data," says Osbourn.

Because the technique is based on observation of how people empirically group objects they see, it is called VERI, for Visual-Empirical Region of Influence. A patent is expected to be issued in 1999.

Connecting the dots

To demonstrate a relationship between data points, the "dumbbell" program draws lines that, in effect, connect the dots, sometimes in surprising ways.

Says Daniel Carr, a George Mason University professor of applied and engineering statistics who has an interest in visualizing high-dimensional data, "I was using [a different algorithm] to do a grand tour [of gene-expression clustering]. Then I added the connecting lines produced by the VERI algorithm, and the thing seemed to come alive. I saw lots of patterns. I showed it at a conference. There was no question that you suddenly saw structure that wasn't previously obvious. This is a very effective algorithm."

One way to visualize the system is to imagine yourself in a spaceship observing a number of oddly shaped solar systems, which may be very close to, intertwine with, or even wrap around each other. Each solar system represents a chemical; each planet in a system represents a data point from that chemical. The incoming chemical to be analyzed enters as a newly observed planet. If this planet resides in any of the solar systems, then it is identified as an example of that system. However, if the newly observed planet resides outside of all the known solar systems, then it is labeled an unknown chemical. The method does more than simply measure which solar system is closest to a newly observed planet: it automatically "sees" if the planet is close enough to any solar system to belong in it.
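The solar-system picture can be illustrated with a small Python sketch. It is a simplified stand-in, not Sandia's implementation: the region shape and its two ratios are assumed values, the names in_region and classify are hypothetical, and returning "ambiguous" when a point links to more than one class is a choice made here that the text does not describe.

```python
import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def in_region(p, q, x, ball_ratio=0.5, bar_ratio=0.2):
    """Stand-in region of influence for the pair (p, q): two balls plus a bar.
    ASSUMPTION: the true VERI template is empirical and not given here."""
    d = dist(p, q)
    if d == 0:
        return False
    if dist(x, p) < ball_ratio * d or dist(x, q) < ball_ratio * d:
        return True
    t = sum((xi - pi) * (qi - pi) for xi, pi, qi in zip(x, p, q)) / (d * d)
    if 0.0 <= t <= 1.0:
        foot = [pi + t * (qi - pi) for pi, qi in zip(p, q)]
        return dist(x, foot) < bar_ratio * d
    return False

def classify(new_point, library):
    """library maps a chemical label to its list of known data points.

    The new point joins a class only if it links to at least one of that
    class's points with no other library point intruding. This needs
    several points per class; a lone library point would always link.
    """
    all_points = [pt for pts in library.values() for pt in pts]
    matches = []
    for label, pts in library.items():
        for pt in pts:
            others = (o for o in all_points if o is not pt)
            if not any(in_region(new_point, pt, o) for o in others):
                matches.append(label)
                break
    if not matches:
        return "unknown"
    return matches[0] if len(matches) == 1 else "ambiguous"

library = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)],
}
print(classify((0.5, 0.5), library))    # A
print(classify((5.0, 5.0), library))    # unknown
print(classify((50.0, 50.0), library))  # unknown
```

Note that the point (5.0, 5.0) is not forced into whichever class happens to be nearest; it links to neither, so it is reported as unknown.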

A paper, "VERI Pattern Recognition Applied to Chemical Microsensor Array Selection and Chemical Analysis," was published in the American Chemical Society's Accounts of Chemical Research, Vol. 31, No. 5, 1998, with Sandia researchers John Bartholomew, Tony Ricco, and Greg Frye as co-authors with Osbourn. Osbourn has also recently presented the VERI concept at an invited talk at a Gordon Conference and at Optical Society of America and American Chemical Society meetings.

Osbourn himself has credentials in complex matters. He is credited with creation of strained-layer superlattices, a novel material used in semiconductor lasers. He won the American Physical Society's 1993 International Prize for new materials and the Department of Energy's prestigious E.O. Lawrence Award in 1985.

Deriving the VERI Dumbbell template

While the fashion of the last half-century has been to consider human perceptions untrustworthy (in a famous movie, people who see the same murder each describe it differently), Osbourn says society actually is based on the commonality of human perception. Everyone with normal vision recognizes large nearby buildings as such, and is expected to similarly perceive "STOP" signs or traffic lights and react accordingly.

Working from that commonsense basis, Osbourn and colleagues tested 12 subjects (Sandia employees who ranged from 20 to 50 years old) over a five-year period to determine how they grouped points scattered on a graph. Without exception, the responses of the subjects to complex dot patterns could be predicted from their responses to simple, three-dot patterns. The subjects reacted as if putting an invisible shape resembling a dumbbell around each pair of points. Each pair grouped together only if all other points (any potential third point) were outside the invisible dumbbell, a shape that operated as a region of influence. The researchers found that groupings among many dots are built up one pair at a time.

Because each subset of three points in a complex high-dimensional data set can also be tested in the same way with the dumbbell, the VERI method extends and automates human cluster perception for use in complex data analysis problems. "This approach provides a new way to think about automated pattern recognition," says Osbourn. "The VERI clusterings provide a direct and automated decision for whether multidimensional patterns match or are distinct."
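A rough Python sketch shows how groupings might be built up one pair at a time in any number of dimensions. It links every pair whose region of influence is free of intruders and then reads off the connected groups. The region shape, its ratios, and the names in_region and veri_like_groups are assumptions made for illustration, not the published VERI template.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def in_region(p, q, x, ball_ratio=0.5, bar_ratio=0.2):
    """Stand-in dumbbell (two balls plus a bar), valid in any dimension.
    ASSUMPTION: shape and ratios are illustrative, not the VERI template."""
    d = dist(p, q)
    if d == 0:
        return False
    if dist(x, p) < ball_ratio * d or dist(x, q) < ball_ratio * d:
        return True
    t = sum((xi - pi) * (qi - pi) for xi, pi, qi in zip(x, p, q)) / (d * d)
    if 0.0 <= t <= 1.0:
        foot = [pi + t * (qi - pi) for pi, qi in zip(p, q)]
        return dist(x, foot) < bar_ratio * d
    return False

def veri_like_groups(points):
    """Return groups of point indices, merging one linked pair at a time."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        intruded = any(
            in_region(points[i], points[j], points[k])
            for k in range(len(points)) if k not in (i, j)
        )
        if not intruded:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two small clusters in four dimensions.
data = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0),
    (10, 10, 10, 10), (11, 10, 10, 10), (10, 11, 10, 10),
]
print(veri_like_groups(data))  # [[0, 1, 2], [3, 4, 5]]
```

This brute-force version checks every third point for every pair, so its cost grows roughly with the cube of the number of points; it is meant to show the idea, not to scale.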

The VERI method is a powerful new alternative to conventional mathematical approaches, especially for high-consequence applications. One advantage of VERI is the ability to "see" patterns in data with arbitrarily complex distribution shapes. "Conventional methods often require that real-world data look like widely-separated, compact Gaussian distributions to work properly, yet modern chemical sensor arrays can produce data for different chemicals that resemble intertwined snakes or tangles of spaghetti," says Osbourn.

"This distinction is important for avoiding false alarms in unexpected or uncontrolled field conditions, for example alarms triggered by diesel fumes or fertilizer in sensor systems that are intended to alert soldiers to chemical warfare attacks," he says. "VERI provides a complete treatment of the complexity of real-world data distributions that may be essential when human lives are at stake. Another advantage of VERI is that it can discover the simplest and most effective set of sensors to use in the design of hand-held sensor systems, the so-called electronic noses."

In a benchmark study published in the Journal of Pattern Recognition in 1995 with Sandia researcher Rubel Martinez, Osbourn says, "We culled 25 patterns in computer science literature that caused problems for clustering algorithms. Ours outperformed all commercial clustering algorithms. VERI is one of the best recognizers of cluster patterns because it is based on our discovery of how to quantitatively mimic biological clustering-based pattern recognition performance."

Works with noisy signals

The computer program works well with imperfect sensor signals coming from what may be an electronically "noisy" field to pinpoint a gas's identity. Sometimes the gas's characteristics may not be located in the program's library of known substances (that is, the gas is unknown), yet the classification system is savvy enough not to create a false alarm by assigning it a category that happens to be the closest match.

VERI also minimizes power needs, as well as the size and weight, of a hand-held unit by informing its users of the smallest number of sensors necessary for a particular job. It also works well with imperfect signals (perhaps degraded because the sensor aged) coming from the field.
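One way to picture the search for the smallest sensor set is a brute-force sketch in Python: try sensor subsets from smallest to largest and keep the first one in which no two readings from different chemicals link together. The release does not describe how VERI actually performs this selection, so the search strategy, the separation criterion, the region shape, and the names classes_separate and smallest_sensor_subset are all assumptions.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def in_region(p, q, x, ball_ratio=0.5, bar_ratio=0.2):
    """Stand-in region of influence (two balls plus a bar); ASSUMED shape."""
    d = dist(p, q)
    if d == 0:
        return False
    if dist(x, p) < ball_ratio * d or dist(x, q) < ball_ratio * d:
        return True
    t = sum((xi - pi) * (qi - pi) for xi, pi, qi in zip(x, p, q)) / (d * d)
    if 0.0 <= t <= 1.0:
        foot = [pi + t * (qi - pi) for pi, qi in zip(p, q)]
        return dist(x, foot) < bar_ratio * d
    return False

def classes_separate(labelled):
    """labelled is a list of (label, point). True if no pair of points from
    different classes is linked. Identical points in different classes
    count as linked, since nothing can intrude on a zero-length pair."""
    pts = [p for _, p in labelled]
    for (i, (lab_a, a)), (j, (lab_b, b)) in combinations(enumerate(labelled), 2):
        if lab_a == lab_b:
            continue
        intruded = any(
            in_region(a, b, pts[k]) for k in range(len(pts)) if k not in (i, j)
        )
        if not intruded:
            return False
    return True

def smallest_sensor_subset(labelled, n_sensors):
    """Return the first (smallest) tuple of sensor indices that keeps the
    classes separate, or None if even the full array does not."""
    for size in range(1, n_sensors + 1):
        for subset in combinations(range(n_sensors), size):
            projected = [(lab, tuple(p[s] for s in subset)) for lab, p in labelled]
            if classes_separate(projected):
                return subset
    return None

# Hypothetical readings from a three-sensor array for two chemicals.
readings = [
    ("A", (1.0, 0.0, 5.0)), ("A", (1.1, 0.2, 1.0)), ("A", (0.9, 0.1, 3.0)),
    ("B", (1.0, 5.0, 2.0)), ("B", (1.1, 5.2, 4.0)), ("B", (0.9, 5.1, 0.0)),
]
print(smallest_sensor_subset(readings, 3))  # (1,)
```

In this made-up data set only the second sensor distinguishes the two chemicals, so a one-sensor unit would suffice. An exhaustive search like this grows exponentially with the number of sensors and is practical only for small arrays.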

"The system deals well with false alarms," says Sandia sensor researcher Tony Ricco. "If something is unknown and has never been calibrated, it will tell you it's unknown. You don't want false alarms when you're screening for explosives. When [the classification system] sees a new chemical species it's never seen before, it just says it's unknown. And it's likely that in planetary explorations, you'll come across combinations of chemicals you've never seen before. False alarms when soldiers' lives are at stake on a battlefield are unacceptable."

Sandia is a multiprogram DOE laboratory, operated by a subsidiary of Lockheed Martin Corp. With main facilities in Albuquerque, N.M., and Livermore, Calif., Sandia has major research and development responsibilities in national security, energy, and environmental technologies.

#

Visuals and story available at http://www.sandia.gov/media/VERI.htm
More details of the VERI method can be found at the web site http://www.sandia.gov/1100/1155Web/1155home.htm.
Media contact: Neal Singer, 505-845-7078, [email protected]
Technical contact: Gordon Osbourn, 505-844-8850, [email protected]
