DALLAS – Feb. 29, 2024 – UT Southwestern Medical Center researchers have developed an artificial intelligence (AI) method that writes its own algorithms and may one day operate as an “automated scientist” to extract the meaning behind complex datasets.

“Researchers are increasingly employing AI and machine learning models in their work, but with the huge caveat that these high-performing models provide limited new direct insights into the data,” said Milo Lin, Ph.D., Assistant Professor in the Lyda Hill Department of Bioinformatics, Biophysics, and the Center for Alzheimer’s and Neurodegenerative Diseases at UT Southwestern. “Our work is the first step in allowing researchers to use AI to directly convert complex data into new human-understandable insights.”

Dr. Lin co-led the study, published in Nature Computational Science, with first author Paul J. Blazek, M.D., Ph.D., who conducted this research as part of his thesis work at UTSW.

In the past several years, the field of AI has seen enormous growth, with significant crossover from basic and applied scientific discovery to popular use. One commonly used branch of AI, known as neural networks, emulates the structure of the human brain by mimicking the way biological neurons signal one another. Neural networks are a form of machine learning, which produces outputs from input data after learning from a “training” dataset.

Although this tool has found significant use in applications such as image and speech recognition, conventional neural networks have significant drawbacks, Dr. Lin said. Most notably, they often don’t generalize far beyond the data they train on, and the rationale for their output is a “black box,” meaning there’s no way for researchers to understand how a neural network algorithm reached its conclusion.

Seeking to address both issues, the UTSW researchers developed a method they call deep distilling. Using limited training data – the datasets used to teach machine learning models – deep distilling automatically discovers algorithms, or the “rules” that explain the observed input-output patterns in the data. This is done by training an essence neural network (ENN), previously developed in the Lin Lab, on input-output data. The parameters of the ENN that encode the learned algorithm are then translated into succinct computer code that users can read.
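To illustrate only the general idea behind that last step – the essence neural network itself is far more capable – the minimal sketch below fits a single trainable threshold unit to a toy set of input-output pairs and then translates its learned parameters into a short, human-readable rule. The perceptron training loop and every name in it are illustrative assumptions, not code from the study.

```python
# Illustrative sketch only: NOT the essence neural network (ENN) from the paper.
# It shows the general "distillation" idea -- fit a tiny model to input-output pairs,
# then translate its learned parameters into a human-readable rule.

from itertools import product

# Toy input-output data: 3-bit majority vote (output 1 if at least two bits are 1).
data = [(bits, int(sum(bits) >= 2)) for bits in product([0, 1], repeat=3)]

# Train a single threshold unit with the classic perceptron rule.
w, b = [0.0, 0.0, 0.0], 0.0
for _ in range(25):                      # a few passes suffice for this separable toy task
    for x, y in data:
        pred = int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
        err = y - pred
        w = [wi + err * xi for wi, xi in zip(w, x)]
        b += err

# "Distill" the learned parameters into a succinct, readable rule.
terms = " + ".join(f"{wi:g}*x{i}" for i, wi in enumerate(w))
print(f"learned rule: output = 1 if {terms} > {-b:g}")

# Check that the distilled rule reproduces every observed input-output pair.
assert all(int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == y for x, y in data)
```

For this toy majority-vote data, the printed rule reduces to “output is 1 if at least two of the three inputs are 1” – the kind of compact, readable summary of a trained model’s parameters that deep distilling aims to produce, though its algorithms and networks are far richer than a single threshold unit.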

The researchers tested deep distilling in a variety of scenarios in which traditional neural networks cannot produce human-comprehensible rules and generalize poorly to very different data. These included cellular automata, in which grids contain hypothetical cells in distinct states that evolve over time according to a set of rules – often used as model systems for emergent behavior in the physical, life, and computer sciences. Although the cellular automata the researchers used had 256 possible rule sets, deep distilling “learned” to accurately predict the hypothetical cells’ behavior for every rule set after seeing example grids from only 16 of them, summarizing all 256 rule sets in a single algorithm.
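For context, the sketch below assumes the standard elementary cellular automaton – a one-dimensional row of two-state cells in which each cell’s next state depends on itself and its two neighbors – which is the usual setting with exactly 2^8 = 256 possible rule sets. It simply evolves one such grid for a few steps and is not code from the study.

```python
# Illustrative sketch of an elementary cellular automaton (assumed setting, not study code).
# Each rule number 0-255 encodes, bit by bit, the next state for each of the
# eight possible (left, center, right) neighborhoods.

def step(cells, rule_number):
    """Apply one update of the given elementary rule to a row of 0/1 cells (periodic edges)."""
    n = len(cells)
    return [
        (rule_number >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Evolve a single "on" cell under rule 30 for a few steps.
row = [0] * 15
row[7] = 1
for _ in range(5):
    print("".join("#" if c else "." for c in row))
    row = step(row, 30)
```

Recovering update rules like these from example grids – and expressing them in a form a person can read – is the kind of task on which deep distilling was evaluated.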

In another test, the researchers trained deep distilling to accurately classify a shape’s orientation as vertical or horizontal. Although only a few training images of perfectly horizontal or vertical lines were required, this method was able to apply the succinct algorithm it discovered to accurately solve much more ambiguous cases, such as patterns with multiple lines or gradients and shapes made of boxes as well as zigzag, diagonal, or dotted lines.
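As a hypothetical illustration of what a succinct, human-readable rule for this task might look like – not the algorithm reported in the paper – the sketch below classifies a binary image as vertical when its lit pixels span more rows than columns, and horizontal otherwise.

```python
# Hypothetical example of a succinct, human-readable orientation rule.
# This is NOT the algorithm deep distilling discovered in the study.

def orientation(image):
    """Classify a 2D grid of 0/1 pixels as 'vertical' or 'horizontal'."""
    lit = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    rows = {r for r, _ in lit}   # distinct rows containing lit pixels
    cols = {c for _, c in lit}   # distinct columns containing lit pixels
    return "vertical" if len(rows) > len(cols) else "horizontal"

vertical_line = [[0, 1, 0]] * 4                  # one column lit across four rows
horizontal_line = [[1, 1, 1, 1], [0, 0, 0, 0]]   # one row lit across four columns
print(orientation(vertical_line), orientation(horizontal_line))
```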

Eventually, Dr. Lin said, deep distilling could be applied to the vast datasets generated by high-throughput scientific studies, such as those used for drug discovery, and act as an “automated scientist” – capturing patterns in results not easily discernible to the human brain, such as how DNA sequences encode functional rules of biomolecular interactions. Deep distilling also could potentially serve as a decision-making aid to doctors, offering insights on its “thought process” through the generated algorithms, he added.

This study was supported by UTSW’s High Impact Grant Program, which was initiated in 2001 and supports high-risk research offering high potential impact in basic science or medicine.

About UT Southwestern Medical Center  

UT Southwestern, one of the nation’s premier academic medical centers, integrates pioneering biomedical research with exceptional clinical care and education. The institution’s faculty members have received six Nobel Prizes and include 25 members of the National Academy of Sciences, 21 members of the National Academy of Medicine, and 13 Howard Hughes Medical Institute Investigators. The full-time faculty of more than 3,100 is responsible for groundbreaking medical advances and is committed to translating science-driven research quickly to new clinical treatments. UT Southwestern physicians provide care in more than 80 specialties to more than 120,000 hospitalized patients and more than 360,000 emergency room cases, and oversee nearly 5 million outpatient visits a year.

Journal Link: Nature Computational Science