Newswise —

In 1918, the American chemist Irving Langmuir published a paper examining how gas molecules stick to a solid surface. Guided by careful experiments and by his theory that solids offer discrete sites for gas molecules to occupy, he worked out a set of equations that describe how much gas will adhere at a given pressure.
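For reference, the relationship described above is usually written today in the following textbook form. The symbols are standard notation and are not taken from this article: θ is the fraction of surface sites occupied, p is the gas pressure, and K is an equilibrium constant for adsorption.

```latex
\theta = \frac{K\,p}{1 + K\,p}
```

At low pressure the coverage grows roughly in proportion to p, and at high pressure it levels off as the available sites fill up.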

Now, more than a century after Langmuir's paper, a team of researchers from IBM Research, Samsung AI, and the University of Maryland, Baltimore County (UMBC) has developed an "AI scientist" that has reproduced a key part of Langmuir's Nobel Prize-winning work. The system has also rediscovered Kepler's third law of planetary motion, which gives the time one celestial body takes to orbit another from the distance between them, and it has produced a good approximation of Einstein's relativistic time-dilation law, which describes how time slows down for fast-moving objects.
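To make the two physical laws concrete, here is a minimal Python sketch of their standard textbook forms. It is an illustration only, not the output of the AI system; the function names and the rounded values for the Sun, the Earth, and the astronomical unit are assumptions introduced for this example.

```python
import math

G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0    # speed of light in vacuum, m/s


def orbital_period(distance_m, mass1_kg, mass2_kg):
    """Kepler's third law (Newtonian two-body form): period in seconds."""
    return 2 * math.pi * math.sqrt(distance_m ** 3 / (G * (mass1_kg + mass2_kg)))


def dilated_time(proper_time_s, speed_m_s):
    """Time measured by a stationary observer while a moving clock ticks proper_time_s."""
    gamma = 1.0 / math.sqrt(1.0 - (speed_m_s / C) ** 2)
    return gamma * proper_time_s


# Rounded, illustrative values: Sun mass, Earth mass, mean Earth-Sun distance.
SUN_KG, EARTH_KG, AU_M = 1.989e30, 5.972e24, 1.496e11

earth_year_days = orbital_period(AU_M, SUN_KG, EARTH_KG) / 86_400
print(f"Earth's orbital period: about {earth_year_days:.1f} days")
print(f"1 s on a clock moving at 0.6c lasts {dilated_time(1.0, 0.6 * C):.3f} s for an observer at rest")
```

With these rounded inputs the first function lands close to the familiar length of a year, which is a quick sanity check on the formula.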

The research was supported by the Defense Advanced Research Projects Agency (DARPA). The findings will be published in the journal Nature Communications on April 12.

A machine-learning tool that reasons

The new AI scientist, nicknamed "AI-Descartes" by the researchers, joins other computing tools, such as AI Feynman, that aim to speed up scientific discovery. At the core of these systems is symbolic regression, a technique for finding equations that fit the data. Given basic operators such as addition, multiplication, and division, the systems can generate hundreds to millions of candidate equations and search for the ones that most accurately describe the relationships in the data.
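The idea of symbolic regression can be sketched in a few lines of Python. This toy version is not how AI-Descartes or AI Feynman work internally; it simply builds a handful of candidate formulas from products and quotients of three building blocks, fits one scale constant to each, and ranks them by error. The data are rounded textbook values for six planets (distance from the Sun in astronomical units, orbital period in years), and every name in the code is an assumption made for illustration.

```python
import math
from itertools import combinations_with_replacement, permutations

# (semi-major axis in AU, orbital period in years), Mercury through Saturn, rounded.
DATA = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000),
        (1.524, 1.881), (5.203, 11.862), (9.537, 29.457)]

# Building blocks that candidate formulas are assembled from.
PRIMITIVES = {
    "x": lambda x: x,
    "sqrt(x)": lambda x: math.sqrt(x),
    "1": lambda x: 1.0,
}


def candidates():
    """Yield (name, function) pairs: every product and quotient of two primitives."""
    for (na, fa), (nb, fb) in combinations_with_replacement(PRIMITIVES.items(), 2):
        yield f"{na} * {nb}", (lambda x, fa=fa, fb=fb: fa(x) * fb(x))
    for (na, fa), (nb, fb) in permutations(PRIMITIVES.items(), 2):
        yield f"{na} / {nb}", (lambda x, fa=fa, fb=fb: fa(x) / fb(x))


def fit_scale(f, data):
    """Least-squares constant c for y = c * f(x), and the sum of squared errors."""
    c = sum(y * f(x) for x, y in data) / sum(f(x) ** 2 for x, _ in data)
    sse = sum((y - c * f(x)) ** 2 for x, y in data)
    return c, sse


def best_formula(data):
    """Return the name, constant, and error of the best-fitting candidate."""
    scored = [(fit_scale(f, data), name) for name, f in candidates()]
    (c, sse), name = min(scored, key=lambda item: item[0][1])
    return name, c, sse


name, c, sse = best_formula(DATA)
print(f"best candidate: y = {c:.3f} * ({name}), squared error = {sse:.2e}")
```

On this data the search settles on the period growing as the distance times its square root, which is Kepler's third law in these units. Real systems explore vastly larger spaces of expressions and fit many constants at once.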

According to Cristina Cornelio, the lead author of the paper and a research scientist at Samsung AI in Cambridge, England, AI-Descartes boasts several advantages over other systems, with its most notable feature being its ability to engage in logical reasoning. When faced with multiple candidate equations that accurately fit the data, the system has the capability to identify which equations align best with established scientific theories. This ability to reason sets AI-Descartes apart from other "generative AI" programs, like ChatGPT, whose language models may lack advanced logical skills and may occasionally make mistakes in basic mathematical calculations. The incorporation of logical reasoning in AI-Descartes enhances its accuracy and reliability in scientific data analysis and modeling.
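The step of choosing among well-fitting equations can be illustrated with a simplified stand-in. The sketch below does not perform formal logical reasoning; it only checks candidates numerically against three constraints that a site-based picture of adsorption would plausibly imply (no gas adsorbed at zero pressure, adsorption never decreasing with pressure, and a finite capacity). The candidate formulas, their coefficients, the capacity limit, and the function names are all hypothetical.

```python
# Hypothetical candidates that might all fit a few low-pressure measurements well.
CANDIDATES = {
    "a*p / (1 + b*p)": lambda p: 2.0 * p / (1.0 + 0.5 * p),
    "a*p": lambda p: 1.5 * p,
    "a*p - b*p*p": lambda p: 1.9 * p - 0.3 * p * p,
}

# Pressures from 0 to 1000 in steps of 0.1 (arbitrary units).
GRID = [0.1 * i for i in range(10_001)]
CAPACITY = 4.0  # assumed upper limit on the amount adsorbed (arbitrary units)


def zero_at_zero_pressure(f):
    """No gas should be adsorbed when there is no gas pressure."""
    return abs(f(0.0)) < 1e-9


def never_decreases(f):
    """Raising the pressure should not reduce the amount adsorbed."""
    return all(f(hi) >= f(lo) for lo, hi in zip(GRID, GRID[1:]))


def stays_within_capacity(f):
    """A finite number of sites means a finite amount adsorbed."""
    return all(f(p) <= CAPACITY for p in GRID)


def consistent_with_theory(f):
    return zero_at_zero_pressure(f) and never_decreases(f) and stays_within_capacity(f)


consistent = [name for name, f in CANDIDATES.items() if consistent_with_theory(f)]
print("candidates consistent with the assumed constraints:", consistent)
```

Only the saturating form passes all three checks: the straight line exceeds the capacity and the quadratic eventually turns downward. A reasoning system would aim to establish such properties by logical deduction from the background theory rather than by sampling points.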

Cornelio explains that the work merges two approaches: the first-principles approach that scientists have long used to derive formulas from established theories, and the data-driven approach that is prevalent in the era of machine learning. Combining them lets the team draw on the strengths of both and create more accurate and meaningful models for a wide variety of applications.

The name AI-Descartes is a nod to 17th-century mathematician and philosopher René Descartes, who argued that the natural world could be described by a few fundamental physical laws and that logical deduction played a key role in scientific discovery.

Suited for real-world data

One notable strength of AI-Descartes is its ability to handle noisy, real-world data, which can trip up traditional symbolic regression programs. Conventional programs may miss the underlying signal and instead try to capture every small variation in the data, a problem known as overfitting. AI-Descartes is designed to identify meaningful patterns even in noisy data, so that the equations it generates reflect the underlying trends rather than the noise.

Additionally, AI-Descartes is capable of working well with small data sets, even with as few as ten data points. This makes it a valuable tool for scenarios where data availability is limited, such as in certain scientific domains or emerging fields. The system's ability to generate meaningful equations from small data sets further highlights its versatility and potential for a wide range of applications, including in situations where data may be scarce or noisy.
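A small, contrived Python example shows why chasing every variation in ten noisy points is risky. It is not a demonstration of AI-Descartes. It compares a degree-nine polynomial that passes exactly through all ten points with a one-parameter straight line, and then evaluates both between the measured points. The data, the deliberately alternating "noise" of plus or minus 0.5, and all names are assumptions chosen so the outcome is reproducible.

```python
TRUE_SLOPE = 2.0

# Ten points on y = 2x with alternating +0.5 / -0.5 measurement error.
POINTS = [(float(x), TRUE_SLOPE * x + (0.5 if x % 2 else -0.5)) for x in range(1, 11)]

# Locations between the measured points, used to judge each model.
HELD_OUT = [x + 0.5 for x in range(1, 10)]


def lagrange_interpolant(points):
    """Polynomial that passes exactly through every point (fits the noise too)."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def line_through_origin(points):
    """One-parameter model y = c * x fitted by least squares."""
    c = sum(x * y for x, y in points) / sum(x * x for x, _ in points)
    return lambda x: c * x


def max_error(model, xs):
    """Largest deviation from the true noise-free relationship y = 2x."""
    return max(abs(model(x) - TRUE_SLOPE * x) for x in xs)


wiggly = lagrange_interpolant(POINTS)
simple = line_through_origin(POINTS)

print(f"exact-fit polynomial, worst error between points: {max_error(wiggly, HELD_OUT):.2f}")
print(f"straight line, worst error between points:        {max_error(simple, HELD_OUT):.2f}")
```

The polynomial reproduces every measurement perfectly yet swings far from the true relationship between them, while the simpler model stays close. Favoring simple equations that agree with known theory is one way to avoid this failure with scarce data.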

One factor that might slow down the adoption of a tool like AI-Descartes for frontier science is the need to identify and code associated background theory for open scientific questions. The team is working to create new datasets that contain both real measurement data and an associated background theory to refine their system and test it on new terrain.

They would also like to eventually train computers to read scientific papers and construct the background theory themselves.

Co-author Tyler Josephson, who is an assistant professor of Chemical, Biochemical, and Environmental Engineering at UMBC, explains, "In this work, we relied on human experts to formalize the axioms of the background theory in a computer-readable format. If any of these axioms were missed or incorrect, it would hinder the system's functionality." He further adds, "In the future, we aim to automate this process as well, allowing us to expand our exploration to various domains of science and engineering."

For now, then, the system depends on human experts to supply a formal background theory, and its results are only as reliable as those axioms are accurate and complete. Automating that step would let the system take on a broader range of scientific and engineering domains.

This goal motivates Josephson’s research on AI tools to advance chemical engineering. 

The team envisions that AI-Descartes, much like its human namesake, will inspire a novel approach to scientific inquiry. "One of the most thrilling prospects of our work is the potential to drive significant advancements in scientific research," notes Cornelio.

The team's AI-Descartes represents a promising frontier in the realm of artificial intelligence, offering new avenues for scientific exploration and discovery. By leveraging the capabilities of advanced technologies, such as machine learning and formal reasoning, the team aims to push the boundaries of scientific inquiry and unlock new insights. The potential impact of their work extends beyond the realm of AI, with the goal of inspiring and catalyzing breakthroughs in various fields of scientific research.

Journal Link: Nature Communications