Newswise — New data standards created for AI models.

Aspiring bakers are frequently called upon to adapt award-winning recipes based on differing kitchen setups. Someone might use an eggbeater instead of a stand mixer to make prize-winning chocolate chip cookies, for instance.

Being able to reproduce a recipe in different situations and with varying setups is critical for both talented chefs and computational scientists. The latter face a similar problem of adapting and reproducing their own “recipes” when trying to validate and work with new AI models. These models have applications in scientific fields ranging from climate analysis to brain research.

“When we talk about data, we have a practical understanding of the digital assets we deal with,” said Eliu Huerta, scientist and lead for Translational AI at the U.S. Department of Energy’s (DOE) Argonne National Laboratory. “With an AI model, it’s a little less clear; are we talking about data structured in a smart way, or is it computing, or software, or a mix?”

In a new study, Huerta and his colleagues have articulated a new set of standards for managing AI models. Adapted from recent research on automated data management, these standards are called FAIR, which stands for findable, accessible, interoperable and reusable.

“By making AI models FAIR, we no longer have to build each system from the ground up each time,” said Argonne computational scientist Ben Blaiszik. “It becomes easier to reuse concepts from different groups, helping to create cross-pollination across teams.”

According to Huerta, the fact that many AI models are currently not FAIR poses a challenge to scientific discovery. “For many studies that have been done to date, it is difficult to gain access to and reproduce the AI models that are referenced in the literature,” he said. “By creating and sharing FAIR AI models, we can reduce the amount of duplication of effort and share best practices for how to use these models to enable great science.”

To meet the needs of a diverse community of users, Huerta and his colleagues combined a unique suite of data management and high performance computing platforms to establish a FAIR protocol and quantify the “FAIR-ness” of AI models. The researchers paired FAIR data published at an online repository called the Materials Data Facility with FAIR AI models published at another online repository called the Data and Learning Hub for Science, as well as with AI and supercomputing resources at the Argonne Leadership Computing Facility (ALCF). In this way, the researchers were able to create a computational framework that could help bridge various hardware and software, creating AI models that could be run similarly across platforms and that would yield reproducible results. The ALCF is a DOE Office of Science user facility.
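To make the idea of quantifying “FAIR-ness” concrete, here is a minimal Python sketch of a checklist-style score. It is an illustration only, not the protocol from the study: the `ModelRecord` fields, the `fair_checks` rules and the equal weighting in `fair_score` are all assumptions made for this example, and the identifiers and URL are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRecord:
    """Hypothetical metadata record for a published AI model (illustrative fields)."""
    persistent_id: Optional[str] = None       # e.g. a DOI-style identifier
    description: str = ""                     # searchable description
    access_url: Optional[str] = None          # where the model can be retrieved
    model_format: Optional[str] = None        # documented serialization format
    tested_platforms: List[str] = field(default_factory=list)
    license: Optional[str] = None             # terms of reuse
    training_data_id: Optional[str] = None    # link to the training data


def fair_checks(record: ModelRecord) -> Dict[str, bool]:
    """Apply one simple, assumed rule per FAIR principle."""
    return {
        "findable": bool(record.persistent_id and record.description),
        "accessible": bool(record.access_url),
        # Assumption: "interoperable" means a documented format that has
        # been run on at least two platforms.
        "interoperable": bool(
            record.model_format and len(record.tested_platforms) >= 2
        ),
        "reusable": bool(record.license and record.training_data_id),
    }


def fair_score(record: ModelRecord) -> float:
    """Fraction of the four principles satisfied (equal weights assumed)."""
    checks = fair_checks(record)
    return sum(checks.values()) / len(checks)


# Placeholder record: findable and accessible, but tested on only one
# platform and missing a license.
example = ModelRecord(
    persistent_id="doi:10.0000/placeholder",
    description="Placeholder model description",
    access_url="https://example.org/models/placeholder",
    model_format="placeholder-format",
    tested_platforms=["platform-a"],
)
print(fair_checks(example))
print(fair_score(example))
```

A real assessment would rest on richer criteria than four yes-or-no rules, but the sketch shows how a metadata record can be turned into a comparable score.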

Two keys to creating this framework are platforms called funcX and Globus, which allow researchers to access high performance computing resources straight from their laptops. “FuncX and Globus can help transcend the differences in hardware architectures,” said co-author Ian Foster, director of Argonne’s Data Science and Learning division. “If someone is using one computing architecture and someone else is using another, we now have a way of speaking a common AI language. It’s a big part of making AI more interoperable.”
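The general pattern behind this kind of interoperability, writing a function once and sending it unchanged to different computing resources, can be sketched in a few lines of Python. This sketch does not use the funcX or Globus APIs: the `Endpoint` class and `run_everywhere` helper are hypothetical stand-ins built on the standard library, and the “endpoints” here are just local worker threads.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class Endpoint:
    """Hypothetical stand-in for a computing resource (here, a local thread)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn: Callable[..., Any], *args: Any) -> Future:
        # Hand the function and its arguments to this endpoint's worker.
        return self._pool.submit(fn, *args)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def run_everywhere(
    fn: Callable[..., Any], endpoints: List[Endpoint], *args: Any
) -> Dict[str, Any]:
    """Send the same function to every endpoint and collect results by name."""
    futures = {ep.name: ep.submit(fn, *args) for ep in endpoints}
    return {name: fut.result() for name, fut in futures.items()}


def normalize(values: List[float]) -> List[float]:
    """Toy workload: scale values so the largest becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


endpoints = [Endpoint("laptop"), Endpoint("cluster")]
results = run_everywhere(normalize, endpoints, [1.0, 2.0, 4.0])
for ep in endpoints:
    ep.shutdown()

# The same code, run through the same interface, gives the same answer
# regardless of which endpoint executed it.
print(results)
```

In a real deployment the endpoints would be remote systems with different hardware, and the dispatch layer would also handle authentication and data movement, which this sketch leaves out.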

In the study, the researchers tested the framework on an example AI model and dataset based on diffraction data from Argonne’s Advanced Photon Source, also a DOE Office of Science user facility. To perform the computations, the team used the ALCF AI Testbed’s SambaNova system and the Theta supercomputer’s NVIDIA GPUs (graphics processing units).

“We’re excited to see the FAIR productivity benefits from model and data sharing to provide more researchers with access to high performance computing resources,” said Marc Hamilton, NVIDIA vice president for Solutions Architecture and Engineering. “Together we’re supporting the expanding universe of high performance computing that’s combining experimental data and instrument operation at the edge with AI to increase the pace of scientific discovery.”

“SambaNova is excited to partner with researchers at Argonne National Laboratory to pursue innovation at the interface of AI and emergent hardware architectures,” added Jennifer Glore, vice president for Customer Engineering at SambaNova Systems. “AI will have a significant role in the future of scientific computing, and the development of FAIR principles for AI models along with novel tools will empower researchers to enable autonomous discovery at scale. We’re looking forward to continued collaboration and development at the ALCF AI Testbed.”

A paper based on the study, “FAIR principles for AI models, with a practical application for accelerated high energy diffraction microscopy,” appeared in Nature Scientific Data on Nov. 10, 2022.

In addition to Huerta, other authors of the study include Argonne’s Nikil Ravi, Pranshu Chaturvedi, Zhengchun Liu, Ryan Chard, Aristana Scourtas, K.J. Schmidt, Kyle Chard, Ben Blaiszik and Ian Foster.

The research was funded by DOE’s Office of Advanced Scientific Computing Research, the National Institute of Standards and Technology, the National Science Foundation and Laboratory Directed Research and Development grants.

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

About the Advanced Photon Source

The U.S. Department of Energy Office of Science’s Advanced Photon Source (APS) at Argonne National Laboratory is one of the world’s most productive X-ray light source facilities. The APS provides high-brightness X-ray beams to a diverse community of researchers in materials science, chemistry, condensed matter physics, the life and environmental sciences, and applied research. These X-rays are ideally suited for explorations of materials and biological structures; elemental distribution; chemical, magnetic, electronic states; and a wide range of technologically important engineering systems from batteries to fuel injector sprays, all of which are the foundations of our nation’s economic, technological, and physical well-being. Each year, more than 5,000 researchers use the APS to produce over 2,000 publications detailing impactful discoveries, and solve more vital biological protein structures than users of any other X-ray light source research facility. APS scientists and engineers innovate technology that is at the heart of advancing accelerator and light-source operations. This includes the insertion devices that produce extreme-brightness X-rays prized by researchers, lenses that focus the X-rays down to a few nanometers, instrumentation that maximizes the way the X-rays interact with samples being studied, and software that gathers and manages the massive quantity of data resulting from discovery research at the APS.

This research used resources of the Advanced Photon Source, a U.S. DOE Office of Science User Facility operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.

Journal Link: Nature Scientific Data, Nov-2022