- 2019-10-04 08:05:14
- Article ID: 720187
Following the Data Trail to Accelerated Discovery
A team of computational scientists, software engineers, and physicists is developing software that will keep track of sample descriptions, experimental conditions, and data analysis methods so scientists can interpret, validate, compare, and reproduce results--and eventually automate their research
The desire to trace origins extends to data itself. In computer science, the ability to record the origin and history of data is known as provenance.
“Provenance is the record of data lineage and software processes operating on these data that enable the interpretation, validation, and reproduction of results,” explained Line Pouchard, a senior researcher at the Center for Data-Driven Discovery (C3D), part of the Computational Science Initiative (CSI) at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory.
Currently, Pouchard is leading an effort focused on enhancing scientific experimentation through provenance via a framework called the Dynamic Provenance System (DPS).
For scientific experiments, provenance encompasses descriptions of samples, experimental procedures and conditions, and data analysis methods. The ability to record these metadata and scientific workflows is especially critical in today’s era of big data, where researchers are faced with large, diverse data from complex, dynamic, and streaming heterogeneous sources. In order for scientists to derive insights that lead to discoveries, they need to know how data were generated and transformed from their original state to produce the final results.
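As a rough illustration of what a recorded data trail might look like, the sketch below links each data product to the step that produced it and walks the chain back to the original measurement. The class, field names, and example values are hypothetical and are not taken from the DPS or Bluesky.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProvenanceRecord:
    """One step in a data trail (hypothetical structure, not the DPS format)."""
    name: str                      # label for the data product
    process: str                   # software or procedure that produced it
    parameters: Dict[str, object] = field(default_factory=dict)
    parent: Optional["ProvenanceRecord"] = None  # what this was derived from


def lineage(record: ProvenanceRecord) -> List[str]:
    """Walk back from a result to the original data, oldest step first."""
    steps = []
    current = record
    while current is not None:
        steps.append(f"{current.name} <- {current.process}")
        current = current.parent
    return list(reversed(steps))


# Illustrative three-step trail: acquisition, reduction, analysis.
raw = ProvenanceRecord("raw_scan", "detector acquisition", {"exposure_s": 0.5})
reduced = ProvenanceRecord("reduced_pattern", "background subtraction",
                           {"background": "empty_capillary"}, parent=raw)
result = ProvenanceRecord("fitted_structure", "model refinement",
                          {"max_iterations": 100}, parent=reduced)

for step in lineage(result):
    print(step)
```

Because every record keeps its parameters and a pointer to its parent, a final result can be traced back through each transformation to the raw data.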
An existing data infrastructure
Consider the National Synchrotron Light Source II (NSLS-II)—a DOE Office of Science User Facility at Brookhaven Lab where scientists use ultrabright x-ray light to reveal the atomic and electronic structure, chemical composition, and magnetic properties of materials. By 2020, NSLS-II is expected to produce more than 20 petabytes of experimental data. For reference, 1.5 million CD-ROM discs would be required to store a single petabyte. Currently, 28 experimental stations, or beamlines, are in operation at NSLS-II. That means 28 separate experiments can be ongoing at once. Once fully built out, NSLS-II could accommodate up to 60 beamlines.
Traditionally, each beamline has provided its own data acquisition and analysis tools. But recently, Brookhaven’s Data Acquisition, Management, and Analysis (DAMA) Group developed a collection of software tools named Bluesky to streamline the process of data collection and analysis across different beamlines and even other light sources. This open-source software has also been deployed at three other DOE Office of Science User Facilities: the Advanced Photon Source at Argonne National Lab, the Advanced Light Source at Berkeley National Lab, and the Linac Coherent Light Source at SLAC National Accelerator Lab.
“Bluesky allows us to store the entire data trail, from the birth of the sample and its characteristics—for example, who made it, how and when it was made, its chemical composition, and its physical state—to experimental conditions and instrument calibrations, to the software parameters used to analyze the data, to the tangible scientific results generated by analyzing the raw data,” explained Eli Stavitski, lead scientist at NSLS-II’s Inner Shell Spectroscopy (ISS) beamline. “In order to support reproducible science, all of these metadata have to be tracked by a provenance system.”
Provenance for photon science
To help boost the data and computing infrastructure at NSLS-II, Pouchard’s team has been developing provenance software that leverages Bluesky.
“Our DPS enables searching across the beamlines, provides statistics on samples studied at NSLS-II, and brings external data sources right to scientists’ fingertips during their experiments,” Pouchard explained. “These sources include crystallography databases and information extracted from the scientific literature.”
In addition to Pouchard and Stavitski, the team members are C3D computational scientist Pavol Juhas and postdoctoral research associate Gilchan Park; application architect Hubertus van Dam of CSI’s Computational Science Laboratory; DAMA group leader Stuart Campbell; Simon Billinge, a physicist in Brookhaven’s Condensed Matter Physics and Materials Science Department and a professor of materials science and engineering and applied physics and applied mathematics at Columbia; CJ Wright, a recent PhD graduate of Columbia and lead software engineer for the Billinge Group; and Aiko Hassett, a computer science major at Middlebury College who participated in DOE’s Science Undergraduate Laboratory Internships program in summer 2019.
The team has deployed a prototype version of this provenance software at two NSLS-II beamlines: ISS and X-ray Powder Diffraction (XPD).
“ISS and XPD are what we call high-throughput beamlines,” said Campbell. “These beamlines are capable of measuring up to hundreds of samples in a single day. While Bluesky provides a framework for storing the metadata from these experiments in databases, we need provenance ‘dictionaries,’ or a set of metadata schemas, that are customized for specific scientific techniques and samples.”
Metadata schemas provide an overall structure for organizing and formatting the metadata in a standardized way. Without well-defined schemas, retrieving relevant information from the database would be difficult.
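To make the idea concrete, here is a minimal sketch of a schema check, assuming a schema is simply a table of field names, expected types, and whether each field is required. The field names and the validate function are illustrative assumptions, not the schemas used at NSLS-II.

```python
# Hypothetical schema: field name -> (expected type, required?)
SAMPLE_SCHEMA = {
    "sample_name": (str, True),
    "composition": (str, True),
    "prepared_by": (str, True),
    "physical_state": (str, False),
    "temperature_K": (float, False),
}


def validate(metadata, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for key, (expected_type, required) in schema.items():
        if key not in metadata:
            if required:
                problems.append(f"missing required field: {key}")
        elif not isinstance(metadata[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    # Fields outside the schema make later searches unreliable, so flag them.
    for key in metadata:
        if key not in schema:
            problems.append(f"unknown field: {key}")
    return problems


good = {"sample_name": "S-001", "composition": "TiO2", "prepared_by": "A. User"}
bad = {"sample_name": "S-002", "temperature_K": "room temp", "operator": "B"}

print(validate(good, SAMPLE_SCHEMA))
print(validate(bad, SAMPLE_SCHEMA))
```

Records that pass a check like this share the same field names and types, which is what makes a later database query for, say, every sample with a given composition dependable.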
“The DPS software allows users to search for metadata on samples and previously performed experiments and to replay analysis workflows,” said Pouchard.
For previous generations of light sources, scientists traditionally analyzed their scientific results back at their home institutions. But finding out after the experiment that there was a problem with the experimental setup, or that the data were insufficient to draw any conclusions, is not very helpful, especially considering that beam time is limited.
“We would like to enable data analysis while the scientists are doing their experiments at the beamlines so they can make adjustments if necessary,” said van Dam. “Such streaming analysis requires computational power and networks, as well as optimized analysis codes.”
“Parallel computing—farming out computations in many nodes—can speed up very-high-intensity calculations,” explained Wright. “In the context of streaming data, executing parallel computing is challenging because the computations from subsequent data points can complete before preceding ones. If the data are not processed in the right order, then the accuracy of the results can be significantly impacted.”
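The ordering problem Wright describes can be sketched with Python's standard library. The example below is one common way to handle it, not the team's implementation: each data point carries a sequence number, workers may finish in any order, and results that arrive early are buffered until every preceding one has been released. The analyze function is a hypothetical stand-in for a real calculation.

```python
import heapq
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze(seq, value):
    """Stand-in for an expensive per-frame calculation (hypothetical)."""
    time.sleep(random.uniform(0.0, 0.02))  # uneven run times cause reordering
    return seq, value * value


def ordered_results(frames, workers=4):
    """Yield results in acquisition order even if workers finish out of order."""
    pending = []        # min-heap of (seq, result) pairs that arrived early
    next_seq = 0        # sequence number the consumer is waiting for
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(analyze, seq, v) for seq, v in enumerate(frames)]
        for future in as_completed(futures):
            heapq.heappush(pending, future.result())
            # Release every buffered result that is now contiguous.
            while pending and pending[0][0] == next_seq:
                yield heapq.heappop(pending)[1]
                next_seq += 1


frames = list(range(10))
processed = list(ordered_results(frames))
print(processed)
```

The trade-off is latency: one slow data point holds back all later results, so a real streaming system would also need to decide how long to wait before reporting a gap.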
This on-the-fly analysis could enable scientists to compare their results with preexisting data sets to find potential reference structures. Similarly, while the scientific literature contains a wealth of information, it is not very useful if it cannot be parsed in an efficient way that provides relevant answers to very specific questions. According to Pouchard, this problem is why DPS also aims to provide scientists with timely, relevant information extracted from the scientific literature. To help with this task, Park has been developing a text and data mining system based on natural language processing—a branch of artificial intelligence that involves designing computer algorithms that can understand human language.
“Within the past 50 years, thousands and maybe millions of experimental results similar to what we’re doing here at NSLS-II were reported in scientific journals,” said Stavitski. “With the ever-growing number of journals, tracking of all of these papers has become impossible for a human to do. The text and data mining system will not only allow us to more quickly find previously published results for comparison but also to extract pieces of provenance schemas in a form that can be fed into machine learning (ML) algorithms. What comes out from the beamline is a set of numbers, from which figures are created to include in scientific publications. In order for ML to take advantage of these data, the figures need to be transformed back into numbers.”
Toward autonomous experiments
If the DPS prototype targeting a small subset of beamlines proves to be successful, the idea is to extend it throughout the rest of NSLS-II, as well as other facilities where NSLS-II infrastructure has been deployed. According to Stavitski, one of the challenges in a facility-wide implementation within NSLS-II is making schemas that are universally applicable across all beamlines.
“What we need is for the schemas to work like Lego pieces that for a particular type of experiment on a particular beamline, you select the ones that make sense for that technique and sample type,” said Stavitski. “As you move to a different sample or technique, you select a different set of schemas.”
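One way to picture the Lego analogy is to treat each schema piece as a small table of fields and to assemble the pieces needed for a given technique and sample type. The piece names and fields below are invented for illustration and do not reflect the actual NSLS-II schemas.

```python
# Hypothetical schema pieces: each maps a metadata field to its expected type.
SCHEMA_PIECES = {
    "sample_common": {"sample_name": str, "composition": str},
    "powder_sample": {"particle_size_um": float},
    "liquid_sample": {"solvent": str, "concentration_mM": float},
    "diffraction": {"wavelength_A": float, "detector_distance_mm": float},
    "spectroscopy": {"absorption_edge": str, "energy_range_eV": tuple},
}


def compose(*piece_names):
    """Merge the named pieces into one schema, rejecting conflicting fields."""
    combined = {}
    for name in piece_names:
        for field_name, field_type in SCHEMA_PIECES[name].items():
            if combined.get(field_name, field_type) is not field_type:
                raise ValueError(f"conflicting definitions for {field_name}")
            combined[field_name] = field_type
    return combined


# Different experiments pick different pieces but share the common ones.
powder_diffraction = compose("sample_common", "powder_sample", "diffraction")
liquid_spectroscopy = compose("sample_common", "liquid_sample", "spectroscopy")

print(sorted(powder_diffraction))
print(sorted(liquid_spectroscopy))
```

Because both composed schemas include the same common piece, records from different beamlines can still be searched together on the shared fields.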
As Campbell notes, this flexibility would be particularly helpful for automated multimodal analysis.
“When multiple measurements with different beamlines are taken on the same sample, future automated analysis would be able to parse these data to provide a more complete picture of the science of that sample,” explained Campbell.
“With our ability to capture data using so many different (but complementary) techniques, we have to delegate some of the analysis to intelligent computer systems,” said Stavitski. “It is no longer feasible for a human to keep up. But machine-based analysis is only possible if every piece of information has been tracked.”
“Because most of ML is done on labeled, structured data, you need to have databases filled with high-quality metadata,” added Billinge. “So, for example, if you have a bunch of x-ray diffraction images stored in a database, you can do ML by training the algorithm to recognize images that are coming from a liquid versus crystal sample. This is where a computational infrastructure becomes important because it enables the capture of the metadata needed for ML. We haven’t had that in place until now.”
Having demonstrated the generalizability of the DPS on two different NSLS-II beamlines, Pouchard’s team would now like to show that it is scalable. Pouchard noted that such scalable provenance systems are needed across many domains, including in beamline experiments, computational experiments, and performance analyses of computer systems. Part of this scalability will involve developing provenance-tracking capabilities for combined experimental and simulation data. Simulations can complement experiments by predicting what to expect and revealing the atomic-scale processes underlying observations—for example, the changes in the chemistry of electrode materials that occur as a battery charges and discharges.
“Another layer of complexity is involved because there are completely different data sets (measured versus calculated), each with varying degrees of accuracy,” said van Dam. “We need to figure out how to compare them in a valid way. One of the issues is computational reproducibility in ML. Computational reproducibility covers a whole range of thresholds, including accuracy and uncertainties. When you do ML with scientific data, the methods are not deterministic; the outputs of the computations are based on the parameters you have chosen.”
The team is also interested in exploring whether a similar provenance system can be used at other facilities with extremely high data acquisition rates, such as the Center for Functional Nanomaterials—another DOE Office of Science User Facility at Brookhaven.
“Ultimately, provenance will enable better science, as it will increase confidence in the methods used to obtain a particular result,” said Pouchard. “It will also enable researchers to reproduce results and help them learn from past experiments.”
The research is funded by Brookhaven’s Laboratory Directed Research and Development program, which promotes highly innovative and exploratory projects.
Brookhaven National Laboratory is supported by the U.S. Department of Energy’s Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.
Follow @BrookhavenLab on Twitter or find us on Facebook.
