Pacific Northwest National Laboratory

Identifying the Dark Matter of the Molecular World

Scientists tap deep learning to pinpoint metabolites, which are critical to life

Newswise — Imagine that your Facebook feed poses a tantalizing puzzle. You’re presented with a few fragments about a person – eye color, hair color, age, and height – and have just one minute to pick out the person’s name and identity from hundreds of profiles. If you do so, you win $100 million.

But you know only 10 of these people by name. For the others, you have only a paucity of data to work from. Some are young and some are not so young. Some are blond and some are brunette. Some of their names sound familiar but you can’t quite pinpoint how you know them.

This type of scenario – a seemingly impossible task with an enormous payoff – confronts researchers who study metabolomics at the U.S. Department of Energy's Pacific Northwest National Laboratory. Metabolomics is the study of small molecules that underlie and inform every aspect of our lives, including energy production, the fate of the planet, and our health.

Scientists estimate that less than 1 percent of small molecules are known. A typical commercially available metabolomics library has maybe 5,000 compounds, but scientists know there are billions more. 

How do they “identify” something about which they know so little? It’s like asking Galileo to identify stars in deep space that were impossible to detect when he used one of the first telescopes more than 400 years ago.

Enter DarkChem, a research project funded by PNNL’s Deep Learning for Scientific Discovery Agile Investment. A team led by Ryan Renslow is bringing artificial intelligence to the table to tackle the vast, unknown landscape of metabolites that bedevil researchers like Tom Metz, who leads PNNL’s metabolomics effort. 

“Right now, we’re just skimming what is potentially knowable and saying goodbye to very interesting data because we can’t identify the vast majority of metabolites that our technology detects,” said Metz. “Deep learning is providing a new way to solve the puzzle.” 

Renslow and colleagues Sean Colby and Jamie Nunez have adopted deep learning principles commonly used in applications like language translation and applied them to this dark matter of the molecular world. 

Early results are noteworthy: The team’s DarkChem network can calculate a key feature of a molecule in milliseconds, compared to 40 hours on a supercomputer running PNNL’s flagship quantum chemistry software, NWChem, and it does so with 13 percent fewer errors.

“We were shocked at how well DarkChem did,” said Renslow.

The network isn’t simply crunching through data to compile results. Rather, the network draws upon artificial intelligence. DarkChem was developed so that it can discover new things that are still unknown to humans. 

Of football and collision cross-section

In this case, the team trained the program to understand and predict a chemical property known as collision cross-section (CCS). While CCS may look like an intimidating scientific acronym, anyone who has watched a football game has seen something like CCS in action.

Picture a ballcarrier smashing through opposing players. A smaller player might have fewer collisions, but when they do collide with an opponent, the effect is different than when a hulk-like Marshawn Lynch goes into beast mode and shakes off several impacts.

You learn a lot about football players by watching them crash into each other.

In the same way, tracking collisions between metabolite ions traveling through a laboratory instrument filled with gas molecules tells scientists a lot about metabolite ion structures – their size, their mass, and other features. CCS is the mathematical measure of that action, and it’s central to unlocking the gas-phase chemical structure – the true “identification” – of a molecule. 

Renslow and his team trained DarkChem to calculate CCS for chemical structures, then turned it loose to make the calculation for more than 50 million compounds – a portion of the PubChem library. The program solved that task in a snap.

While that’s a promising step forward, the team is more excited about the implications for all those as-yet-unidentified small molecules.

The network can run forwards as well as backwards – that is, it can solve a molecule’s CCS and predict other properties, but it can also generate new chemical structures based on the properties one is looking for. For example, Renslow’s team has used DarkChem to put forth several novel chemical structures that have potential for influencing the NMDA receptor, which is involved in memory and other important brain functions.

The network is not simply memorizing data. In fact, the team intentionally adds some numerical fuzziness into the challenges the network faces to keep it from memorizing.

“It’s like teaching a computer to recognize a dog,” said Renslow. “It could simply memorize the picture, but you want the network to be able to recognize a variety of dogs, so you might flip the picture upside down, stretch it a bit, change its colors. You perturb the image so the program is forced to generalize and rely on the knowledge and rules it has learned.”
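The idea of adding "numerical fuzziness" can be illustrated with a small sketch. This is not DarkChem's actual mechanism, which the article does not describe; it simply assumes that a training example is a list of numbers and that the perturbation is zero-mean Gaussian noise. The function name `perturb`, the noise scale, and the example values are all hypothetical.

```python
import random

def perturb(features, scale, rng):
    """Return a copy of `features` with zero-mean Gaussian noise added to each value."""
    return [value + rng.gauss(0.0, scale) for value in features]

# Seeded generator so the sketch is reproducible.
rng = random.Random(42)

# Made-up numeric encoding of a single training example (assumption).
example = [0.2, 0.5, 0.9]

# Each pass over the data would see a slightly different version of the
# same example, so the exact numbers cannot simply be memorized.
noisy_views = [perturb(example, scale=0.05, rng=rng) for _ in range(3)]

for view in noisy_views:
    print([round(v, 3) for v in view])
```

Each printed row is the same example seen through a different perturbation, much like the flipped or stretched dog pictures in Renslow's analogy.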

Teaching the network to learn

To create the network, the team used a form of artificial intelligence called transfer learning, where the network learns from one data set and then applies its knowledge to another data set. The training consisted mainly of three steps:

  1. The program perused more than 50 million known molecules in PubChem, learning the basics of chemistry and how to represent chemical structures mathematically. But the database lacked information about CCS, a crucial measurement for understanding metabolites.
  2. Then, the team exposed DarkChem to a PNNL-developed set of computational CCS data, about 700,000 molecules. This taught the program how to link the general information it had learned about chemical structure to CCS.
  3. Finally, the team fine-tuned the network using a small, robust data set of about 1,000 chemical structures whose CCS measurements have been determined through painstaking work in the laboratory.
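The handoff between steps 2 and 3 above can be sketched in a few lines. This is a toy illustration of transfer learning under stated assumptions, not the DarkChem architecture: the model is a one-variable straight line, the "computed" and "measured" data sets are synthetic, and step 1 (learning chemistry from structures alone) is left out. The names `train` and `mse` are hypothetical helpers.

```python
import random

def train(weights, data, lr, epochs):
    """Gradient descent on a 1-D linear model y = w*x + b, starting from `weights`."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(weights, data):
    """Mean squared error of the linear model on `data`."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)

# Large synthetic stand-in for computed values (assumption: slightly biased).
computed = [(x, 2.0 * x + 0.5) for x in (rng.uniform(0, 1) for _ in range(500))]

# Small synthetic stand-in for laboratory measurements (assumption).
measured = [(x, 2.0 * x + 0.8) for x in (rng.uniform(0, 1) for _ in range(10))]

# Learn the general relationship from the large set first...
pretrained = train((0.0, 0.0), computed, lr=0.3, epochs=300)

# ...then start from those weights and fine-tune gently on the small set.
fine_tuned = train(pretrained, measured, lr=0.1, epochs=200)

print("error on measured data before fine-tuning:", mse(pretrained, measured))
print("error on measured data after fine-tuning: ", mse(fine_tuned, measured))
```

The design point is that the second call to `train` starts from `pretrained` rather than from scratch, so the small data set only has to correct what the large one got slightly wrong.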

The ability to calculate CCS for unknown molecules – molecules whose only hint of existence may be one thin line from a mass-spectrometry experiment – adds an important feature to help scientists differentiate one metabolite from another, shining a light on dark molecular matter.

“Every dimension you add gives you better resolving power,” said Colby, who is helping scope out other possible molecular features for DarkChem to analyze, such as infrared spectra, fragmentation patterns, and solvent-accessible surface data. 

It’s analogous to honing our ability to identify thousands of acquaintances on Facebook.

“You can say someone is male and wears glasses,” said Renslow. “But if you can add that he’s 54 years old and drives a red Mercedes, you restrict the candidates.

“It’s not that much different with metabolites. We keep adding characteristics we can measure, and eventually there is only one molecule in the universe that fits that combination of data,” he added.
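The narrowing effect Renslow describes can be shown with a toy filter. The candidate names, masses, CCS values, and tolerances below are made up for illustration and do not come from any real library; `matches` and `narrow` are hypothetical helpers.

```python
# Made-up candidate list: each entry has a mass and a CCS value (assumption).
candidates = [
    {"name": "A", "mass": 180.06, "ccs": 130.0},
    {"name": "B", "mass": 180.06, "ccs": 138.5},
    {"name": "C", "mass": 180.07, "ccs": 145.2},
    {"name": "D", "mass": 194.08, "ccs": 138.4},
]

def matches(candidate, observed, tolerances):
    """True if every observed property is within tolerance for this candidate."""
    return all(abs(candidate[k] - observed[k]) <= tolerances[k] for k in observed)

def narrow(candidates, observed, tolerances):
    """Names of the candidates consistent with all observed properties."""
    return [c["name"] for c in candidates if matches(c, observed, tolerances)]

# Illustrative measurement tolerances (assumption).
tolerances = {"mass": 0.02, "ccs": 1.0}

# One measured dimension leaves several candidates...
by_mass = narrow(candidates, {"mass": 180.06}, tolerances)

# ...and adding a second dimension restricts the list further.
by_mass_and_ccs = narrow(candidates, {"mass": 180.06, "ccs": 138.0}, tolerances)

print(by_mass)          # ['A', 'B', 'C']
print(by_mass_and_ccs)  # ['B']
```

Every extra measurable property is one more key in `observed`, and each one can only shrink the list of matching candidates.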


# # #



