Lawrence Livermore National Laboratory

Machine learning model for COVID-19 drug discovery is a Gordon Bell finalist

Newswise — A machine learning model developed by a team of Lawrence Livermore National Laboratory (LLNL) scientists to aid in COVID-19 drug discovery efforts is a finalist for the Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research.

Using all of Sierra, the world’s third fastest supercomputer, LLNL scientists produced a more accurate and efficient generative model to enable COVID-19 researchers to produce novel compounds that could possibly treat the disease. The team trained the model on an unprecedented 1.6 billion small molecule compounds and one million additional promising compounds for COVID-19, reducing the model training time from one day to just 23 minutes.

 “This capability will have a dramatic impact on drug discovery,” said paper co-author and LLNL computer scientist Ian Karlin. “This ability to quickly create high-quality machine learning models changes the time-to-insight from a compute-limited issue to a human-limited one.”

Since the early days of the pandemic, LLNL scientists have been using machine learning to discover countermeasures capable of binding to protein sites in the SARS-CoV-2 virus that causes COVID-19. Lab researchers plan to incorporate the improved generative model into the small molecule drug design loop to create more diverse and potentially more effective drug compounds to synthesize for experimental testing, a critical factor in the race to find new COVID-19 therapeutics.

“The goal of this project is to generate new molecules within a large space based on promising ones from the docking, binding and molecular dynamics work, but make them a little different so COVID-19 researchers can optimize their designs,” said paper co-author Felice Lightstone, who heads the COVID-19 small molecule work.

New for this year, the special Gordon Bell Prize for COVID-19 Research will be announced on Nov. 19 at the virtual 2020 Supercomputing Conference (SC20). Awarded by the Association for Computing Machinery, the prize recognizes the contributions of HPC and parallel computing to the understanding of the COVID-19 pandemic. The four finalists were selected based on performance and innovation in their computational methods, as well their contributions towards understanding the nature, spread and/or treatment of the disease. The winning team will receive a $10,000 award.

LLNL scientists said their multi-level parallel training approach performed well across multiple scales, including the entire 125 petaflop IBM/NVIDIA supercomputer Sierra with up to 97.7 percent efficiency. Using the Livermore Big Artificial Neural Network Toolkit (LBANN), which enables deep learning research at previously unobtainable scales, the team trained a novel Wasserstein autoencoder on a 1.613 billion molecule training set and a 1.01 million molecule test set, nearly an order of magnitude more chemical compounds than any other work reported to date.

“We are taking the decades of experience that the national labs have and bringing it to bear to enable both combinations of strong and weak scaling for this kind of machine learning problem,” said principal investigator Brian Van Essen. “This has the potential to help transform drug discovery into a more computationally driven process.”

Utilizing mixed-precision training, the team was able to achieve 17.1 percent of half-precision machine peak using tensor cores. While scaling the model to all of Sierra was a “significant challenge,” with even modest computing resources, scientists said they could train or retrain dozens of new models in under an hour, even as new compounds are generated, to create an automated “self-learning design loop” to accelerate drug discovery.

As researchers find more promising compounds for COVID-19 and new pathogens emerge, researchers will need to retrain the model for new protein targets and refine the chemical search. With the number of drug-like molecules estimated at 1060, the ability to rapidly train and retrain machine learning models at such a large scale is “incredibly revolutionary” for drug discovery, Van Essen said.

Using the model, COVID-19 researchers expect to be able to select promising compounds, project them into the model’s latent space and optimize their chemical properties to create similar, but novel compounds that can be further evaluated through experimental testing.

“A generative model supports efficient exploration of novel parts of makeable chemical space and should improve our chances of finding small molecules to act as a countermeasure for a novel pathogen,” said computer scientist Jonathan Allen. “We’re looking at being able to propose new compounds that are computationally predicted to meet multiple pharmacologically based design criteria in a single evaluation step. This contrasts with the traditional approach of serially optimizing and experimentally testing each property separately, which greatly lengthens drug discovery and development times.”

To train the model, LLNL computer scientist Sam Ade Jacobs and colleagues designed and implemented a character-based Wasserstein autoencoder (cWAE). In contrast to the state-of-the-art (baseline) molecular generative models like variational autoencoders (VAEs) and junction-tree variational autoencoders (JTVAEs), cWAE enforces a stronger constraint during training. These constraints lead to better continuous latent space, and therefore better reconstruction and sampling of molecules from latent space.

“For targeted drug design for COVID-19, you need a generative model that preserves reconstruction while increasing diversity of variations in the latent space,” Jacobs said. “We found the Wasserstein autoencoder is best suited for this task.”

Besides increased speed, researchers demonstrated the approach was more accurate than the variational autoencoder, providing more robust compound reconstructions and an order of magnitude improvement in Average Tanimoto Distance, a metric describing the similarity between the compounds run through the model, and their original input compounds.

Researchers said while the results are promising, they want to improve the scaling and train using more types of models. Their next step is to incorporate the fully trained model into the drug design loop so COVID-19 scientists can use it to evaluate a more diverse set of compounds, predict more valid chemicals, and exert more control over specialization of new compounds.

The team also wants to make the model more automated and improve the efficiency of the overall general drug discovery loop through the Accelerating Therapeutic Opportunities in Medicine (ATOM) consortium, which will help improve rapid response to future viruses. Scientists also will look at training on other architectures and use higher order optimization methods to dramatically improve the model’s quality and speed.

The model is being promoted to other Department of Energy (DOE) national laboratories and is being investigated by the Exascale Computing Project’s (ECP) ExaLearn, a co-design center for Exascale Machine Learning Technologies, and the CANDLE (Cancer Distributed Learning Environment) project led by DOE, ECP and the National Cancer Institute. It has also drawn interest from industry.

Active funding for the research was provided by ExaLearn and ATOM projects. Prior funding for core investments in LBANN includes the Laboratory Directed Research and Development program as well as the ECP CANDLE project. The LLNL Advanced Simulation & Computing program provided funding for ATOM, and staff support and computer time for the effort. Co-authors include Tim Moon, Kevin McLoughlin, Derek Jones, David Hysom, Dong H. Ahn, John Gyllenhaal and Pythagoras Watson.

Van Essen will be presenting the paper during SC20 on Nov. 19 at 10 a.m. PST.

For more on the Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research, visit

Enabling Rapid COVID-19 Small Molecule Drug Design Through Scalable Deep Learning of Generative Models

For more on LLNL’s COVID-19 research, visit

Filters close

Showing results

110 of 4147
Newswise: COVID-19 Peer Hub combats vaccine avoidance amid pandemic
Released: 27-Nov-2020 8:05 AM EST
COVID-19 Peer Hub combats vaccine avoidance amid pandemic
University of South Australia

UniSA researchers are evaluating a new vaccination education initiative – the COVID-19 Peer Hub – to help immunisation and public health professionals to tackle the emerging dangers of vaccine hesitancy amid the pandemic.

Newswise: Long-Term Impacts of COVID-19: Your Mental Health
Released: 25-Nov-2020 2:15 PM EST
Long-Term Impacts of COVID-19: Your Mental Health

The COVID-19 pandemic has shaped more than half a year of our lives, canceling plans, upending livelihoods and causing feelings of grief, stress and anxiety. And Cedars-Sinai mental health experts say the pandemic could be shaping our mental health well into the future.

Released: 25-Nov-2020 12:45 PM EST
SARS-CoV-2 mutations do not appear to increase transmissibility
University College London

None of the mutations currently documented in the SARS-CoV-2 virus appear to increase its transmissibility in humans, according to a study led by UCL researchers.

Newswise: COVID-19 vaccine candidate tested preclinically at UAB nears first clinical test in people
Released: 25-Nov-2020 11:05 AM EST
COVID-19 vaccine candidate tested preclinically at UAB nears first clinical test in people
University of Alabama at Birmingham

Maryland-based Altimmune Inc., a clinical stage biopharmaceutical company, has submitted an Investigational New Drug, or IND, application to the United States Food and Drug Administration to commence a Phase 1 clinical study of its single-dose intranasal COVID-19 vaccine candidate, AdCOVID.

Released: 25-Nov-2020 11:05 AM EST
BIDMC researchers reveal how genetic variations are linked to COVID-19 disease severity
Beth Israel Lahey Health

New research BIDMC-led sheds light on the genetic risk factors that make individuals more or less susceptible to severe COVID-19.

Newswise: blog-pandemic-scenario-planning-lg-feature2.jpg
Released: 25-Nov-2020 11:05 AM EST
Pandemic Ups Game on Scenario Planning in The Arts
Wallace Foundation

Researcher/Author of new toolkit and report seeks to help arts and culture organizations add scenario planning to their strategic toolbox

Released: 25-Nov-2020 10:30 AM EST
Young people's anxiety levels doubled during first COVID-19 lockdown, says study
University of Bristol

The number of young people with anxiety doubled from 13 per cent to 24 per cent, during the early stages of the COVID-19 pandemic and lockdown 1, according to new research from the University of Bristol.

Newswise: 249837_web.jpg
Released: 25-Nov-2020 10:20 AM EST
Tracking COVID-19 trends in hard-hit states
Louisiana State University

Currently, there are over 10 million confirmed cases and more than 240,000 casualties attributed to COVID-19 in the U.S.

Released: 25-Nov-2020 9:55 AM EST
More Health Systems Join National #MaskUp Campaign
Cleveland Clinic

Many more health systems are joining the national #MaskUp campaign encouraging Americans to stop the spread of COVID-19 by following safety guidelines. Over just a few days, another 19 health systems with hundreds of hospitals united with 100 health systems nationwide with hospitals numbering in the thousands. The public service campaign is critical to the health and well-being of all Americans. It is a plea from healthcare professionals everywhere: wear a mask and follow other precautions to save lives and help get our country back on its feet.

Newswise: delaterre_jpeg.jpg
Released: 25-Nov-2020 7:35 AM EST
Warwick scientists design model to predict cellular drug targets against Covid-19
University of Warwick

The covid-19 virus, like all viruses relies on their host for reproduction

Showing results

110 of 4147