Newswise — BOSTON – Data from a new study presented this week at The Liver Meeting® – held by the American Association for the Study of Liver Diseases – found that a machine-learning tool could successfully predict the risk of having non-alcoholic steatohepatitis (NASH) among patients with co-existing diseases. 

NASH, the advanced form of non-alcoholic fatty liver disease (NAFLD), is often underdiagnosed, making the identification and validation of an accurate screening tool highly valuable in clinical practice.

Researchers at Novartis Pharma AG in Basel, Switzerland; ZS Associates in New Jersey; and the University Medical Center Mainz in Germany conducted the study to develop and validate a machine learning algorithm that predicts the risk of having NASH from non-invasive, routinely collected clinical parameters available in two real-world databases. The study’s co-authors conducted exploratory analysis, feature extraction, model training and parameter tuning on the NAFLD Adult Database of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) of the National Institutes of Health, which includes data on people with confirmed NASH and non-NASH NAFLD.

The best-performing model on the NIDDK database was an extreme gradient-boosting model, XGBoost, which included 14 clinical variables and reached an area under the curve (AUC) of 0.82. The researchers then tested this model on the Optum de-identified Electronic Health Record (EHR) dataset, where its AUC was 0.76. Additional performance measures such as diagnostic sensitivity, specificity and overall accuracy were also analyzed.

Using the 14-variable XGBoost model on patients in the Optum EHR, the researchers were able to predict up to 29,000 additional, previously unidentified NASH patients per 100,000 people in the cohort. The researchers also developed a simplified model with five clinical variables that showed slightly lower performance on the same patient cohort, with an AUC of 0.80 in the NIDDK and 0.74 in the Optum EHR.

“This is an innovative machine learning algorithm developed to help identify potential NASH patients in large datasets based on various clinical variables,” says Jörn M. Schattenberg, MD, head of metabolic liver disease and the translational hepatology research laboratory, Department of Gastroenterology and Hepatology, Johannes Gutenberg-Universität Mainz. “The algorithm could be used to support earlier screening and management of potential NASH patients, as well as support recruitment of future clinical trials. Future development options, such as the integration into medical records software, could also be considered.”

Dr. Schattenberg will present these findings at AASLD’s press conference in Room 210 at the Hynes Convention Center in Boston on Saturday, Nov. 9, from 4 to 5:30 PM. The study, entitled “AN INNOVATIVE TOOL BASED ON MACHINE LEARNING TECHNIQUES PREDICTS NASH PATIENTS IN REAL-WORLD SETTINGS,” will be presented on Monday, Nov. 11, at 10:30 AM in the Constitution Ballroom. The corresponding abstract (number 0190) can be found in the journal HEPATOLOGY.

About the AASLD 

AASLD is the leading organization of clinicians and researchers committed to preventing and curing liver disease. The work of our members has laid the foundation for the development of drugs used to treat patients with viral hepatitis. Access to care and support of liver disease research are at the center of AASLD’s advocacy efforts.

Press releases and additional information about AASLD are available online at


Authors: J Huang (1), M Doherty (1), S Regnier (2), G Capkun (2), M Balp (2), Q Ye (1), N Janssens (2), P Lopez (2), M Pedrosa (2) and Dr. Jörn Schattenberg (3). Affiliations: (1) ZS, Princeton, New Jersey, USA; (2) Novartis Pharma AG, Basel, Switzerland; (3) Department of Medicine, University Medical Center Mainz, Germany

Abstract Text:


Non-alcoholic steatohepatitis (NASH), the advanced form of non-alcoholic fatty liver disease (NAFLD), is underdiagnosed. The objective of this study was to develop and validate a machine learning model for early prediction of NASH patients using non-invasive clinical parameters available in real-world data.


A machine learning approach was used on two real-world databases to develop predictive models for identification of NASH patients. First, exploratory analysis, feature extraction, model training, and parameter tuning were conducted on the NAFLD Adult Database from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). This dataset has ~450 confirmed NASH and ~250 confirmed non-NASH NAFLD patients. The best-performing model from NIDDK, selected based on the area under the curve (AUC), was then tested on the Optum Humedica electronic medical record (EMR) database. Additional performance measures such as diagnostic sensitivity, specificity and overall accuracy were also analyzed to understand model performance. The Optum cohort has 3 million patients who had the required data points and met inclusion criteria based on comorbidities and lab values. This cohort includes 23,000 NASH patients identified by an ICD diagnosis code across 10 years of data. Among them, data from 1,016 patients with NASH confirmed by liver biopsy were used to evaluate model performance.
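
The performance measures named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made up for this example. AUC is computed here as the share of (NASH, non-NASH) pairs in which the NASH case receives the higher score, with ties counted as half.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy when score >= threshold means 'NASH'."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "sensitivity": tp / (tp + fn),  # share of true NASH cases flagged
        "specificity": tn / (tn + fp),  # share of non-NASH cases not flagged
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy data, invented for illustration only (1 = NASH, 0 = non-NASH NAFLD).
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]

auc = roc_auc(labels, scores)                               # 0.875
metrics = threshold_metrics(labels, scores, threshold=0.5)
print(auc, metrics)
```

The pairwise loop keeps the definition of AUC visible but scales with the product of the two class sizes, so a cohort of millions would call for a rank-based calculation instead.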


A gradient boosting model (XGBoost) was the best-performing model, with an AUC of 0.82 in NIDDK. The model included 14 variables, listed here in order of variable importance: HbA1c, AST, ALT, total protein, AST/ALT, BMI, triglycerides, height, platelets, WBC, hematocrit, albumin, hypertension, gender. The AUC was 0.76 in Optum EMR. A simplified model with five variables (HbA1c, AST, ALT, triglycerides, total protein) showed slightly lower performance (AUC: 0.80 in NIDDK, 0.74 in Optum Humedica) on the same patient cohort. Using the XGBoost model on patients in Optum EMR, we predicted up to 29,000 additional unidentified NASH patients per 100,000 in our cohort.
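
To show how the two variable sets listed above might be assembled from a patient record, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the field names, the units (height in centimetres, weight in kilograms), the derivation of BMI from a separate weight field, the 0/1 coding of hypertension and gender, and the example patient are all hypothetical.

```python
from typing import Dict, List

# Variable names follow the order given in the abstract; the snake_case
# spellings are hypothetical.
FULL_FEATURES = [
    "hba1c", "ast", "alt", "total_protein", "ast_alt_ratio", "bmi",
    "triglycerides", "height", "platelets", "wbc", "hematocrit",
    "albumin", "hypertension", "gender",
]
SIMPLIFIED_FEATURES = ["hba1c", "ast", "alt", "triglycerides", "total_protein"]


def derive_features(record: Dict[str, float]) -> Dict[str, float]:
    """Add the two derived variables (AST/ALT ratio and BMI) to a raw record."""
    if record["alt"] <= 0 or record["height"] <= 0:
        raise ValueError("ALT and height must be positive")
    features = dict(record)
    features["ast_alt_ratio"] = record["ast"] / record["alt"]
    height_m = record["height"] / 100.0  # assumption: height stored in cm
    features["bmi"] = record["weight_kg"] / height_m ** 2  # assumption: weight available
    return features


def to_vector(record: Dict[str, float], feature_names: List[str]) -> List[float]:
    """Return the model input as a list ordered by feature_names."""
    features = derive_features(record)
    return [float(features[name]) for name in feature_names]


# Invented example patient; the values carry no clinical meaning.
patient = {
    "hba1c": 6.5, "ast": 40.0, "alt": 50.0, "total_protein": 7.0,
    "triglycerides": 150.0, "height": 170.0, "weight_kg": 86.7,
    "platelets": 220.0, "wbc": 6.8, "hematocrit": 42.0, "albumin": 4.2,
    "hypertension": 1, "gender": 0,  # assumption: binary 0/1 coding
}

full_vector = to_vector(patient, FULL_FEATURES)              # 14 values
simplified_vector = to_vector(patient, SIMPLIFIED_FEATURES)  # 5 values
print(len(full_vector), len(simplified_vector))
```

Keeping the feature order in one list means the same record can feed either the 14-variable or the five-variable model without the column order drifting between training and scoring.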


This model may be used within existing EMRs as an effective and scalable pre-screening support tool for referring patients at risk of NASH to specialists. Further validation is planned to evaluate the value of the tool in clinical practice and for clinical trial recruitment.