Newswise — A team of UC Davis Health clinicians and data scientists have developed a machine-learning model to better predict which patients are at greater risk of developing a common type of liver cancer, hepatocellular carcinoma (HCC).
The findings of their research, published in the journal Gastro Hep Advances, describe how predictive learning can aid physicians in providing early HCC risk assessments for patients diagnosed with metabolic dysfunction-associated steatotic liver disease, or MASLD. The pilot technology may be able to give physicians critical information to screen patients more closely and thus offer more personalized care.
“MASLD can lead to HCC, but the disease is quite sneaky, and it’s often unclear which patients face that risk,” said study co-author Aniket Alurwar, clinical informatics specialist at the UC Davis Center for Precision Medicine and Data Sciences. “It doesn’t make sense to biopsy every patient with MASLD, but if we can segment for risk, we can track those people more closely and perhaps catch HCC early.”
Diagnosing a Stealthy Condition
MASLD (formerly called nonalcoholic fatty liver disease or NAFLD), a condition often linked to metabolic diseases such as type 2 diabetes, is the accumulation of fat in the liver. Around 25% of Americans have some form of MASLD, making it one of the most common liver issues.
The data science team worked closely with clinicians, including first author Souvik Sarkar, assistant professor in Gastroenterology and Hepatology, and Frederick Meyers, senior author and distinguished professor of Internal Medicine, Hematology and Oncology. Meyers is also director of the Center for Precision Medicine and Data Sciences.
The study is one of the first of its kind. Researchers trained machine-learning algorithms, which leveraged large datasets to make verifiable predictions.
They tested nine different open-source algorithms and shortlisted five for further evaluation and model building. They then trained the shortlisted algorithms on deidentified health data from 1,561 UC Davis Health MASLD patients, 227 of whom eventually developed HCC. Later, these five algorithms were validated against deidentified medical records from 686 UC San Francisco patients, 176 of whom were diagnosed with HCC. An algorithm called Gradient Boosted Trees ultimately produced the prediction model with the greatest statistical accuracy, sensitivity and specificity.
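For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how accuracy, sensitivity and specificity are computed from a model's yes/no predictions. The function name and the toy labels are hypothetical and invented for illustration; they are not taken from the study or its data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity and specificity for binary labels.

    Labels are 1 (patient developed HCC) or 0 (patient did not).
    Hypothetical helper, not the study's code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),   # share of all calls that were right
        "sensitivity": tp / (tp + fn),        # share of HCC cases the model caught
        "specificity": tn / (tn + fp),        # share of non-HCC cases correctly cleared
    }


# Toy example: 4 patients with HCC, 6 without (invented values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the model gets 8 of 10 calls right (accuracy 0.8), catches 3 of 4 HCC cases (sensitivity 0.75) and correctly clears 5 of 6 patients without HCC (specificity about 0.83). Reporting all three matters because a model can score high on accuracy while still missing many true cases.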
The study confirmed that one of the most reliable markers for HCC risk is advanced liver fibrosis or scarring, characterized by high Fibrosis-4 Index (FIB-4) scores. However, the researchers also found four additional risk factors associated with liver function: high cholesterol, hypertension, bilirubin and alkaline phosphatase (ALP), an enzyme that can indicate liver problems. A combination of those risk factors in one model helped predict HCC risk.
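As background, FIB-4 is a simple score calculated from a patient's age and three routine blood values. The sketch below uses the standard published definition, FIB-4 = (age × AST) / (platelet count × √ALT); the function name, parameter names and patient values are hypothetical, and this article does not describe how the study computed or thresholded the score.

```python
import math


def fib4_index(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """Compute the Fibrosis-4 Index from its standard definition.

    age_years            patient age in years
    ast_u_per_l          aspartate aminotransferase (AST), U/L
    alt_u_per_l          alanine aminotransferase (ALT), U/L
    platelets_1e9_per_l  platelet count, 10^9 per liter
    """
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient: age 60, AST 50 U/L, ALT 25 U/L, platelets 200 x 10^9/L.
example_score = fib4_index(60, 50, 25, 200)
print(example_score)  # 3.0
```

Because the score rises with age and AST and falls with platelet count, older patients with low platelets tend to score higher, which is consistent with its use as a marker of advanced fibrosis.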
AI Shows High Accuracy
The team found there are multiple pathways to HCC, with high FIB-4 being the most obvious. In some cases, patients with low FIB-4 but high cholesterol, bilirubin and hypertension also developed HCC. Under current guidelines, these patients would not receive precautionary care.
“We got 92.12% accuracy when predicting which MASLD patients would develop HCC, which is very good for a pilot model,” Alurwar said. “Patients with low FIB-4 are typically considered low risk and do not get referred for further assessment. By showing which of these ‘low risk’ patients could develop HCC, we can get them referred for liver biopsies or imaging.”
While the team is proud of its model, the researchers plan to improve its accuracy by incorporating more precise data, such as clinical notes. In doing so, they will tap another form of AI, called natural language processing, which translates written text into data. The team will also be testing Bedrock, Amazon’s generative AI platform. Eventually, a similar model could be incorporated into electronic health records, or a separate platform, to alert clinicians when MASLD patients face greater HCC risk.
“We believe we can improve the algorithm by incorporating the clinical notes and perhaps other information,” said Alurwar. “Embedding this data should create an even more powerful model that we can then test to see how it performs.”