Newswise - To date, it has not been possible to explain the causes of around half of all rare hereditary diseases. A Munich research team has developed an algorithm that predicts the effects of genetic mutations on RNA formation six times more precisely than previous models. As a result, the genetic causes of rare hereditary diseases and cancer can be identified more precisely.

Variations in genetic sequence occur relatively frequently: on average, one in every thousand nucleotides in a person's genome is affected. Occasionally, these changes result in defective RNAs and, in turn, non-functional proteins, which can lead to dysfunction in individual organs. If a rare disease is suspected, computer-assisted diagnostic programs can help in the search for possible genetic causes. In particular, algorithms can analyze the genome to determine whether there is a relationship between rare genetic variants and dysfunction in specific parts of the body.

Interdisciplinary research project

Under the leadership of Julien Gagneur, Professor of Computational Molecular Medicine at the Technical University of Munich (TUM) and head of the Computational Molecular Medicine research group at Helmholtz Munich, an interdisciplinary team from the Informatics and Medicine departments developed a new model. It predicts more accurately than its predecessors which DNA variants lead to faulty RNA.

"Approximately fifty percent of our patients can receive a dependable diagnosis through established DNA analysis techniques," says Dr. Holger Prokisch, a co-author of the study and a group leader at the Institute of Human Genetics at TUM and Helmholtz Munich. "However, for the remaining cases, we require models that enhance our predictive capabilities. Our recently devised algorithm has the potential to make a significant contribution in this regard."

Model focuses on splicing

In their study, the researchers examined genetic variants that affect the conversion of DNA to RNA and, subsequently, the formation of tissue-specific proteins. They focused on splicing, the cellular process that cuts the RNA so that the building instructions for the protein can later be read. A variant in the DNA can disrupt this process, so that either too much or too little is cut out of the RNA. Splicing errors are considered one of the most common causes of faulty protein formation and hereditary disease.

Significantly greater precision than previous studies

To explore possible associations between genetic variants and splicing dysfunction in specific tissues, the team used existing datasets. These contain DNA and RNA samples from 946 individuals, covering 49 tissue types.

Unlike previous studies, the team first examined each sample to determine whether, and to what extent, DNA variants led to incorrect splicing in a specific tissue. The tissue context matters because a protein may be relevant only in certain regions of the heart, for example, while serving no function in the brain.

"In order to achieve this objective, we constructed a splicing map that was specific to each tissue, quantifying the crucial regions on the RNA that are involved in splicing within that particular tissue," explains Nils Wagner, the study's lead author and a doctoral student at the Chair of Computational Molecular Medicine at TUM. "By adopting this approach, we were able to focus our model on biologically relevant contexts. Our utilization of skin and blood samples allowed us to draw inferences regarding hard-to-access tissues such as the brain or the heart."

In the analysis, the researchers considered every gene that carried at least one rare genetic variant and was relevant to protein formation. Besides the protein-coding regions, the RNA contains other sections that are important for other cellular processes; these non-protein-coding regions were not part of the study. In total, approximately 9 million rare genetic variants were examined.

"Through the utilization of our newly devised model, we have substantially enhanced the accuracy of predicting erroneous splicing, surpassing previous models by a factor of six," states Prof. Julien Gagneur. "Where previous algorithms achieved a precision of 10 percent at a recall of 20 percent, our model achieves an impressive precision of 60 percent at the same recall level."

Precision and recall are key metrics for assessing a model's performance. Precision is the proportion of the genetic variants flagged by the model that actually lead to incorrect splicing; it reflects how rarely the model raises false alarms (false positives). Recall is the proportion of the genetic variants that actually lead to incorrect splicing that the model identifies; it reflects how rarely the model misses true cases (false negatives).
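
To make the two metrics concrete, the following minimal Python sketch computes them from simple counts and applies them to the figures quoted above (precision of 10 percent versus 60 percent at a recall of 20 percent). The cohort size and all counts are hypothetical numbers chosen only so that the arithmetic matches the quoted percentages; they are not data from the study, and the function names are illustrative.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of variants flagged by a model that truly disrupt splicing."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of truly splice-disrupting variants that a model flags."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Hypothetical cohort (illustrative only): 150 variants truly disrupt splicing.
actual_disruptive = 150
tp = 30                      # both models find 30 of them
fn = actual_disruptive - tp  # and miss the remaining 120

# Hypothetical false-alarm counts chosen to match the quoted precision values.
fp_previous = 270  # earlier models: 30 true hits among 300 flagged variants
fp_new = 20        # new model: 30 true hits among 50 flagged variants

print("recall:", recall(tp, fn))
print("precision, earlier models:", precision(tp, fp_previous))
print("precision, new model:", precision(tp, fp_new))
```

In this made-up cohort, both models recover the same 30 true variants, but a geneticist would have to review 300 flagged candidates with the earlier models and only 50 with the new one.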

Practical use of the algorithm

The model is being used in the European research project "Solve-RD - solving the unsolved rare diseases," which aims to improve diagnostics for rare diseases through extensive knowledge sharing. The TUM researchers have already analyzed 20,000 DNA sequences from 6,000 affected families.

In the future, the model is also expected to make it easier to find genetic diagnoses for different types of leukemia. To this end, researchers are currently examining 4,200 DNA and RNA samples from people diagnosed with leukemia.

Further information

Prof. Julien Gagneur joined TUM in 2016 as an assistant professor and became Chair of Computational Molecular Medicine in 2020. His research uses statistical algorithms and machine learning to investigate the genetic basis of gene regulation and its implications for disease. He is also a research group leader at Helmholtz Munich.

Together with Holger Prokisch, a group leader at the Institute of Human Genetics at TUM and Helmholtz Munich, Prof. Gagneur develops strategies for uncovering the causes of genetic disorders, combining expertise in computational molecular medicine and human genetics.

 

CITATIONS

Nature Genetics