Newswise — Proteins are essential components of every cell, and researchers have long studied how they evolve to acquire new functions in the body. More recently, it has been discovered that proteins can also emerge seemingly out of nowhere, from previously non-coding regions of the genome where new DNA structures arise at random. This process, known as de novo protein evolution, has received far less attention than traditional evolutionary processes. A team of Czech and German researchers, led by biochemist Dr. Klára Hlouchová from Charles University in Prague and bioinformatics specialist Prof. Dr. Erich Bornberg-Bauer from the University of Münster, has now experimentally compared de novo proteins with computer-generated random proteins in terms of their stability and solubility. The results reveal small but significant differences between the two types of proteins. The findings have been published in the current issue of the journal Nature Ecology & Evolution.

The team compared two classes of proteins: 1,800 candidate de novo proteins from fruit flies and humans, which are encoded as DNA in non-genic regions of the genome, and randomly computer-generated proteins. Computer predictions of protein structure using various algorithms gave similar results for both classes, but laboratory experiments revealed subtle differences that had not been predicted. Notably, in the lab experiments the de novo proteins were, on average, slightly more soluble, based on their secondary structure. "Surprisingly, despite their recent origins, de novo proteins showed better integration into the cell than expected for randomly emerging proteins," said lead author Brennen Heames from Münster. "These findings suggest that natural selection may already be acting during the early stages of de novo protein evolution."

Margaux Aubel, a co-author from the Münster group, highlights the significance of the results for basic research on de novo evolution. The dataset includes many human de novo proteins, which were investigated for their solubility and aggregation propensity. Protein aggregation is known to play a role in various diseases, and previous studies have suggested that de novo proteins may also be associated with disease. According to Aubel, the findings could contribute to a better understanding of the role these relatively under-researched proteins play in the development of diseases.

In the past, de novo evolution has been studied predominantly from a theoretical perspective using large datasets, while experimental studies have focused on individual de novo proteins. Theoretical studies have compared de novo proteins with randomly generated sequences, but such comparisons had not been verified experimentally. What sets de novo proteins apart is their relatively young age: they arise from non-coding DNA regions that are under little or no evolutionary pressure. This makes them more comparable to randomly generated proteins than to well-established proteins that have existed for much longer. By comparing de novo proteins and randomly generated proteins experimentally, the study tests this comparison under controlled laboratory conditions.

The research team first used computer programs to predict the properties of the proteins under investigation. They then produced the proteins in the laboratory and examined them using mass spectrometry. To infer stability, the team added a protein-degrading enzyme and measured how quickly the proteins were degraded. To assess solubility, they used a molecular transport mechanism of the bacterium Escherichia coli as an indicator. The proteins found to be soluble were then characterized in more detail using next-generation DNA sequencing.

Journal Link: Nature Ecology & Evolution