Newswise — Artificial intelligence has become part of everyday life. First it was ChatGPT. Now it is AI-generated pizza and beer advertisements. We already know we cannot rely on AI to be flawless, and it turns out we cannot always trust our own dealings with AI either.

Assistant Professor Peter Koo of Cold Spring Harbor Laboratory (CSHL) has found that scientists who use popular computational tools to interpret AI predictions about DNA are picking up too much "noise," extraneous information that obscures the analysis. He has also devised a fix. By adding a few lines of code, scientists can get more reliable explanations from deep neural networks, a powerful class of AI models. That lets researchers keep pursuing genuine DNA features, the signals that could point to advances in health and medicine, without those signals being drowned out by noise.

The noise comes from a source that resembles digital "dark matter." In physics and astronomy, dark matter is an unseen substance that nonetheless exerts gravitational influence. Koo and his team found a parallel in AI: the data used to train the models lacks critical information, which leaves significant blind spots. Those blind spots are then factored into interpretations of AI predictions about DNA function, much as unseen dark matter shapes what astronomers observe.

Koo explains, "The deep neural network integrates this arbitrary behavior as it learns a function universally. However, DNA exists only within a limited subset of that function space. This discrepancy introduces substantial noise, significantly impacting a diverse range of prevalent AI models."

Digital dark matter arises because scientists borrowed computational techniques from AI for computer vision. Image data consists of pixels whose values can vary continuously, whereas DNA data is restricted to combinations of four nucleotide letters: A, C, G, and T. In effect, the AI is being fed a kind of input that these techniques were not designed to handle.
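To make the contrast with pixel data concrete, here is a minimal Python sketch of one-hot encoding, a common way to present DNA to a neural network. The article does not describe the encoding Koo's team used, so the function name `one_hot_encode` and the example sequence are illustrative assumptions. The point it shows is that every position carries exactly one "1" among four slots, so real DNA occupies only a small part of the numeric space a network can accept.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # channel order is an assumption made for this sketch


def one_hot_encode(sequence: str) -> np.ndarray:
    """Return an (L, 4) array with a single 1 per row marking A, C, G, or T."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = np.zeros((len(sequence), len(NUCLEOTIDES)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        encoded[position, index[base]] = 1.0
    return encoded


x = one_hot_encode("ACGTTG")  # hypothetical example sequence
row_sums = x.sum(axis=1)      # every position sums to exactly 1
```

A network trained only on inputs like `x` never sees points where a row sums to anything other than 1, yet it still defines an output for them. Its behavior in those unseen directions is the kind of blind spot the article describes.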

By applying Koo's computational correction, scientists can interpret AI-driven DNA analyses more accurately.

Koo elaborates, stating, "As a result of our computational correction, we observe a significant improvement in the clarity and cleanliness of specific sites, while reducing the presence of erroneous noise in other regions. Nucleotides that were previously considered crucial suddenly vanish, indicating their insignificance."
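The article does not reproduce the correction itself. As a hedged sketch only, the Python below assumes the fix amounts to subtracting, at each DNA position, the average attribution score across the four nucleotides, which removes the part of the score that points away from valid DNA inputs. The function name `correct_attributions` and the random scores are hypothetical stand-ins, not the team's code or data.

```python
import numpy as np


def correct_attributions(scores: np.ndarray) -> np.ndarray:
    """Subtract the per-position mean across the nucleotide axis (last axis).

    `scores` has shape (L, 4): one attribution value per position and nucleotide.
    This is an assumed form of the correction, not code taken from the article.
    """
    return scores - scores.mean(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
raw_scores = rng.normal(size=(6, 4))  # hypothetical stand-in for model gradients
corrected = correct_attributions(raw_scores)
```

After this step the four scores at each position sum to zero, so any offset shared by all four nucleotides at a site, which cannot distinguish one letter from another, no longer inflates that site's apparent importance.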

Koo believes the noise problem extends beyond AI-powered DNA analyzers to other computational processes that handle similar types of data. Like dark matter, it may be widespread. Koo's new tool offers scientists a way out of the dark.

Journal Link: Genome Biology