Newswise — Storing data in DNA may seem like science fiction, but it's on the horizon. Professor Tom de Greef foresees the launch of the first DNA data center within the next five to ten years. Instead of using a hard drive to store data as zeros and ones, information will be stored in the base pairs that form DNA: AT and CG. The data center will take the shape of a lab, significantly smaller than those in use today. De Greef can envision it already. In one section, new files will be encoded via DNA synthesis. Another section will hold vast arrays of capsules, each loaded with a file. A robotic arm will extract a capsule, scan its contents, and replace it.

Synthetic DNA is the subject at hand. In the laboratory, bases are linked together in a specific sequence to create artificially generated DNA strands. This synthetic DNA can be used to store files and images that are currently held in data centers. However, at present, this method is only viable for archival purposes. This is because accessing the stored information is very costly, so it is preferable to consult the DNA files as infrequently as possible.
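To make the idea of storing zeros and ones in bases concrete, here is a minimal sketch in Python. It assumes a simple mapping of two bits per base, which is an illustration only and not the encoding used by De Greef's team. Practical schemes typically add error correction and avoid long runs of the same base. The names `BITS_TO_BASE`, `encode`, and `decode` are hypothetical.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only.
# Real DNA storage codecs add error correction and sequence constraints.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn a base sequence produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Under this assumed mapping every byte becomes four bases, so the two-byte message "Hi" becomes an eight-base strand.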

Large, energy-guzzling data centers made obsolete

DNA data storage has several advantages. A DNA file can be stored far more compactly, so space is used much more efficiently, and the data lasts much longer. The most significant advantage, however, is that the technology eliminates the need for large, energy-consuming data centers. De Greef emphasizes the urgency of this need, stating that "in three years, we will generate so much data worldwide that we will be unable to store half of it."

In collaboration with Microsoft and a consortium of university partners, Professor De Greef, along with PhD student Bas Bögels, has developed a new technique that makes synthetic DNA data storage scalable. The results of their research were published today in the journal Nature Nanotechnology. De Greef is affiliated with the Department of Biomedical Engineering and the Institute for Complex Molecular Systems (ICMS) at TU Eindhoven, and he also serves as a visiting professor at Radboud University.

Scalable

The concept of using DNA strands to store data was first introduced in the 1980s, but it proved exceedingly difficult and expensive at the time. It was not until three decades later, when DNA synthesis began to gain traction, that it became a viable option. Geneticist George Church from Harvard Medical School expanded upon the idea in 2011. Since then, the cost of both synthesis and data retrieval has decreased exponentially, resulting in the technology finally being made available on the market.

In recent times, De Greef and his team have been primarily focused on the challenge of retrieving stored data, which remains the biggest obstacle to the success of this new technology. Currently, the method used to access the data, known as 'random access' PCR, is prone to significant errors. Consequently, only one file can be read at a time, and the quality of the data deteriorates substantially with each read. This makes the process far from scalable.

The process of Polymerase Chain Reaction (PCR) involves the creation of numerous copies of a specific piece of DNA by introducing a primer with the desired DNA code. This technique is used in laboratory COVID-19 tests, where even the tiniest amount of coronavirus material from a nasal swab can be detected by copying it multiple times. However, if multiple files need to be read simultaneously, multiple primer pairs are required to function concurrently. This often results in numerous errors during the copying process.
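The copying power of PCR can be illustrated with a short calculation. The sketch below assumes the textbook idealization in which every cycle at most doubles the number of copies. The function name `copies_after` and its `efficiency` parameter are hypothetical and do not come from the study.

```python
def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; real reactions fall short of
    this and eventually plateau, which this sketch does not model.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Growth of a single starting molecule under perfect doubling.
    for n in (10, 20, 30):
        print(n, copies_after(n))
```

Under perfect doubling, a single molecule grows to more than a billion copies after 30 cycles, which is why even a tiny amount of starting material can be detected.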

Every capsule contains one file

The microcapsules developed by De Greef's team play a crucial role in addressing this challenge. Each capsule is made of proteins and a polymer and has a single file anchored inside it. The capsules have unique thermal properties that can be leveraged for efficient PCR. At temperatures exceeding 50 degrees Celsius, the capsules automatically seal themselves, allowing an independent PCR process to occur within each capsule and significantly reducing the risk of errors. De Greef refers to this method as 'thermo-confined PCR'. In laboratory tests, it has demonstrated the ability to read 25 files concurrently without any significant errors.

Subsequently, when the temperature is reduced, the copies detach from the capsule, leaving the original file anchored within. This largely preserves the quality of the original file. According to De Greef, the new method currently has a loss of only 0.3 percent after three reads, compared to 35 percent using the existing technique.
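To put the two loss figures side by side, the sketch below converts each three-read loss into an equivalent per-read loss. It assumes that the same fraction is lost on every read, which the article does not state, so the outputs are illustrative arithmetic rather than measured results. The function names are hypothetical.

```python
def per_read_loss(loss_after_three_reads: float) -> float:
    """Per-read loss, ASSUMING an equal fraction is lost on each of three reads."""
    retained = 1.0 - loss_after_three_reads
    return 1.0 - retained ** (1.0 / 3.0)


def remaining_after(reads: int, loss_after_three_reads: float) -> float:
    """Fraction of the file remaining after `reads` reads, same assumption."""
    retained = 1.0 - loss_after_three_reads
    return retained ** (reads / 3.0)


if __name__ == "__main__":
    # Loss figures quoted in the article: 0.3 percent and 35 percent.
    for label, loss in (("thermo-confined PCR", 0.003), ("existing technique", 0.35)):
        print(f"{label}: {per_read_loss(loss):.2%} lost per read")
```

Under this assumption, the new method loses roughly a tenth of a percent per read, while the existing technique loses more than a tenth of the file each time.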

Searchable with fluorescence

Furthermore, De Greef has simplified the process of searching for data within the library. Each file is given a fluorescent label and each capsule its own color, so a device can recognize the colors and distinguish the capsules from one another. This means that the desired file can be easily selected from the pool of capsules, just as the robotic arm was envisioned doing at the beginning of this article.

With the problem of reading the data solved, the next challenge for De Greef is to wait for the costs of DNA synthesis to decrease further so that the technique becomes more affordable. He is optimistic that once this happens, the technique will be ready for widespread application. De Greef envisions the Netherlands opening the world's first DNA data center, which would be a significant milestone in the field of data storage.

Journal Link: Nature Nanotechnology