Forget your RAID array, here comes another DNA storage breakthrough

This guest post is written by independent business transformation consultant Theo Priestley.

Remember Johnny Mnemonic? It was the William Gibson short story-turned sci-fi film where Keanu Reeves carried large packages of data in his head. Far-fetched?

Not anymore. Boffins in the UK published results this week showing that DNA can be used to archive vast amounts of data. One gram of DNA can hold the equivalent of two petabytes of data. In the report, published in Nature, they encoded five files onto DNA: a clip of Martin Luther King’s classic address from 1963, a JPEG photo, a PDF of the famous Crick and Watson paper describing the structure of DNA, a text file containing all of Shakespeare’s sonnets and a file about the new encoding system itself. That came to 738KB of data in total, which they then sequenced and reconstructed into the original files with complete accuracy.

According to the report, the theoretical limits of data storage on DNA go way beyond what is physically possible using today’s methods. A DNA-based storage system needs no maintenance and no electricity, and retrieving the data raises no backward-compatibility problems. As long as there is DNA-based life on Earth there will always be a means to read it, says Dr Ewan Birney of the European Bioinformatics Institute, who co-authored the report. To copy data to DNA, the team translates the binary into a bespoke code, which a synthesis machine then builds as DNA.
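To make that translation step concrete, here is a minimal Python sketch of turning bytes into DNA letters and back. It assumes the simplest possible mapping, two bits per letter, which is an illustration only and not the bespoke code the UK team describes; the names `BASES`, `bytes_to_dna` and `dna_to_bytes` are made up for this example.

```python
# Toy mapping: two bits per DNA letter (00=A, 01=C, 10=G, 11=T).
# This is an assumption for illustration, NOT the bespoke code from the report.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Translate each byte into four DNA letters, most significant bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: every four letters become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for letter in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(letter)
        out.append(byte)
    return bytes(out)


strand = bytes_to_dna(b"DNA")
print(strand)
print(dna_to_bytes(strand))
```

A real scheme has to cope with the quirks of synthesis and sequencing machines, which is presumably why the team designed its own code rather than using a direct mapping like this one.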

It’s not the first time this has been achieved, but the techniques differ from those used by a US team in Boston, who encoded an entire book and published their results in Science. DNA is written in four chemical bases, and the UK team’s storage system uses those same four bases, or “letters”, but in a completely different language to the one understood by the building blocks of life. It also differs in not treating the data as one long molecule: the UK team created multiple copies of overlapping DNA fragments, with each fragment also carrying some “indexing” information that identifies where in the overall sequence it should sit. According to the report, this builds redundancy into the storage system, meaning that if some DNA fragments become corrupted the information is not lost.
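The overlapping, indexed fragments are easier to picture in code. The Python sketch below is a rough illustration under stated assumptions: the fragment size and step are arbitrary small numbers chosen for readability, the index is stored as a plain integer rather than in DNA letters, and the function names `make_fragments` and `reassemble` are hypothetical. It only handles fragments that go missing, not ones that come back with wrong letters.

```python
def make_fragments(strand, size=8, step=2):
    """Cut a strand into overlapping fragments, each tagged with its start index.

    With size=8 and step=2, most letters appear in four different fragments.
    These values are arbitrary, chosen only to keep the example small.
    """
    return [(start, strand[start:start + size])
            for start in range(0, len(strand), step)]


def reassemble(fragments, length):
    """Rebuild the strand from whichever fragments survive.

    Returns None if some position is not covered by any surviving fragment.
    """
    letters = [None] * length
    for start, piece in fragments:
        for offset, letter in enumerate(piece):
            letters[start + offset] = letter
    if None in letters:
        return None
    return "".join(letters)


strand = "ACGTACGTTGCA"
fragments = make_fragments(strand)

# Pretend two fragments were lost; the overlap still covers every letter.
survivors = [f for f in fragments if f[0] not in (2, 6)]
print(reassemble(survivors, len(strand)) == strand)
```

Because each fragment says where it belongs, the fragments can be read in any order, and losing a few of them does no harm as long as every position is still covered by at least one survivor.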

This opens up massive possibilities for enterprise and government data archival using a molecular storage method, say the scientists.

The downsides?

At the moment, it’s phenomenally expensive, though the report argues that as the technology matures, the cost of entry will fall. But perhaps the biggest kicker is that because the DNA is synthesized, it can’t be placed inside a living host: it’ll be rejected and disposed of naturally by the body. So dreams of making a career from carrying storage for large corporations in your fingertip will have to wait for now.

Theo Priestley is a consultant, analyst, and advisor. He’s written analysis on the industry and tech space in general since 2007 and has collaborated with and advised the large and the small, from stealth startups to established industry players.