A Review of DNA Data Storage Technologies Based on Biomolecules

Lichao Zhang; Yuanyuan Lv; Lei Xu; Murong Zhou

Abstract

In the information age, data storage technology has become key to improving computer systems. Because traditional storage technologies cannot meet the demand for massive storage, DNA storage, a new technology based on biomolecules, has attracted much attention. DNA storage uses artificially synthesized deoxynucleotide chains to store and read information of all kinds, such as documents, pictures, and audio. First, the data are encoded into binary strings. Then, the binary strings are mapped onto the four bases, A (adenine), T (thymine), C (cytosine), and G (guanine), yielding the base sequence of the target DNA molecules in the form of deoxynucleotide chains. Finally, the corresponding DNA molecules are artificially synthesized, so that the data are stored within them. Compared with traditional storage systems, DNA storage offers major advantages, including high storage density, long storage duration, low hardware cost, high access parallelism, and strong scalability, which together satisfy the demands of big data storage. This manuscript first reviews the origin and development of DNA storage technology, then introduces its storage principles, contents, and methods, and finally analyzes the development of the technology. From the initial research to the cutting edge of the field and beyond, the advantages, disadvantages, and practical applications of DNA storage technology require continuous exploration.
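
The encoding steps described in the abstract (data to binary string, binary string to bases) can be illustrated with a minimal sketch. The two-bits-per-base table used here (00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are assumptions chosen for illustration; the abstract does not specify a particular mapping, and practical schemes typically add constraints and error correction that this sketch omits.

```python
# Hypothetical 2-bits-per-base table; the source does not prescribe one.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a binary string, then map each bit pair to a base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Map each base back to its bit pair, then regroup the bits into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # base sequence that would be sent for synthesis
    print(decode(strand))  # original bytes recovered after sequencing
```

Under this assumed mapping each byte becomes four bases, so the strand length is four times the number of input bytes.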

Keywords: DNA storage, deoxynucleotide chain, base, binary number, DVDs, CDs.




DOI: https://dx.doi.org/10.2174/1574893616666210813101237
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
