Title: How to Enhance DNA Storage Capacity with a New Encoding Scheme
Abstract: Deoxyribonucleic acid (DNA), with its ultra-high storage density and long durability, is a promising medium for long-term archival storage and is attracting considerable attention today. A DNA storage system encodes digital data into synthetic DNA sequences for storage and decodes those sequences back into digital data via sequencing. Several studies have verified the feasibility of using DNA for archival storage with limited amounts of data. Since then, many encoding schemes have been proposed to enlarge DNA storage capacity by increasing encoding density under certain bio-constraints. However, increasing encoding density alone is insufficient, because enhancing DNA storage capacity is a multifaceted problem. We assume that random access is necessary for practical DNA archival storage. We first identify the major factors that affect DNA tube storage capacity under current technologies. We then investigate the practical DNA tube capacity achieved by several popular encoding schemes. We find that collisions between primers and DNA payload sequences severely limit DNA tube capacity. Based on this finding, we design a new encoding scheme called Collision Reduction Code (CRC), which trades some encoding density for fewer primer-payload collisions. Compared with the best of five existing encoding schemes, CRC frees 132% more primers from collisions (i.e., it yields 132% more usable primers) and increases DNA tube capacity from 215.42 GB to 321.88 GB. Finally, we will discuss the challenges of studying DNA storage.
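To make the notion of a "usable primer" concrete, here is a minimal sketch of counting primers that do not collide with any payload. It assumes a simple collision definition that the abstract does not specify: a primer collides with a payload if some window of the payload, or of its reverse complement, is within a hypothetical mismatch threshold `max_mismatch` (Hamming distance) of the primer. The function names and the threshold are illustrative only. This is not the CRC encoding itself, just the kind of check whose outcome CRC is designed to improve.

```python
# Complement table for the four DNA bases.
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collides(primer: str, payload: str, max_mismatch: int) -> bool:
    """Assumed collision rule: some primer-length window of the payload
    (either strand) is within max_mismatch mismatches of the primer."""
    k = len(primer)
    for strand in (payload, revcomp(payload)):
        for i in range(len(strand) - k + 1):
            if hamming(primer, strand[i:i + k]) <= max_mismatch:
                return True
    return False


def usable_primers(primers, payloads, max_mismatch=2):
    """Primers that collide with no payload sequence (hypothetical threshold)."""
    return [p for p in primers
            if not any(collides(p, s, max_mismatch) for s in payloads)]


if __name__ == "__main__":
    primers = ["ACGTACGT", "TTTTGGGG"]
    payloads = ["GGACGTACGAAA"]
    # "ACGTACGT" is one mismatch away from the window "ACGTACGA",
    # so only "TTTTGGGG" survives.
    print(usable_primers(primers, payloads))
```

In a real system the collision test would typically also weigh thermodynamic binding and other bio-constraints. The Hamming-distance rule here is only a stand-in that shows why dense, unconstrained payloads leave fewer primers available for random access.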
Bio: David H.C. Du received his B.S. degree from National Tsing-Hua University in 1974 and his M.S. and Ph.D. degrees in computer science from the University of Washington, Seattle, in 1980 and 1981, respectively. He is currently the Qwest Chair Professor in the Department of Computer Science and Engineering at the University of Minnesota, Minneapolis, and was the Director of the NSF I/UCRC Center for Research in Intelligent Storage from 2009 to 2021. He is an IEEE Fellow and a Fellow of the Minnesota Supercomputing Institute. He has done research in cyber security, sensor networks, multimedia computing, storage systems, high-speed networking, high-performance computing, database design, and CAD for VLSI circuits. His current research focuses on storage technologies/systems and vehicular networks. He has authored or co-authored more than 350 technical papers, including 150 refereed journal publications. He has also supervised 67 Ph.D. and more than 100 M.S. students to graduation.