Towards Faster Matching Algorithm Using Ternary Tree in the Area of Genome Mapping

Abstract

In precision medicine there is a need to map long DNA sequences, which are represented as strings of characters or numbers. Most computer programs used for genome mapping rely on suffix-based data structures, but these are better suited to mapping short DNA sequences represented as strings over small alphabets. The most important parameters of a data structure used for DNA mapping are the time needed to fill it, the search time, and the system resources required, especially memory, because the amount of data produced by the scanning process can be very large. This article describes an implementation of a memory-optimized Ternary Search Tree (TST) for indexing the positions of labels obtained by a Bionano Genomics DNA imaging device. A BNX file parser with alphabet encoding functions is described, and performance results from experiments with the presented software on real data from the Bionano Genomics Saphyr device are included.
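
To make the central data structure concrete, the following is a minimal sketch of a generic ternary search tree whose keys are sequences of integer-encoded symbols and whose values are lists of positions. It is not the memory-optimized implementation described in this article: the class names, the `positions` field, and the choice of plain integers as the encoded alphabet are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


class Node:
    """One TST node: a symbol, three children, and optional stored positions."""
    __slots__ = ("symbol", "lo", "eq", "hi", "positions")

    def __init__(self, symbol: int) -> None:
        self.symbol = symbol
        self.lo: Optional["Node"] = None   # subtree with smaller symbols
        self.eq: Optional["Node"] = None   # subtree for the next key symbol
        self.hi: Optional["Node"] = None   # subtree with larger symbols
        self.positions: Optional[List[int]] = None  # set only where a key ends


class TernarySearchTree:
    """Maps sequences of encoded symbols to the positions where they occur."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: Sequence[int], position: int) -> None:
        if not key:
            raise ValueError("key must be non-empty")
        if self.root is None:
            self.root = Node(key[0])
        node, i = self.root, 0
        while True:
            s = key[i]
            if s < node.symbol:
                if node.lo is None:
                    node.lo = Node(s)
                node = node.lo
            elif s > node.symbol:
                if node.hi is None:
                    node.hi = Node(s)
                node = node.hi
            elif i == len(key) - 1:
                # Last symbol matched: record the position at this node.
                if node.positions is None:
                    node.positions = []
                node.positions.append(position)
                return
            else:
                # Symbol matched: advance to the next symbol via the middle child.
                i += 1
                if node.eq is None:
                    node.eq = Node(key[i])
                node = node.eq

    def search(self, key: Sequence[int]) -> List[int]:
        """Return all positions stored for an exact key, or an empty list."""
        node, i = self.root, 0
        while node is not None and key:
            s = key[i]
            if s < node.symbol:
                node = node.lo
            elif s > node.symbol:
                node = node.hi
            elif i == len(key) - 1:
                return list(node.positions or [])
            else:
                i += 1
                node = node.eq
        return []
```

In this sketch, inserting the same key at several positions accumulates all of them at the node where the key ends, so a single lookup returns every recorded occurrence. A prefix of a stored key that was never inserted itself yields an empty list.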