Contemporary DNA sequencing technologies are continuously increasing throughput at ever-decreasing costs. Moreover, due to recent advances in sequencing technology, new platforms are emerging. As such, computational challenges persist. The average read length possible has taken a giant leap forward with the PacBio and Nanopore solutions. Regardless of the platform used, impurities within the DNA preparation of the sample, be they from unintentional contaminants or pervasive symbionts, remain an issue. We have developed a new tool, HAsh-MaP-ERadicator (HAMPER), for the detection and removal of non-target, contaminating DNA sequences. Integrating hash-based and mapping-based strategies, HAMPER is both memory- and time-efficient while maintaining a high level of sensitivity. Moreover, HAMPER was designed for flexibility: reads of any size can be efficiently examined, and the user can set parameters specific to the analysis of reads produced by a particular sequencer. To evaluate our method, mock sequencing runs were generated that included various contaminating species and variable rates of mutation; the results revealed a high level of sensitivity and specificity. Reads that are not of interest can quickly be removed using HAMPER, thus improving downstream analyses.
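To make the hash-based half of such a strategy concrete, the following is a minimal sketch of k-mer screening, not HAMPER's actual implementation, which the abstract does not describe. It assumes contaminant references are available as plain strings, and every name in it (`build_index`, `hit_fraction`, `screen_reads`, the k-mer length `k`, the threshold `min_fraction`) is hypothetical. The mapping-based stage and reverse-complement handling are omitted.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(contaminants: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers of the contaminant sequences into a hash set."""
    index: Set[str] = set()
    for genome in contaminants:
        index.update(kmers(genome, k))
    return index


def hit_fraction(read: str, index: Set[str], k: int) -> float:
    """Return the fraction of the read's k-mers found in the index."""
    total = len(read) - k + 1
    if total <= 0:
        # Reads shorter than k carry no k-mers and cannot be scored.
        return 0.0
    hits = sum(1 for kmer in kmers(read, k) if kmer in index)
    return hits / total


def screen_reads(
    reads: Dict[str, str], index: Set[str], k: int, min_fraction: float
) -> Tuple[List[str], List[str]]:
    """Split read names into (kept, flagged) by their k-mer hit fraction.

    min_fraction is an illustrative, user-set threshold (an assumption):
    a read is flagged when at least this share of its k-mers is indexed.
    """
    kept: List[str] = []
    flagged: List[str] = []
    for name, seq in reads.items():
        if hit_fraction(seq, index, k) >= min_fraction:
            flagged.append(name)
        else:
            kept.append(name)
    return kept, flagged
```

In a sketch like this, lowering `min_fraction` would tolerate more mismatches between a read and the contaminant reference at the cost of specificity, which is one way a user-set parameter could be tuned to the error profile of a particular sequencer.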
© IEEE Conference Publications, 2015.
Available at: http://works.bepress.com/catherine-putonti/24/
Author Posting © IEEE Conference Publications, 2015. This is the author's version of the work. It is posted here by permission of IEEE Conference Publications for personal use, not for redistribution. The definitive version was published in Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015. http://dx.doi.org/10.1109/BIBM.2015.7359835