Optimization of SAMtools sorting using OpenMP tasks
Cluster Computing
  • Nathan T. Weeks, Iowa State University
  • Glenn R. Luecke, Iowa State University
Document Type
Article
Publication Version
Accepted Manuscript
Publication Date
4-26-2017
DOI
10.1007/s10586-017-0874-8
Abstract

SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally and I/O intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X versus the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file.

Comments

This is a manuscript of an article published as Weeks, Nathan T., and Glenn R. Luecke. "Optimization of SAMtools sorting using OpenMP tasks." Cluster Computing (2017): 1-12. The final publication is available at Springer via http://dx.doi.org/10.1007/s10586-017-0874-8.

Copyright Owner
Springer Verlag
Language
en
File Format
application/pdf
Citation Information
Nathan T. Weeks and Glenn R. Luecke. "Optimization of SAMtools sorting using OpenMP tasks." Cluster Computing (2017), pp. 1-12.
Available at: http://works.bepress.com/nathan-weeks/3/