"Evaluation of Variant Identification Methods for Whole Genome Sequencing Data in Dairy Cattle" by Christine F. Baes

Selected Works of James M Reecy

Article

Evaluation of Variant Identification Methods for Whole Genome Sequencing Data in Dairy Cattle

BMC Genomics

Christine F. Baes, Bern University of Applied Sciences
Marlies A. Dolezal, University of Veterinary Medicine Vienna
James Eugene Koltes, Iowa State University
Beat Bapst, Qualitas AG
Eric R. Fritz-Waters, Iowa State University
Sandra Jansen, Technische Universität München
Christine Flury, Bern University of Applied Sciences
Heidi Signer-Hasler, Bern University of Applied Sciences
Christine Stricker, agn Genetics GmbH
Rohan L Fernando, Iowa State University
Ruedi Fries, Technische Universität München
Juerg Moll, Qualitas AG
Dorian J. Garrick, Iowa State University
James M Reecy, Iowa State University
Birgit Gredler, Qualitas AG

Document Type

Article

Publication Version

Published Version

Publication Date

1-1-2014

DOI

10.1186/1471-2164-15-948

Abstract

Advances in human genomics have allowed unprecedented productivity in terms of algorithms, software, and literature available for translating raw next-generation sequence data into high-quality information. The challenges of variant identification in organisms with lower-quality reference genomes are less well documented. We explored the consequences of commonly recommended preparatory steps and the effects of single- and multi-sample variant identification methods using four publicly available software applications (Platypus, HaplotypeCaller, Samtools and UnifiedGenotyper) on whole genome sequence data of 65 key ancestors of Swiss dairy cattle populations. Accuracy of calling next-generation sequence variants was assessed by comparison to the same loci from medium- and high-density single nucleotide variant (SNV) arrays. The total number of SNVs identified varied by software and method, with single (multi) sample results ranging from 17.7 to 22.0 (16.9 to 22.0) million variants. Computing time varied considerably among software applications. Preparatory realignment around insertions and deletions and subsequent base quality score recalibration had only minor effects on the number and quality of SNVs identified by the different software, but increased computing time considerably. Average concordance for single (multi) sample results with high-density chip data was 58.3% (87.0%), and average genotype concordance in correctly identified SNVs was 99.2% (99.2%) across software applications. The average quality of SNVs identified, measured as the ratio of transitions to transversions, was higher using single-sample methods than multi-sample methods. A consensus approach using results of different software generally provided the highest variant quality in terms of transition/transversion ratio.
Our findings serve as a reference for variant identification pipeline development in non-human organisms and help assess the implications of preparatory steps in next-generation sequencing pipelines for organisms with incomplete reference genomes (pipeline code is included). These benchmarks should prove particularly useful in processing next-generation sequencing data for use in genome-wide association studies and genomic selection.
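The abstract judges call sets by three measures: the transition/transversion (Ts/Tv) ratio, agreement with SNV array genotypes, and a consensus across software. The Python sketch below shows one plausible way to compute each. It is not the authors' pipeline code: the data layouts, the function names (`ts_tv_ratio`, `consensus`, `chip_concordance`), the k-of-n voting rule and the exact concordance definitions are all illustrative assumptions, and the toy data are made up.

```python
"""Hypothetical helpers for the quality metrics named in the abstract.

Assumed (illustrative) data layouts, not the authors' pipeline:
  * an SNV site is a (chrom, pos, ref, alt) tuple with single-base alleles
  * a genotype is keyed by (animal, chrom, pos) and coded 0, 1 or 2 alt alleles
"""
from collections import Counter

BASES = frozenset("ACGT")
# Transitions: A<->G (purines) and C<->T (pyrimidines); all others are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv_ratio(sites):
    """Transition/transversion ratio over (chrom, pos, ref, alt) SNV sites."""
    ts = tv = 0
    for _chrom, _pos, ref, alt in sites:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue  # skip non-variant or ambiguous records
        if frozenset((ref, alt)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def consensus(callsets, min_callers):
    """Sites reported by at least `min_callers` of the per-software call sets (assumed rule)."""
    counts = Counter(site for calls in callsets for site in set(calls))
    return {site for site, n in counts.items() if n >= min_callers}


def chip_concordance(seq_genotypes, chip_genotypes):
    """Return (site concordance, genotype concordance) against array genotypes.

    Site concordance (assumed definition): share of chip genotype keys that also
    have a sequence-based call. Genotype concordance: share of those shared keys
    whose genotype codes agree.
    """
    shared = chip_genotypes.keys() & seq_genotypes.keys()
    site = len(shared) / len(chip_genotypes) if chip_genotypes else 0.0
    agree = sum(seq_genotypes[k] == chip_genotypes[k] for k in shared)
    genotype = agree / len(shared) if shared else 0.0
    return site, genotype


# Toy usage: three hypothetical call sets for the same samples.
calls_a = {("1", 100, "A", "G"), ("1", 200, "C", "A"), ("2", 50, "T", "C"), ("2", 75, "G", "T")}
calls_b = {("1", 100, "A", "G"), ("2", 50, "T", "C")}
calls_c = {("1", 100, "A", "G"), ("1", 200, "C", "A")}
agreed = consensus([calls_a, calls_b, calls_c], min_callers=2)
print(len(agreed))           # → 3
print(ts_tv_ratio(calls_a))  # → 1.0
print(ts_tv_ratio(agreed))   # → 2.0

# Hypothetical array and sequence genotypes for two animals at two loci.
chip = {("cow1", "1", 100): 1, ("cow1", "1", 300): 0, ("cow2", "1", 100): 2, ("cow2", "1", 300): 1}
seq = {("cow1", "1", 100): 1, ("cow2", "1", 100): 1, ("cow2", "1", 300): 1}
print(chip_concordance(seq, chip))  # → (0.75, 0.6666666666666666)
```

In this toy example the single-caller-only site is a transversion, so requiring agreement from two of three callers raises the Ts/Tv ratio; with real data the size of that effect depends on the voting threshold chosen, which the abstract does not specify.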

Comments

This article is from BMC Genomics 15 (2014): 948, doi:10.1186/1471-2164-15-948. Posted with permission.

Rights

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

File Format

application/pdf

Citation Information

Christine F. Baes, Marlies A. Dolezal, James Eugene Koltes, Beat Bapst, et al. "Evaluation of Variant Identification Methods for Whole Genome Sequencing Data in Dairy Cattle" BMC Genomics Vol. 15 (2014) pp. 1-18
Available at: http://works.bepress.com/james_reecy/75/