Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations
bioRxiv
  • Alicia R. Martin, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston
  • Elizabeth G. Atkinson, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston
  • Sinéad B. Chapman, Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge
  • Anne Stevenson, Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge
  • Rocky E. Stroud, Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge
  • Tamrat Abebe, Department of Microbiology, Immunology, and Parasitology, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
  • Dickens Akena, Department of Psychiatry, School of Medicine, College of Health Sciences, Makerere University, Kampala, Uganda
  • Melkam Alemayehu, Department of Psychiatry, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
  • Fred K. Ashaba, Department of Immunology & Molecular Biology, College of Health Sciences, Makerere University, Kampala, Uganda
  • Lukoye Atwoli, Aga Khan University
Publication Date
1-1-2020
Document Type
Article
Abstract

Background: Genetic studies of biomedical phenotypes in underrepresented populations identify disproportionate numbers of novel associations. However, current genomics infrastructure--including most genotyping arrays and sequenced reference panels--best serves populations of European descent. A critical step for facilitating genetic studies in underrepresented populations is to ensure that genetic technologies accurately capture variation in all populations. Here, we quantify the accuracy of low-coverage sequencing in diverse African populations.

Results: We sequenced to high coverage (>20X) the whole genomes of 91 individuals from the Neuropsychiatric Genetics of African Populations-Psychosis (NeuroGAP-Psychosis) study, in which participants were recruited from Ethiopia, Kenya, South Africa, and Uganda. We empirically tested two data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole genome sequencing data. We show that low-coverage sequencing at a depth of ≥4X captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1X) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation, with 4X sequencing detecting 45% of singletons and 95% of common variants identified in high-coverage African whole genomes.
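
To illustrate what comparing imputed genotypes against deep sequencing can look like in practice, the following is a minimal sketch of one common metric, non-reference concordance. The abstract does not say which concordance measure the authors used, so the choice of metric, the function name, and the genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a missing call) are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def non_reference_concordance(
    truth: Sequence[Optional[int]],
    test: Sequence[Optional[int]],
) -> float:
    """Fraction of matching genotypes among sites where at least one
    call set carries an alternate allele.

    Hypothetical encoding: 0, 1, or 2 alternate-allele copies; None = missing.
    `truth` stands in for high-coverage calls, `test` for imputed calls.
    """
    if len(truth) != len(test):
        raise ValueError("call sets must cover the same sites")

    compared = 0
    matching = 0
    for truth_gt, test_gt in zip(truth, test):
        # Skip sites that are missing in either call set.
        if truth_gt is None or test_gt is None:
            continue
        # Skip sites where both calls are homozygous reference; these
        # dominate the genome and would inflate the agreement rate.
        if truth_gt == 0 and test_gt == 0:
            continue
        compared += 1
        if truth_gt == test_gt:
            matching += 1

    if compared == 0:
        raise ValueError("no non-reference genotypes to compare")
    return matching / compared


# Toy data, not taken from the study.
deep_calls = [0, 1, 2, 1, 0, None]
imputed_calls = [0, 1, 1, 1, 1, 2]
print(non_reference_concordance(deep_calls, imputed_calls))
```

Excluding sites where both call sets are homozygous reference is a design choice that keeps the metric informative for rare variants, which is where arrays and low-coverage sequencing would be expected to differ most.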

Conclusion: These results indicate that low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, including those that capture variation most common in Europeans and Africans. Low-coverage sequencing effectively identifies novel variation (particularly in underrepresented populations), and presents opportunities to enhance variant discovery at a similar cost to traditional approaches.

Comments

This work was published before the author joined Aga Khan University.

Citation Information
Alicia R. Martin, Elizabeth G. Atkinson, Sinéad B. Chapman, Anne Stevenson, et al. "Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations" bioRxiv (2020) p. 1 - 16
Available at: http://works.bepress.com/lukoye_atwoli/22/