fagin: synteny-based phylostratigraphy and finer classification of young genes
BMC Bioinformatics
  • Zebulun Arendsee, Iowa State University
  • Jing Li, Iowa State University
  • Urminder Singh, Iowa State University
  • Priyanka Bhandary, Iowa State University
  • Arun Seetharam, Iowa State University
  • Eve Syrkin Wurtele, Iowa State University
Document Type
Article
Publication Version
Published Version
Publication Date
1-1-2019
DOI
10.1186/s12859-019-3023-y
Abstract

Background: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species.
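
To illustrate why standard phylostratigraphy rests on negative evidence, the following is a minimal Python sketch, not code from fagin. It assumes a hypothetical four-level lineage (`LINEAGE`) and a hypothetical input `hit_levels`, the set of lineage levels at which a protein similarity search found a homolog. A gene is placed in the oldest level with a detected homolog, so a gene with no hits defaults to the youngest stratum simply because nothing was found.

```python
from typing import List, Set

# Hypothetical lineage for a focal species, ordered youngest to oldest.
# These level names are illustrative and are not taken from the article.
LINEAGE: List[str] = ["species", "genus", "family", "kingdom"]


def assign_phylostratum(hit_levels: Set[str], lineage: List[str] = LINEAGE) -> str:
    """Return the oldest lineage level at which a homolog was detected.

    A gene with no detected homolog beyond the focal species stays in the
    youngest stratum, i.e. it is called an orphan on negative evidence alone.
    """
    stratum = lineage[0]  # default: youngest stratum (orphan)
    for level in lineage:  # walk from youngest to oldest
        if level in hit_levels:
            stratum = level  # keep the oldest level that has a hit
    return stratum


if __name__ == "__main__":
    print(assign_phylostratum(set()))                # no hits anywhere
    print(assign_phylostratum({"genus", "kingdom"}))  # oldest hit wins
```

Under this scheme a missed homolog (for example, one that has diverged beyond detection) makes a gene appear younger than it is, which is the limitation the syntenic approach is meant to address.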

Results: We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequences in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred that over half of the “Unknown” A. thaliana query genes, and about 20% of those for S. cerevisiae, lack a syntenic homolog because of local indels or scrambled synteny.
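
The three main classes can be sketched as a simple decision rule. The Python below is an illustrative sketch only: fagin itself is an R pipeline, and the `SyntenicEvidence` record, its field names, and the `classify` and `summarize` helpers are hypothetical names introduced here, not part of fagin's interface. The sketch assumes that the two kinds of evidence have already been computed for one query gene against one target genome, and that coding evidence takes precedence over non-coding evidence. Subclasses are not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SyntenicEvidence:
    """Hypothetical per-gene, per-target summary of the syntenic search."""
    has_coding_homolog: bool      # a coding homolog was found in the syntenic region
    has_noncoding_homolog: bool   # a non-coding homolog was found in the syntenic region


def classify(evidence: SyntenicEvidence) -> str:
    """Map evidence to one of the three main classes named in the abstract."""
    if evidence.has_coding_homolog:
        return "AAic"      # has a coding syntenic homolog
    if evidence.has_noncoding_homolog:
        return "NTic"      # has a non-coding syntenic homolog
    return "Unknown"       # no syntenic homolog detected


def summarize(evidence_by_gene: Dict[str, SyntenicEvidence]) -> Counter:
    """Count how many query genes fall into each class for one target genome."""
    return Counter(classify(e) for e in evidence_by_gene.values())


if __name__ == "__main__":
    # Made-up gene identifiers, for illustration only.
    example = {
        "geneA": SyntenicEvidence(True, True),
        "geneB": SyntenicEvidence(False, True),
        "geneC": SyntenicEvidence(False, False),
    }
    print(summarize(example))
```

A gene classed as NTic against a related species is the case of interest for de novo origin, since it can be traced to non-coding sequence rather than merely failing to match anything.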

Conclusions: fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis. By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.

Comments

This article is published as Arendsee, Zebulun, Jing Li, Urminder Singh, Priyanka Bhandary, Arun Seetharam, and Eve Syrkin Wurtele. "fagin: synteny-based phylostratigraphy and finer classification of young genes." BMC Bioinformatics 20 (2019): 1-14. doi: 10.1186/s12859-019-3023-y.

Creative Commons License
Creative Commons Attribution 4.0 International
Copyright Owner
The Authors
Language
en
File Format
application/pdf
Citation Information
Zebulun Arendsee, Jing Li, Urminder Singh, Priyanka Bhandary, et al. "fagin: synteny-based phylostratigraphy and finer classification of young genes" BMC Bioinformatics Vol. 20 (2019) p. 440
Available at: http://works.bepress.com/arun-seetharam/18/