Article
Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms
BMC BIOINFORMATICS
  • Daniel I. Speiser, University of California Santa Barbara; University of South Carolina
  • M. Sabrina Pankey, University of California Santa Barbara
  • Alexander K. Zaharoff, University of California Santa Barbara
  • Barbara A. Battelle, University of Florida
  • Heather D. Bracken-Grissom, Department of Biological Sciences, Florida International University
  • Jesse W. Breinholt, University of Florida
  • Seth M. Bybee, Brigham Young University
  • Thomas W. Cronin, University of Maryland
  • Anders Garm, University of Copenhagen
  • Annie R. Lindgren, Portland State University
  • Nipam H. Patel, University of California Berkeley
  • Megan L. Porter, University of South Dakota
  • Meredith E. Protas, Iowa State University
  • Ajna S. Rivera, University of the Pacific
  • Jeanne M. Serb, Iowa State University
  • Kirk S. Zigler, University of the South
  • Keith A. Crandall, George Washington University; National Museum of Natural History
  • Todd H. Oakley, University of California Santa Barbara
Date of this Version
11-19-2014
Document Type
Article
Abstract

Background: Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families.

Results: We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes onto our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/).

Conclusions: Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.

Comments

© 2014 Speiser et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Citation Information
Speiser et al.: Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms. BMC Bioinformatics 2014 15:350.