Dissertation
Phylogenomics: Molecular Evolution in the Genomics Era
(2012)
  • Arun S. Seetharam, Indiana State University
Abstract
Evolutionary studies have been transformed in recent years by powerful new techniques for investigating the mechanisms and events of molecular evolution. The large number of complete genomes now available in the public domain offers great advantages for genome-scale evolutionary studies. Phylogenomics, a term often used to describe the use of genome-scale data to infer species phylogenies or to predict protein function through evolutionary history, has benefited greatly from the rapid progress in DNA sequencing technology. In the present work, we developed and applied several phylogenomic methods to large genome-scale datasets.

In the first study, we applied Singular Value Decomposition (SVD) analysis to the predicted proteins of 12 whole Drosophila genomes to reexamine the currently accepted evolutionary relationships among these species. An SVD analysis of the unfiltered whole genomes (193,622 predicted proteins) recovered the currently accepted Drosophila phylogeny at higher dimensions, except for the generally accepted, but difficult to discern, sister relationship between D. erecta and D. yakuba. In accordance with previous studies, many sequences also appear to support alternative phylogenies. In particular, D. erecta grouped with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values, or when resolution was reduced by using fewer dimensions.
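To illustrate how dimension choice enters an SVD-based comparison, the following is a minimal sketch, not the method used in this study. It assumes (hypothetically) that each protein is encoded as a vector of overlapping tripeptide counts, that proteins are projected onto the top `k` singular dimensions, and that each species is represented by the sum of its proteins' projections, compared by cosine distance. The names `tripeptide_vector` and `svd_species_distances` are illustrative only, and the projection-value filter is not modeled.

```python
from itertools import product

import numpy as np

# Assumed encoding (hypothetical): overlapping tripeptide counts, 20^3 = 8000 features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDE_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=3))}


def tripeptide_vector(seq: str) -> np.ndarray:
    """Count overlapping tripeptides; windows containing nonstandard residues are skipped."""
    v = np.zeros(len(TRIPEPTIDE_INDEX))
    seq = seq.upper()
    for i in range(len(seq) - 2):
        j = TRIPEPTIDE_INDEX.get(seq[i:i + 3])
        if j is not None:
            v[j] += 1
    return v


def svd_species_distances(proteins, species_of, k):
    """Project proteins onto the top-k SVD dimensions, sum them per species,
    and return (sorted species names, cosine-distance matrix)."""
    # Features x proteins matrix.
    a = np.column_stack([tripeptide_vector(s) for s in proteins])
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    # Protein coordinates in the reduced space: rows of V_k scaled by singular values.
    coords = vt[:k].T * s[:k]
    labels = np.array(species_of)
    names = sorted(set(species_of))
    # Each species vector is the sum of its proteins' reduced coordinates (assumption).
    vecs = np.array([coords[labels == n].sum(axis=0) for n in names])
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    return names, 1.0 - unit @ unit.T
```

Lowering `k` discards the weaker singular dimensions, which is one way to picture the "reduced resolution" effect described above; the resulting distance matrix could then be passed to any distance-based tree builder.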

In the second study, we simulated restriction enzyme digestions of 21 sequenced Drosophila genomes. Using the fragments generated by simulated digestion at the predicted target sites of 16 Type IIB restriction enzymes, we sampled a large and effectively arbitrary selection of loci from these genomes. The resulting fragments were used to compare organisms and to calculate pairwise distances between genomes by counting the number of fragments shared by each pair. A phylogenetic tree was then generated for each enzyme using this distance measure, and a consensus tree was calculated. The consensus tree agrees well with the currently accepted tree for these Drosophila species. We conclude that multi-locus sub-genomic representation combined with next-generation sequencing can improve comparative genomics studies and the construction of accurate phylogenetic trees, especially for individuals and species without prior genome characterization.
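The digestion-and-comparison step can be sketched as follows, under stated assumptions. The recognition pattern and flank length are hypothetical placeholders rather than the 16 enzymes used in the study; only the forward strand is scanned; and the normalization of the shared-fragment count (here a Jaccard-style `1 - shared / union`) is an assumption, since the abstract says only that shared fragments were counted.

```python
import re


def digest_fragments(genome: str, site: str, flank: int) -> set[str]:
    """Excise `flank` bases on each side of every match of `site` (a regex).

    Loosely mimics a Type IIB enzyme, which cuts on both sides of its
    recognition site and releases a short fragment containing it.
    Only the forward strand is scanned (simplifying assumption).
    """
    g = genome.upper()
    # Lookahead so that overlapping recognition sites are all reported.
    pattern = re.compile(f"(?=({site}))")
    fragments = set()
    for m in pattern.finditer(g):
        start, end = m.start(1), m.end(1)
        # Skip sites too close to the sequence ends to yield a full fragment.
        if start - flank >= 0 and end + flank <= len(g):
            fragments.add(g[start - flank:end + flank])
    return fragments


def shared_fragment_distance(a: set[str], b: set[str]) -> float:
    """Assumed normalization: 1 - |shared| / |union| (Jaccard distance)."""
    union = a | b
    if not union:
        return 0.0  # both empty: no evidence of difference
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes: dict[str, str], site: str, flank: int):
    """Pairwise shared-fragment distances for one (hypothetical) enzyme."""
    names = sorted(genomes)
    frags = {n: digest_fragments(genomes[n], site, flank) for n in names}
    return names, [[shared_fragment_distance(frags[x], frags[y]) for y in names]
                   for x in names]
```

In a full analysis, one such matrix per enzyme would feed a distance-based tree method, and the per-enzyme trees would then be combined into a consensus.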

The third study used the relatively new Daphnia genome to identify members of 40 orthologous groups of C2H2 zinc-finger proteins (ZFPs) previously determined to be well conserved in bilaterians. We identified 58 C2H2 ZFP genes in Daphnia that belong to these 40 distinct families. Daphnia appears to have relatively few duplicates of these well-conserved C2H2 ZFPs: only 7 of the 40 gene families have more than one identified member, comparable to the 6 found in worms. In flies and humans, C2H2 ZFP gene expansions are more common, with 15 and 24 multi-member families, respectively. In contrast, only three of the well-conserved C2H2 ZFP families have expanded in Daphnia relative to Drosophila, and in two of these cases just one additional gene was found. The KLF/SP family in Daphnia, however, is significantly larger than that of Drosophila, and many of the additional Daphnia members appear to correspond to KLF1/2/4 homologs, which are absent in Drosophila but present in vertebrates.
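As a rough illustration of how candidate C2H2 ZFPs might be flagged in a predicted proteome before orthology assignment, the sketch below scans sequences with the PROSITE PS00028 pattern, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, written as a Python regex. This is a hypothetical first-pass screen, not the identification procedure used in this study, and it says nothing about which of the 40 families a hit belongs to. The function names are illustrative.

```python
import re

# PROSITE PS00028 (ZINC_FINGER_C2H2_1): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def count_c2h2_motifs(protein: str) -> int:
    """Count non-overlapping C2H2 zinc-finger motifs in one protein sequence."""
    return sum(1 for _ in C2H2_PATTERN.finditer(protein.upper()))


def screen_proteome(proteins: dict[str, str], min_fingers: int = 1) -> dict[str, int]:
    """Return {protein name: motif count} for proteins with at least `min_fingers` motifs."""
    hits = {}
    for name, seq in proteins.items():
        n = count_c2h2_motifs(seq)
        if n >= min_fingers:
            hits[name] = n
    return hits
```

For example, the single finger `PYACPVESCDRRFSRSDELTRHIRIHTGQK` matches once, and a tandem copy of it matches twice; assigning hits to orthologous families would still require a separate similarity or phylogenetic step.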

Keywords
  • Phylogenomics
  • Singular value decomposition
  • C2H2 zinc fingers
Publication Date
August, 2012
Degree
Doctor of Philosophy
Department
Biology
Advisors
Gary W. Stuart, Ph.D., Jennifer K. Inlow, Ph.D., Allan R. Albig, Ph.D., Swapan K. Ghosh, Ph.D., James P. Hughes, Ph.D.
Comments
Copyright Arun S. Seetharam 2012
Citation Information
Arun S. Seetharam. "Phylogenomics: Molecular Evolution in the Genomics Era" (2012)
Available at: http://works.bepress.com/arun-seetharam/6/