Article
Identifying Character Non-Independence in Phylogenetic Data using Data Mining Techniques
Proceedings of the 2nd Asia-Pacific Bioinformatics Conference (2004: Jan. 18-22, Dunedin, New Zealand)
  • Jennifer Leopold, Missouri University of Science and Technology
  • Anne M. Maglia, Missouri University of Science and Technology
  • M. Thakur
  • B. Patel
  • Fikret Erçal, Missouri University of Science and Technology
Abstract

Undiscovered relationships in a data set may confound analyses, particularly those that assume data independence. Such problems occur when characters used for phylogenetic analyses are not independent of one another. A main assumption of phylogenetic inference methods such as maximum likelihood and parsimony is that each character serves as an independent hypothesis of evolution. When this assumption is violated, the resulting phylogeny may not reflect true evolutionary history. Therefore, it is imperative that character non-independence be identified prior to phylogenetic analyses. To identify dependencies between phylogenetic characters, we applied three data mining techniques: 1) Bayesian networks, 2) decision tree induction, and 3) rule induction from coverings. We briefly discuss the main ideas behind each strategy, show how each technique performs on a small sample data set, and apply each method to an existing phylogenetic data set. We discuss the interestingness of the results of each method, and show that, although each method has its own strengths and weaknesses, rule induction from coverings presents the most useful solution for determining dependencies among phylogenetic data at this time.
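To make the notion of character non-independence concrete, the following is a minimal sketch that flags pairs of characters in which the state of one fully determines the state of the other. It is a plain functional-dependency check, not any of the three techniques named in the abstract, and it is not the authors' implementation. The character matrix, taxon names, and character names (`c1`, `c2`, `c3`) are invented for illustration.

```python
from itertools import permutations

# Hypothetical character matrix: rows are taxa, columns are characters.
# All names and states here are invented for illustration only.
matrix = {
    "taxon_A": {"c1": 0, "c2": 0, "c3": 1},
    "taxon_B": {"c1": 0, "c2": 0, "c3": 0},
    "taxon_C": {"c1": 1, "c2": 1, "c3": 1},
    "taxon_D": {"c1": 1, "c2": 1, "c3": 0},
}

def determines(matrix, a, b):
    """Return True if each observed state of character a co-occurs with exactly one state of b."""
    seen = {}
    for states in matrix.values():
        # Record the first b-state seen for this a-state; any later mismatch breaks the dependency.
        if seen.setdefault(states[a], states[b]) != states[b]:
            return False
    return True

def dependent_pairs(matrix):
    """List ordered pairs (a, b) where the state of a determines the state of b."""
    characters = sorted(next(iter(matrix.values())))
    return [(a, b) for a, b in permutations(characters, 2) if determines(matrix, a, b)]

pairs = dependent_pairs(matrix)
print(pairs)
```

In this toy matrix, `c1` and `c2` always change together, so they would count twice as evidence if treated as independent characters, while `c3` varies freely. A check this simple has clear limits: a character that is constant across all taxa is trivially "determined" by every other character, and real data sets would also need handling for missing or polymorphic states.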

Meeting Name
2nd Asia-Pacific Bioinformatics Conference, APBC2004 (2004: Jan. 18-22, Dunedin, New Zealand)
Department(s)
Computer Science
Keywords and Phrases
  • Character Independence
  • Data Mining
  • Machine Learning
  • Phylogenetic Data
Document Type
Article - Conference proceedings
Document Version
Final Version
File Type
text
Language(s)
English
Rights
© 2004 Association for Computing Machinery (ACM), All rights reserved.
Publication Date
1-22-2004
Citation Information
Jennifer Leopold, Anne M. Maglia, M. Thakur, B. Patel, et al. "Identifying Character Non-Independence in Phylogenetic Data using Data Mining Techniques" Proceedings of the 2nd Asia-Pacific Bioinformatics Conference (2004: Jan. 18-22, Dunedin, New Zealand) (2004)
Available at: http://works.bepress.com/jennifer-leopold/8/