A Polyglot Approach to Bioinformatics Data Integration: Phylogenetic Analysis of HIV-1
2nd Greater Chicago Area System Research Workshop
Abstract
RNA interference (RNAi) has potential therapeutic use against HIV-1 by targeting highly functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, this treatment requires RNAi sequences to be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus's fitness in the host. Taking a phylogenetic approach, we have examined more than 4,000 HIV-1 strains obtained from NCBI's database for each of the HIV genes, identifying conserved regions at each hypothetical and operational taxonomic unit within the tree. Integrating the wealth of information available from each genome's record, we are able to observe how conserved regions vary with respect to their distribution throughout the world. This was made possible by a new software tool, designed so that similar analyses can be conducted for any species or gene of interest, not just HIV-1. In addition to the phylogenetic signal that we can recognize from the HIV-1 genomes examined, we can also identify how selection varies across the genome. Taking this evolutionary approach, we have detected regions ideal for targeting by RNAi treatment. The software system mentioned above provides access to the National Center for Biotechnology Information's (NCBI) GenBank in multiple ways: it converts GenBank data to the FASTA format for analysis using desktop tools, and it exposes the data in the form of a RESTful web service. We have implemented this system using a polyglot approach involving multiple languages (Python and Scala), libraries (Flask and BioJavaX), and persistence mechanisms (text files and MongoDB NoSQL databases).
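To make the GenBank-to-FASTA conversion step concrete, the following is a minimal Python sketch that uses only the standard library. It is not the authors' implementation, which the abstract says relies on libraries such as BioJavaX. The function name genbank_to_fasta, the width parameter, and the sample record (accession XX000001, a made-up 20-base sequence) are assumptions introduced purely for illustration. The sketch handles a single record and reads only the DEFINITION, ACCESSION, and ORIGIN sections.

```python
import re

# Hypothetical GenBank-style record for illustration only; not real data.
SAMPLE_RECORD = """\
LOCUS       XX000001                  20 bp    RNA     linear   VRL 01-JAN-2000
DEFINITION  Hypothetical example record,
            not real data.
ACCESSION   XX000001
ORIGIN
        1 ggtctctctg gttagaccag
//
"""


def genbank_to_fasta(record_text: str, width: int = 60) -> str:
    """Convert one GenBank flat-file record to a single FASTA entry."""
    accession = ""
    definition_parts = []
    sequence_parts = []
    section = None  # which multi-line section we are currently inside

    for line in record_text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("ORIGIN"):
            section = "origin"
        elif section == "origin":
            # Sequence lines carry a position number and spaced blocks;
            # keep only the letters.
            sequence_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ACCESSION"):
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else ""
            section = None
        elif line.startswith("DEFINITION"):
            definition_parts.append(line[len("DEFINITION"):].strip())
            section = "definition"
        elif section == "definition" and line.startswith(" "):
            definition_parts.append(line.strip())  # continuation line
        else:
            section = None  # any other keyword ends the current section

    sequence = "".join(sequence_parts).upper()
    header = f">{accession} {' '.join(definition_parts)}".rstrip()
    # Wrap the sequence at a fixed column width, as is conventional in FASTA.
    body = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f"{header}\n{body}\n"


if __name__ == "__main__":
    print(genbank_to_fasta(SAMPLE_RECORD), end="")
```

A production pipeline would more likely delegate parsing to an established library, since real GenBank records contain feature tables, multiple accessions, and other fields that this sketch ignores.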
Creative Commons License
Creative Commons Attribution-Noncommercial-No Derivative Works 3.0
Citation Information
S. Reisman, C. Putonti, G. K. Thiruvathukal, and K. Läufer. A Polyglot Approach to Bioinformatics Data Integration: Phylogenetic Analysis of HIV-1: Research Poster. 2nd Greater Chicago Area System Research Workshop (GCASR), May 3, 2013, Evanston, IL, USA.