Boa: Ultra-large-scale software repository and source-code mining
ACM Transactions on Software Engineering and Methodology (TOSEM)
  • Robert Dyer, Bowling Green State University
  • Hoan Anh Nguyen, Iowa State University
  • Hridesh Rajan, Iowa State University
  • Tien N. Nguyen, Iowa State University
Document Type
Article
Publication Version
Accepted Manuscript
Publication Date
1-1-2015
DOI
10.1145/2803171
Abstract
In today’s software-centric world, ultra-large-scale software repositories, e.g., SourceForge, GitHub, and Google Code, are the new library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematic extraction and analysis of relevant data from these repositories for testing hypotheses is hard, and best left for mining software repository (MSR) experts! Specifically, mining source code yields significant insights into software development artifacts and processes. Unfortunately, mining source code at a large scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse-grained, or sacrifice studying the history of the code. In this paper we address mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information. To address these challenges, we present domain-specific language features for source code mining in our language and infrastructure called Boa. The goal of Boa is to ease testing MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also show drastic improvements in scalability.
Comments

This article is published as Dyer, Robert, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. "Boa: Ultra-large-scale software repository and source-code mining." ACM Transactions on Software Engineering and Methodology (TOSEM) 25, no. 1 (2015): 7. doi:10.1145/2803171. Posted with permission.

Rights
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Software Engineering and Methodology (TOSEM) 25, no. 1 (2015). https://doi.org/10.1145/2803171
Copyright Owner
ACM
Language
en
File Format
application/pdf
Citation Information
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. "Boa: Ultra-large-scale software repository and source-code mining." ACM Transactions on Software Engineering and Methodology (TOSEM) Vol. 25, Iss. 1 (2015), Article 7.
Available at: http://works.bepress.com/hridesh-rajan/52/