Skip to main content
Article
Toward a containerized pipeline for longitudinal analysis of open-source software projects
Graduate Research Symposium
  • Allan Miller
  • George Thiruvathukal, Loyola University Chicago
  • Konstantin Laufer, Loyola University Chicago
  • Emmanuel Amobi
  • Sean Higgins
  • Linette Maliakal
  • Emily Meister
  • Jean-Luc Putter
  • Alex Rose
  • Nicholas Synovic
  • Sophie Von Hatten
  • Jonathan Warkentin
  • Martin Zugschwert
Submission Type
Poster
Degree Type
Masters
Discipline
Sciences
Department
Computer Sciences
Access Type
Open Access
Abstract or Description

Trust in open-source software is a cornerstone of scientific progress and a foundation of high-quality public services. Just as standards are integral when judging the efficacy of a novel pharmaceutical compound or determining the spread of a new disease, the software used to make those determinations should be useful, error-free, reliable, performant, and secure. A small bug in an application, library, or framework can lead to economic loss and even loss of life. We rely on software developers to be dynamic and responsive to user review and bug-reporting. Our team developed an open-source modular pipeline to perform empirical investigations of software quality. A key innovation of our approach is to look at projects “from a distance” similar to methods used in climate, e.g. satellite images being used to observe environmental impacts in air quality/rain forests. Instead of looking at language-specific source code features, our pipeline uses a language-agnostic high-level approach to track software quality by focusing on the development process itself, which yields great insight into the processes programmers use to write and maintain their software. Our distributed modular approach to analytics allows the pipeline to be easily extended to support additional metrics in future work. We store extracted data in an embedded SQLite database, which means that analysis can proceed without complex server setup, let alone hosting the software on dedicated servers. Our analytical modules are designed for efficiency, and future runs of our software only collect missing data, supporting the incremental analysis of known, important open-source projects.

Creative Commons License
Creative Commons Attribution-Noncommercial-No Derivative Works 3.0
Citation Information
Allan Miller, George Thiruvathukal, Konstantin Laufer, Emmanuel Amobi, et al.. "Toward a containerized pipeline for longitudinal analysis of open-source software projects"
Available at: http://works.bepress.com/gkthiruvathukal/201/