Why and how developers fork what from whom in GitHub
Empirical Software Engineering
  • Jing JIANG, Beijing University of Aeronautics and Astronautics (Beihang University)
  • David LO, Singapore Management University
  • Jiahuan HE, Beijing University of Aeronautics and Astronautics (Beihang University)
  • Xin XIA, Zhejiang University
  • Pavneet Singh KOCHHAR, Singapore Management University
  • Li ZHANG, Beijing University of Aeronautics and Astronautics (Beihang University)
Publication Type
Journal Article
Version
publishedVersion
Publication Date
2-2017
Abstract

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool is useful to help them pick repositories to fork, while more than 44.4% of developers do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer’s preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. In comparison with unattractive repository owners, attractive repository owners have a higher percentage of organizations, more followers, and earlier registration on GitHub. Our results show that forking is mainly used for making contributions to original repositories, and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.

Keywords
  • Fork
  • Open source software
  • GitHub
Identifier
10.1007/s10664-016-9436-6
Publisher
Springer Verlag (Germany)
Copyright Owner and License
Authors
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Additional URL
https://doi.org/10.1007/s10664-016-9436-6
Citation Information
Jing JIANG, David LO, Jiahuan HE, Xin XIA, et al. "Why and how developers fork what from whom in GitHub" Empirical Software Engineering Vol. 22 Iss. 1 (2017) p. 547 - 578 ISSN: 1382-3256
Available at: http://works.bepress.com/david_lo/361/