Dataset
PeaTMOSS: Mining Pre-Trained Models in Open-Source Software
Computer Science: Faculty Publications and Other Works
  • Wenxin Jiang, Purdue University
  • Jason Jones, Purdue University
  • Jerin Yasmin, Queen's University - Kingston, Ontario
  • Nicholas Synovic, Loyola University Chicago
  • Rajiv Sashti, Purdue University
  • Sophie Chen, The University of Michigan
  • George K. Thiruvathukal, Loyola University Chicago
  • Yuan Tian, Queen's University - Kingston, Ontario
  • James C. Davis, Purdue University
Document Type
Data Set
Publication Date
10-5-2023
Abstract

Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.

Identifier
arXiv:2310.03620
Comments

Author Posting © 2023 The Authors. Links to the demo and to the full dataset are available at:

https://github.com/PurdueDualityLab/PeaTMOSS-Demos

Creative Commons License
Creative Commons Attribution 4.0 International
Citation Information
Jiang, W., Jones, J., Yasmin, J., Synovic, N., Sashti, R., Chen, S., Thiruvathukal, G.K., Tian, Y., & Davis, J.C. (2023). PeaTMOSS: Mining Pre-Trained Models in Open-Source Software. arXiv:2310.03620