"PeaTMOSS: Mining Pre-Trained Models in Open-Source Software" by Wenxin Jiang

George K. Thiruvathukal, PhD

Dataset

PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

Computer Science: Faculty Publications and Other Works

Wenxin Jiang, Purdue University
Jason Jones, Purdue University
Jerin Yasmin, Queen's University - Kingston, Ontario
Nicholas Synovic, Loyola University Chicago
Rajiv Sashti, Purdue University
Sophie Chen, The University of Michigan
George K. Thiruvathukal, Loyola University Chicago
Yuan Tian, Queen's University - Kingston, Ontario
James C Davis, Purdue University

Document Type

Data Set

Publication Date

10-5-2023

Abstract

Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.
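The three-part structure described in the abstract (PTMs, repositories, and a mapping between them) can be pictured as a many-to-many relation. The following is a minimal sketch only: the table names (`model`, `repository`, `model_to_repository`), the column names, the toy rows, and the choice of SQLite are all assumptions made for illustration and are not taken from the dataset itself. Consult the demo repository linked above for the actual layout.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a HYPOTHETICAL three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Part 1: pre-trained models (hypothetical schema)
        CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Part 2: open-source repositories that use PTMs (hypothetical schema)
        CREATE TABLE repository (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        -- Part 3: mapping between models and the projects that use them
        CREATE TABLE model_to_repository (
            model_id INTEGER NOT NULL REFERENCES model(id),
            repository_id INTEGER NOT NULL REFERENCES repository(id),
            PRIMARY KEY (model_id, repository_id)
        );
        """
    )
    # Toy rows, invented for this example only.
    conn.executemany(
        "INSERT INTO model VALUES (?, ?)",
        [(1, "toy/model-a"), (2, "toy/model-b"), (3, "toy/model-c")],
    )
    conn.executemany(
        "INSERT INTO repository VALUES (?, ?)",
        [(10, "https://example.org/repo-x"), (11, "https://example.org/repo-y")],
    )
    conn.executemany(
        "INSERT INTO model_to_repository VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 10)],
    )
    conn.commit()
    return conn


def reuse_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (model name, number of repositories using it), most reused first."""
    return conn.execute(
        """
        SELECT m.name, COUNT(mr.repository_id) AS n_repos
        FROM model AS m
        LEFT JOIN model_to_repository AS mr ON mr.model_id = m.id
        GROUP BY m.id, m.name
        ORDER BY n_repos DESC, m.name ASC
        """
    ).fetchall()


if __name__ == "__main__":
    for name, n_repos in reuse_counts(build_toy_db()):
        print(f"{name}: used by {n_repos} repositories")
```

The `LEFT JOIN` keeps models with no downstream users in the result, which matters for a mining question such as "what fraction of PTMs are ever reused?".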

Identifier

arXiv:2310.03620

Comments

https://github.com/PurdueDualityLab/PeaTMOSS-Demos

Creative Commons License

Creative Commons Attribution 4.0 International

Citation Information

Jiang, W., Jones, J., Yasmin, J., Synovic, N., Sashti, R., Chen, S., Thiruvathukal, G. K., Tian, Y., & Davis, J. C. (2023). PeaTMOSS: Mining Pre-Trained Models in Open-Source Software. arXiv:2310.03620.