Article
Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
ArXiv
  • Purvish Jajal, Purdue University
  • Wenxin Jiang, Purdue University
  • Arav Tewari, Purdue University
  • Joseph Woo, Purdue University
  • George K. Thiruvathukal, Loyola University Chicago
  • James C. Davis, Purdue University
Document Type
Article
Publication Date
3-4-2023
Abstract

Many software engineers develop, fine-tune, and deploy deep learning (DL) models. They use DL models in a variety of development frameworks and deploy to a range of runtime environments. In this diverse ecosystem, engineers use DL model converters to move models from frameworks to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure modes and patterns of DL model converters are unknown. This knowledge gap adds engineering risk in DL interoperability technologies. In this paper, we conduct the first failure analysis on DL model converters. Specifically, we characterize failures in model converters associated with ONNX (Open Neural Network eXchange). We analyze failures in the ONNX converters for two major DL frameworks, PyTorch and TensorFlow. We report the symptoms, causes, and locations of failures for N=200 issues. We also evaluate why models fail by converting 5,149 models, including both real-world and synthetically generated instances. In the course of this testing, we find 11 defects (5 new) across torch.onnx, tf2onnx, and the ONNXRuntime. We evaluate two hypotheses about the relationship between model operators and converter failures, falsifying one and obtaining equivocal results for the other. We describe the current testing strategies for model converters and note their weaknesses. Our results motivate future research on making DL software simpler to maintain, extend, and validate.

Identifier
arXiv:2303.17708
Creative Commons License
Creative Commons Attribution 4.0 International
Citation Information
Jajal, P., Jiang, W., Tewari, A., Woo, J., Lu, Y., Thiruvathukal, G.K., & Davis, J.C. (2023). Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem. ArXiv, abs/2303.17708.