Overfitting in semantics-based automated program repair
Empirical Software Engineering
  • Dinh Xuan Bach LE, Singapore Management University
  • Ferdian THUNG, Singapore Management University
  • David LO, Singapore Management University
  • Claire LE GOUES, Carnegie Mellon University
Publication Type
Journal Article
Publication Date

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, reducing the manual bug-fix burden that presently rests on human developers. Existing APR techniques can generally be divided into two families: semantics-based and heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and uses program synthesis to synthesize repairs that satisfy the extracted constraints. Heuristics-based APR generates large populations of repair candidates via source manipulation, and searches for the best among them. Both families largely rely on the assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern. A repair technique may generate overfitting patches, which lead a program to pass all existing test cases but fail to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of the overfitting problem in heuristics-based APR. We perform our study using the IntroClass and Codeflaws benchmarks, two datasets well-suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that, as in heuristics-based APR, overfitting also occurs in semantics-based APR, in a variety of ways.
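As a rough illustration of what an overfitting patch looks like, the sketch below uses a hypothetical median-of-three program. Every name in it (`buggy_median`, `overfitting_patch`, `correct_patch`, `passes_all`, and both test lists) is an assumption made up for this example; none of it is taken from the study, its tools, or the benchmarks' actual programs. The overfitting patch passes every test available during repair but fails on held-out tests, while the correct patch passes both.

```python
from typing import Callable, List, Tuple

Program = Callable[[int, int, int], int]
TestCase = Tuple[Tuple[int, int, int], int]  # (inputs, expected output)


def buggy_median(a: int, b: int, c: int) -> int:
    """Hypothetical buggy program: wrong whenever c is the median."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return b  # bug: the median in this remaining case is c


def overfitting_patch(a: int, b: int, c: int) -> int:
    """Hypothetical patch that merely satisfies the repair tests."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return a + 1  # consistent with the one failing repair test, nothing more


def correct_patch(a: int, b: int, c: int) -> int:
    """Hypothetical patch that generalizes beyond the repair tests."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return c


def passes_all(program: Program, tests: List[TestCase]) -> bool:
    """Return True if the program produces the expected output on every test."""
    return all(program(*inputs) == expected for inputs, expected in tests)


# Tests visible to the repair technique (assumed for illustration).
REPAIR_TESTS: List[TestCase] = [((1, 2, 3), 2), ((2, 1, 3), 2), ((1, 3, 2), 2)]

# Independent tests used only to judge whether a patch generalizes (assumed).
HELD_OUT_TESTS: List[TestCase] = [((5, 9, 7), 7), ((3, 1, 2), 2), ((4, 4, 9), 4)]

if __name__ == "__main__":
    for name, program in [
        ("buggy", buggy_median),
        ("overfitting patch", overfitting_patch),
        ("correct patch", correct_patch),
    ]:
        print(
            name,
            "| repair tests:", passes_all(program, REPAIR_TESTS),
            "| held-out tests:", passes_all(program, HELD_OUT_TESTS),
        )
```

Judging a patch only against the repair tests cannot tell the two patches apart; the difference shows up only on the independent held-out tests.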

  • Automated program repair
  • Program synthesis
  • Symbolic execution
  • Patch overfitting
Springer Verlag (Germany)
Copyright Owner and License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Additional URL
Citation Information
Dinh Xuan Bach LE, Ferdian THUNG, David LO and Claire LE GOUES. "Overfitting in semantics-based automated program repair." Empirical Software Engineering, Vol. 23, Iss. 5 (2018), pp. 3007-3033. ISSN: 1382-3256
Available at: