PATPAT: Program analysis, the practice and theory: 2025

UW PLSE members René Just, Darioush Jalali, and Michael Ernst have been awarded the ISSTA 2024 Impact Award for their ISSTA 2014 paper “Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies for Java Programs”. The Impact Award is given to the single best paper presented at a conference, and it is decided 10 years later, when the impact of all the papers is better known. ISSTA is the ACM SIGSOFT International Symposium on Software Testing and Analysis, a premier research venue.

Defects4J is a dataset of defects (bugs) mined from the version control repositories of open-source programs. It has become an essential resource in any research related to program defects, including test case generation and prioritization, fault localization, automated program repair, mutation analysis, and more. Google Scholar reports 1700 publications that mention Defects4J.

When building a tool to help programmers, the gold standard of quality is whether the tool is effective on real problems in real code. In the past, such an evaluation was so difficult that most papers used fake defects, such as those created by randomly changing part of a program. Such defects are quite different from the ones that arise in real-world programs, which raises questions about the generality of the research results. Defects4J collected real defects and organized them in a way that made them easy to use.

For each of 835 defects (a number that is always increasing), Defects4J contains information from the source code repository:

A defective version of the program.
A fixed version of the program.
A test suite.

The Defects4J authors did significant work to make these defects more useful to researchers.

They manually pruned out irrelevant changes between the program versions. For instance, if the program's developers both fixed the defect and performed a refactoring in the same change, the Defects4J authors removed the refactoring.
They removed tests that fail on the fixed version of the program. They also removed flaky tests, which are non-deterministic or dependent on test order. If there is no triggering test (a test that fails before the fix and passes after the fix), they did not include the defect in Defects4J.
They wrote scripts that present a common interface to every defect for compiling, testing, computing code coverage, and other tasks. This common interface abstracts away from the program's version control system, build system, library dependencies, and other details. This common Defects4J interface makes running experiments with Defects4J much easier than with other collections of defects.
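
The test-curation rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this post, not the scripts the Defects4J authors actually used. The `TestRuns` record and the `is_flaky` and `curate` functions are hypothetical names, and flakiness is approximated here as "the same test gave different outcomes across repeated runs on the same program version" (the repeated runs could, for example, use different test orders).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestRuns:
    """Hypothetical record: outcomes (True = pass) of one test over repeated runs."""
    name: str
    on_buggy: tuple[bool, ...]   # outcomes on the defective version
    on_fixed: tuple[bool, ...]   # outcomes on the fixed version


def is_flaky(t: TestRuns) -> bool:
    # Assumption: a test is treated as flaky if repeated runs on the
    # same program version did not all agree.
    return len(set(t.on_buggy)) > 1 or len(set(t.on_fixed)) > 1


def curate(tests: list[TestRuns]):
    """Return (kept_tests, triggering_tests), or None to exclude the defect."""
    # Drop flaky tests and tests that fail on the fixed version.
    kept = [t for t in tests if not is_flaky(t) and all(t.on_fixed)]
    # A triggering test fails before the fix and passes after it.
    triggering = [t for t in kept if not any(t.on_buggy)]
    if not triggering:
        return None  # no triggering test: the defect is not included
    return kept, triggering
```

A defect whose only failing test is flaky, or also fails on the fixed version, ends up with no triggering test, so `curate` returns `None` for it.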

The Defects4J authors (René, Darioush, and Mike) were motivated by their own experiments: they wanted to ensure that their evaluations were realistic and accurate. Their work in creating Defects4J ended up having a big influence on many research areas, as recognized by the Impact Award.

Defects4J is freely available at https://defects4j.org/.
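
To give a flavor of the common interface mentioned earlier, here is a minimal Python sketch that builds the command lines for a typical session: check out the defective version of one defect, then compile it, run its tests, and compute coverage. The subcommands and flags shown (`checkout -p/-v/-w`, `compile`, `test`, `coverage`) are the ones I understand the Defects4J command-line tool to provide; please check the project's README for the authoritative list. The helper name `defects4j_session`, the project `Lang`, defect number 1, and the working directory are just illustrative choices. The sketch only prints the commands; it does not run them.

```python
import shlex


def defects4j_session(project: str, bug_id: int, workdir: str) -> list[list[str]]:
    """Hypothetical helper: command lines for one defect's buggy version."""
    # Version ids end in "b" for the buggy version ("f" would select the fixed one).
    buggy = f"{bug_id}b"
    return [
        ["defects4j", "checkout", "-p", project, "-v", buggy, "-w", workdir],
        ["defects4j", "compile", "-w", workdir],
        ["defects4j", "test", "-w", workdir],
        ["defects4j", "coverage", "-w", workdir],
    ]


if __name__ == "__main__":
    # Print the commands in shell-quoted form rather than executing them.
    for cmd in defects4j_session("Lang", 1, "/tmp/lang_1_buggy"):
        print(shlex.join(cmd))
```

With Defects4J installed and on the `PATH`, each command list could be passed to `subprocess.run(cmd, check=True)` to actually execute it. The same four steps apply to every defect, regardless of the underlying project's build system.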

Saturday, April 19, 2025

Defects4J wins the ISSTA 2024 Impact Award
