Both practitioners and researchers need to evaluate the quality of test suites — for example, researchers want to know whether a new testing technique improves a test suite. The true measure of a test suite's quality is how many real faults it detects. The set of real faults is typically unknown, so the state-of-the-art approach is to measure how many artificial faults, called mutants, a test suite detects. Hundreds of research papers make the assumption that if a test suite detects more mutants, then it will detect more real faults as well. Amazingly, no one knew whether this assumption holds! Or rather, no one did until our research.
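To make the idea concrete, here is a minimal, hypothetical sketch of a mutant and a test input that detects ("kills") it. The class, method names, and the specific mutation are invented for illustration; they are not taken from the paper or from any particular mutation tool.

```java
// A tiny, hypothetical illustration of mutation analysis (not code from the paper).
// A mutation tool makes one small syntactic change (a "mutant") to the program;
// a test suite "kills" the mutant if at least one test fails on the mutated version.
public class MutationDemo {

    // Original: counts strictly positive numbers.
    static int countPositives(int[] xs) {
        int count = 0;
        for (int x : xs) {
            if (x > 0) {          // original condition
                count++;
            }
        }
        return count;
    }

    // Mutant: a mutation tool replaced ">" with ">=" in the condition.
    static int countPositivesMutant(int[] xs) {
        int count = 0;
        for (int x : xs) {
            if (x >= 0) {         // mutated condition
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] input = {-2, 0, 3};

        // A test whose input includes 0 distinguishes the two versions,
        // so it kills (detects) this mutant: the original returns 1, the mutant 2.
        System.out.println("original: " + countPositives(input)
                + ", mutant: " + countPositivesMutant(input));
    }
}
```

A test suite that never exercises the boundary value 0 would pass on both versions and thus fail to kill this mutant; the proportion of mutants a suite kills is what mutation analysis uses as a proxy for its fault-detection ability.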
The paper reports on extensive experimentation showing that mutant detection is the best available proxy for test suite quality. We also show how mutation analysis can be improved, and we identify fundamental limitations that prevent it from perfectly predicting real fault detection. Our analysis accounts for confounding factors such as code coverage. In addition to these experimental results, the real faults and test suites we assembled can be used in future testing research.
The paper was presented on November 20 at FSE, one of the two top software engineering conferences. The paper was authored by René Just, Darioush Jalali, and Michael Ernst of UW CSE, Laura Inozemtseva and Reid Holmes (a former postdoc at UW) of the University of Waterloo, and Gordon Fraser of the University of Sheffield. This is the second ACM Distinguished Paper award this year for René, Gordon, and me.
You can read the paper at http://homes.cs.washington.edu/~mernst/pubs/mutation-effectiveness-fse2014.pdf. You can obtain the tools and experimental data at http://mutation-testing.org and http://defects4j.org.