The paper reports on extensive experiments showing that mutant detection is the best available proxy for test suite quality. We also show how mutation analysis can be improved and identify the fundamental limitations that prevent it from perfectly predicting real fault detection. Our analysis accounts for confounding factors such as code coverage. Beyond these experimental results, the real faults and test suites we assembled can be used in future testing research.
The authors are René Just, Darioush Jalali, and Michael Ernst of UW CSE; Laura Inozemtseva and Reid Holmes (a former postdoc at UW) of the University of Waterloo; and Gordon Fraser of the University of Sheffield. This is the second ACM Distinguished Paper award this year for René, Gordon, and me.
You can read the paper at http://homes.cs.washington.edu/~mernst/pubs/mutation-effectiveness-fse2014.pdf. You can obtain the tools and experimental data at http://mutation-testing.org and http://defects4j.org.