Monday, January 16, 2017

ICSE 2017 Most Influential Paper, and supporting your research tools

My ICSE 2007 paper that describes the Randoop test generation tool, Feedback-directed random test generation (coauthored with Carlos Pacheco, Shuvendu K. Lahiri, and Thomas Ball), has won the ICSE 2017 Most Influential Paper award.  This is a test-of-time award given for "the paper from the ICSE meeting 10 years prior that is judged to have had the most influence on the theory or practice of software engineering during the 10 years since its original publication."

Randoop generates tests for programs written in object-oriented languages; currently, Java and .NET versions exist. Automated test generation is a practically important research topic: in 2002, NIST reported, "the national [US] annual costs of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion."

Prior to this research, a typical test generation technique would generate many tests and then try to determine which ones were of value. For example, an error-revealing test is one that makes legal calls that yield incorrect results.  Without a formal specification, it is difficult to know whether a given call is legal and whether its outcome is desired.  Furthermore, multiple generated tests might fail for the same reason.

The technique of feedback-directed test generation generates one test at a time, executes the test, and classifies it as (probably) a normal execution, a failure, or an illegal input.  Based on this information, it biases the subsequent generation process to extend good tests and avoid bad tests.
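The loop can be sketched in a few lines of Java. This is my simplified illustration, not Randoop's implementation: a single method under test (String.substring) stands in for arbitrary method calls, and an out-of-bounds exception stands in for an illegal input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FeedbackDirectedSketch {
    public static void main(String[] args) {
        Random random = new Random(0);
        // Pool of values produced by previously generated, normally executing tests.
        List<String> pool = new ArrayList<>(List.of("hello", "world"));
        int normal = 0, illegal = 0;
        for (int i = 0; i < 200; i++) {
            // Generate one new test: a call on a value drawn from the pool.
            String receiver = pool.get(random.nextInt(pool.size()));
            int begin = random.nextInt(8), end = random.nextInt(8);
            try {
                // Execute the test immediately and classify the outcome.
                String result = receiver.substring(begin, end);
                normal++;
                pool.add(result);   // feedback: good results seed later tests
            } catch (IndexOutOfBoundsException e) {
                illegal++;          // feedback: illegal inputs are discarded
            }
        }
        System.out.println(normal + " normal, " + illegal + " illegal");
    }
}
```

The essential point is the feedback: values from tests that executed normally are reused to build longer tests, while illegal calls are never extended.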

Feedback-directed test generation was first introduced in my ECOOP 2005 paper Eclat: Automatic generation and classification of test inputs (coauthored with Carlos Pacheco). The ICSE 2007 Randoop paper expands on the technique, contributes a more usable tool, and reports on more extensive experiments.

Many other test generation strategies have been proposed, both before and since, including ones with desirable theoretical properties such as the ability to cover difficult-to-execute paths.  Some of these have seen commercial success.  Thanks to its scalability and simplicity, Randoop remains the standard benchmark against which other test generation tools are measured.  (Randoop is easy to use, but as with any tool, tuning its parameters as described in its manual greatly improves its output; that is what a conscientious user or researcher does.)  It's great to see subsequent research surpass Randoop.  For example, the GRT tool adds a few clever techniques to Randoop and outperforms all other test generation tools.

One reason the Randoop paper had impact was its innovative approach to test generation, which has since been adopted by other tools.  An equally important reason is a decade of improvements, bug fixes, and other support.  After my student Carlos Pacheco graduated and moved on to other interests, my research group and I have continued to maintain Randoop for its community of industrial developers and academics.  This enables them to use Randoop to find bugs and generate regression tests, or to extend Randoop to support their own research.  The main Randoop paper and the tool have been cited 761 times in academic papers.

At ICSE 2007, I was awarded an ACM Distinguished Paper Award (a "best paper" award).  But the award wasn't for the Randoop paper!  Don't give up if others do not yet appreciate your work.  Time can change people's opinions about which research results are most important.  At the time, the program committee was more impressed with my paper Refactoring for parameterizing Java classes (coauthored with Adam Kieżun, Frank Tip, and Robert M. Fuhrer).  This paper shows how to infer Java generics, such as converting a Java program from declaring and using List to declaring and using List<T>.  The paper is notable because it solves both the parameterization problem and the instantiation problem.  That is, it infers what type parameters a generic class should have, and it changes clients to provide type arguments when using the class.
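For concreteness, here is my own small illustration (not taken from the paper) of what parameterization and instantiation accomplish for a client of List:

```java
import java.util.ArrayList;
import java.util.List;

public class GenericsInferenceExample {
    public static void main(String[] args) {
        // Before: the client declares and uses the raw type List.
        List rawNames = new ArrayList();
        rawNames.add("Alice");
        String first = (String) rawNames.get(0);   // cast needed, unchecked

        // After: parameterization gives the class a type parameter, and
        // instantiation updates the client to supply a type argument.
        List<String> names = new ArrayList<>();
        names.add("Alice");
        String firstTyped = names.get(0);          // no cast, checked at compile time

        System.out.println(first.equals(firstTyped));
    }
}
```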

The ICSE 2017 MIP award is my second test-of-time award.  I also received the 2013 ACM SIGSOFT Impact Paper Award, which is awarded to "a highly influential paper presented at a SIGSOFT-sponsored or co-sponsored conference held at least 10 years prior".  The award was for my ICSE 1999 paper Dynamically discovering likely program invariants to support program evolution (co-authored with Jake Cockrell, William G. Griswold, and David Notkin).  The paper describes a machine-learning technique that observes program executions and outputs likely specifications.  This was the first paper that described the Daikon invariant detector.

As with Randoop, one reason for Daikon's influence is that my research group and I maintained it for over a decade, supporting a community of users.  I did so against the advice of senior faculty, who told me to abandon it and focus on new publications.

I recommend my approach to other researchers who care about impact.  Make your research tools public!  Support them after you release them!  This is a requirement of the scientific method, which depends on replicating and extending prior research.  If you are not willing to do this, others are right to be suspicious of your results.  Another important reason is that maintenance work helps others in the scientific community do their own work.  If you need a selfish reason: it greatly increases your influence, as these awards illustrate.

Monday, January 2, 2017

AMiner's most influential scholars

AMiner, which mines deep knowledge from scientific networks, has named me the #3 most influential scholar among all software engineering researchers (living and dead).

Saturday, November 19, 2016

Automated generation of assert statements with Toradocu

Do you love writing test assertions?  If not, the Toradocu tool can help you.  Toradocu 1.0 has just been released.

Toradocu automatically generates assert statements that you can insert in your program.  This makes your existing tests more effective, and it can also find errors during normal operation of your program.

Toradocu converts your existing Javadoc comments into assertions.  It works as follows:

  • It parses an English sentence, yielding a parse tree that indicates the subject, the verb, the direct and indirect objects, etc.
  • It associates every noun phrase in the sentence with an expression in the program.  It does so by comparing the text of the noun phrase with variable names and types.
  • It associates every verb in the sentence with an operation in the program.  For example, it compares the text of the verb with method names.
  • Now that every noun and verb in the parse tree is associated with a program element, the sentence can be translated to a Java expression that can be used in an assert statement.
For example, suppose you wrote

 @throws IllegalArgumentException if the argument is not in the list and is not convertable

Toradocu aims to determine that "argument" refers to the method's formal parameter, "the list" refers to some variable whose type is a subtype of List, and "convertable" is determined by some method whose name or documentation refers to "convert".  Toradocu generates an aspect that you can weave into your program.  The aspect indicates an error whenever your program should throw IllegalArgumentException but does not, and the aspect indicates an error whenever your program should not throw an IllegalArgumentException but does so.  This helps you ensure that your program and its documentation are consistent.
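The check that such an aspect performs can be sketched in plain Java.  This is a hypothetical example: the method convert and the helper isConvertible are invented for illustration, and a real Toradocu oracle is woven in as an aspect rather than written by hand.

```java
import java.util.List;

public class ToradocuStyleCheck {
    // Hypothetical method under test, documented with the @throws comment above.
    static String convert(List<String> list, String arg) {
        if (!list.contains(arg) && !isConvertible(arg)) {
            throw new IllegalArgumentException(arg);
        }
        return arg.toUpperCase();
    }

    // Invented stand-in for whatever method determines convertibility.
    static boolean isConvertible(String arg) {
        return arg.startsWith("c");
    }

    // The oracle derived from the @throws comment: the exception must be
    // thrown exactly when the documented condition holds.
    static void checkAgainstDoc(List<String> list, String arg) {
        boolean shouldThrow = !list.contains(arg) && !isConvertible(arg);
        boolean didThrow = false;
        try {
            convert(list, arg);
        } catch (IllegalArgumentException e) {
            didThrow = true;
        }
        if (shouldThrow != didThrow) {
            throw new AssertionError("code disagrees with documentation for: " + arg);
        }
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b");
        checkAgainstDoc(list, "a");    // in the list: must not throw
        checkAgainstDoc(list, "cat");  // convertible: must not throw
        checkAgainstDoc(list, "x");    // neither: must throw
        System.out.println("code and documentation are consistent");
    }
}
```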

Toradocu is described in greater detail in the ISSTA 2016 paper "Automatic generation of oracles for exceptional behaviors".

Toradocu 1.0 works only for @throws Javadoc tags, and achieves precision and recall around 70%.  We are working to improve the results, to add support for @param and @return tags, and to integrate Toradocu into test generators such as Randoop.

Friday, November 18, 2016

FSE 2016 student research competition

My students took 4 of the 6 awards in the FSE 2016 SRC (student research competition), including both first-place awards.

Undergraduate category:
  1. "Combining bug detection and test case generation", Martin Kellogg, University of Washington
  2. "Bounded model checking of state-space digital systems", Felipe Monteiro, Federal University of Amazonas
  3. "Preventing signedness errors in numerical computations in Java", Christopher Mackie, University of Washington
Graduate category:
  1. "Cozy: Synthesizing collection data structures", Calvin Loncaric, University of Washington
  2. "How should static analysis tools explain anomalies to developers?", Titus Barik, North Carolina State University
  3. "Evaluation of fault localization techniques", Spencer Pearson, University of Washington
Congratulations to all the winners!

Here are more details about each of the UW projects.

"Combining bug detection and test case generation", by Martin Kellogg

Software developers often write tests or run bug-finding tools.  Automated tools for these activities sometimes waste developer time: bug finders produce false positives, and test generators may use incorrect oracles.  We present a new technique that combines the two approaches to find interesting, untested behavior, in a way that reduces wasted effort.  Our approach generates tests that are guaranteed to rule out ("kill") some mutant that could not be killed by the existing test suite.  The developer need only determine whether the program under test is behaving correctly.  If it is, then the new test case, which improves mutation coverage, can be added to the test suite.  If it is not behaving correctly, then our approach has discovered and reproduced a bug.  We demonstrated that the technique can find about a third of historical defects while only bothering the developer with truly novel inputs.
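The idea of killing a mutant can be illustrated with a toy example (mine, not the tool's output): a generated test kills a mutant when the original and mutated programs produce different results on the test's input.

```java
public class MutantKillSketch {
    // Original code under test.
    static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) total += a[i];
        return total;
    }

    // A mutant: the loop starts at 1 instead of 0.
    static int sumMutant(int[] a) {
        int total = 0;
        for (int i = 1; i < a.length; i++) total += a[i];
        return total;
    }

    public static void main(String[] args) {
        int[] generatedInput = {5, 2};
        // The outputs differ (7 vs. 2), so a test asserting
        // sum(generatedInput) == 7 kills this mutant.  The developer only
        // has to decide whether 7 is the correct answer.
        System.out.println(sum(generatedInput) != sumMutant(generatedInput));
    }
}
```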

"Preventing signedness errors in numerical computations in Java", by Christopher Mackie

If a program mixes signed and unsigned values, it will produce meaningless results.  We have developed a verification tool that prevents such errors.  It is built on a type system that segregates signed from unsigned integers.  In a case study, our type system proved easy to use, and it detected a previously-unknown bug. Our type system is implemented as the Signedness Checker and is distributed with the Checker Framework.
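Here is my own illustration (not from the abstract) of the kind of error the checker prevents: applying a signed comparison to a value that is meant to be unsigned gives a meaningless answer.

```java
public class SignednessExample {
    public static void main(String[] args) {
        int u = 0xFFFFFFFF;   // intended as the unsigned value 4294967295
        // A signed comparison treats u as -1, which is wrong for unsigned data.
        System.out.println(u < 1);                              // prints true
        // The correct unsigned comparison says u is larger than 1.
        System.out.println(Integer.compareUnsigned(u, 1) > 0);  // prints true
        System.out.println(Integer.toUnsignedString(u));        // prints 4294967295
    }
}
```

The Signedness Checker rejects, at compile time, code like `u < 1` when `u` is annotated as unsigned.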

"Cozy: Synthesizing collection data structures", by Calvin Loncaric

Many applications require specialized data structures not found in standard libraries. Implementing new data structures by hand is tedious and error-prone.  To alleviate this difficulty, we have built a tool called Cozy that synthesizes data structures using counter-example guided inductive synthesis. We have evaluated Cozy by showing how its synthesized implementations compare to handwritten implementations in terms of correctness and performance across four real-world programs. Cozy's data structures match the performance of the handwritten implementations while avoiding human error.

"Evaluation of fault localization techniques", by Spencer Pearson

Given failing tests, a fault localization tool attempts to identify which lines of source code are responsible for the failures. So far, evaluations of fault localization tools have overwhelmingly relied on artificial faults, generated by mutating correct programs.  Researchers have assumed that whichever tools best localize artificial faults will best localize real-world faults. We tested this by repeating several previous evaluations on both kinds of faults, and found that the assumption was false on our data set: artificial faults were not useful for identifying the best fault localization tools! Since this result calls into question all previous studies of these tools, we examined what makes a tool do well at localizing real faults, and we designed a new technique that outperforms all prior techniques we studied, on real faults.  Our technical report is available at

Friday, November 4, 2016

OOPSLA 2016 papers

OOPSLA 2016 was in Amsterdam this week.  Each day of the conference, one of my students presented his work:

On Wednesday, Pavel Panchekha presented "Automated Reasoning for Web Page Layout".  To create a good-looking HTML webpage, it is necessary to use CSS styling.  Pavel's Cassius system takes as input a picture or mockup of a webpage, and it outputs a CSS stylesheet that will produce the given layout.  This is one application of a formalization of CSS that can be used not just for synthesis of CSS code, but also for automated verification and debugging.

On Thursday, Konstantin Weitz presented "Scalable Verification of Border Gateway Protocol Configurations with an SMT Solver".  BGP is a networking protocol that determines how packets are routed through the internet.  ISPs configure their routers with BGP configurations, with the aim of providing good service, making a profit, satisfying contractual obligations, and not disrupting other parts of the Internet.  The Bagpipe tool verifies these properties, and it found multiple errors in real-world BGP configurations.

On Friday, Calvin Loncaric presented "A Practical Framework for Type Inference Error Explanation". Type inference is famous for being a powerful way to relieve programmers of the burden of writing types, and is equally famous for giving misleading error messages, often in the wrong part of the program, when the program contains an error.  Calvin's work shows a way to improve the error messages that are produced by type inference.

If you missed the conference, then be sure to check out their papers!

Wednesday, October 12, 2016

Nothing is better than the Optional type

JDK 8 introduced the Optional class, a container that is either empty or contains a non-null value.

Optional has numerous problems without countervailing benefits. Optional does not make your code more correct or robust. There is a real problem that Optional tries to solve, and there is a better way to solve it. Therefore, you are better off using a regular possibly-null Java reference, rather than using Optional.

The Web and blogosphere are full of claims that the Optional class solves the problem of null pointer exceptions. This is not true. Changing your code to use the Optional class has these effects:

  • It transforms a NullPointerException into a NoSuchElementException, which still crashes your program.
  • It creates new problems that were not a danger before.
  • It clutters your code.
  • It adds both space and time overheads.
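A small example (mine) demonstrates the first point: when the absent case is forgotten, Optional merely trades one unchecked exception for another.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

public class OptionalDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();

        // Plain reference: forgetting the null check crashes.
        Integer age = ages.get("alice");
        try {
            int next = age + 1;   // unboxing null
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }

        // Optional: forgetting the isPresent() check still crashes,
        // just with a different exception.
        Optional<Integer> maybeAge = Optional.ofNullable(ages.get("alice"));
        try {
            int next = maybeAge.get() + 1;   // get() on an empty Optional
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }
    }
}
```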

Whether your code throws a NullPointerException or a NoSuchElementException, the underlying logic bug is the same: you forgot to check all the possibilities when processing your data.  It's best to use a tool that guarantees you don't forget.  That helps you understand and fix the underlying problem.

(These criticisms aren't specific to Java's implementation of Optional. Other languages that claim to have solved the problem have also merely transformed it into a different manifestation.)

The Optional class isn't all bad: if you have to use Optional, it defines methods for reducing code clutter when dealing with possibly-present data. However, you should still avoid the Optional class.

I have written an article that expands on the above points and offers a better solution; please read it for the details.

Friday, September 30, 2016

Applying for a graduate fellowship

It is the season to apply for graduate fellowships, with the first deadline for most students being the NSF Graduate Research Fellowship.

I maintain a webpage with advice about applying for a fellowship.  I have recently updated it, and you may find it useful when you apply.  Good luck!