Saturday, November 19, 2016

Automated generation of assert statements with Toradocu

Do you love writing test assertions?  If not, the Toradocu tool can help you.  Toradocu 1.0 has just been released.

Toradocu automatically generates assert statements that you can insert in your program.  This makes your existing tests more effective, and it can also find errors during normal operation of your program.

Toradocu converts your existing Javadoc comments into assertions.  It works as follows:

  • It parses an English sentence, yielding a parse tree that indicates the subject, the verb, the direct and indirect objects, etc.
  • It associates every noun phrase in the sentence with an expression in the program.  It does so by comparing the text of the noun phrase with variable names and types.
  • It associates every verb in the sentence with an operation in the program.  For example, it compares the text of the verb with method names.
  • Now that every noun and verb in the parse tree is associated with a program element, the sentence can be translated to a Java expression that can be used in an assert statement.
For example, suppose you wrote

 @throws IllegalArgumentException if the argument is not in the list and is not convertable

Toradocu aims to determine that "argument" refers to the method's formal parameter, "the list" refers to some variable whose type is a subtype of List, and "convertable" is determined by some method whose name or documentation refers to "convert".  Toradocu generates an aspect that you can weave into your program.  The aspect indicates an error whenever your program should throw IllegalArgumentException but does not, and the aspect indicates an error whenever your program should not throw an IllegalArgumentException but does so.  This helps you ensure that your program and its documentation are consistent.

Toradocu is described in greater detail in the ISSTA 2016 paper "Automatic generation of oracles for exceptional behaviors".

Toradocu 1.0 works only for @throws Javadoc tags, and achieves precision and recall around 70%.  We are working to improve the results, to add support for @param and @return tags, and to integrate Toradocu into test generators such as Randoop.

Friday, November 18, 2016

FSE 2016 student research competition

My students took 4 of the 6 awards in the FSE 2016 SRC (student research competition), including both first-place awards.

Undergraduate category:
  1. "Combining bug detection and test case generation", Martin Kellogg, University of Washington
  2. "Bounded model checking of state-space diginal systems", Felipe Monteiro, Federal University of Amazonas
  3. "Preventing signedness errors in numerical computations in Java", Christopher Mackie, University of Washington
Graduate category:
  1. "Cozy: Synthesizing collection data structures", Calvin Loncaric, University of Washington
  2. "How should static analysis tools explain anomalies to developers?Titus Barik, North Carolina State University
  3. "Evaluation of fault localization techniques", Spencer Pearson, University of Washington
Congratulations to all the winners!

Here are more details about each of the UW projects.

"Combining bug detection and test case generation", by Martin Kellogg

Software developers often write tests or run bug-finding tools.  Automated tools for these activities sometimes waste developer time: bug finders produce false positives, and test generators may use incorrect oracles. We present a new technique that combines the two approaches to find interesting, untested behavior, in a way that reduces wasted effort. Our approach generates tests that are guaranteed to rule out ("kill") some mutant that could not be killed by the existing test suite.  The developer must only determine whether the program under test is behaving correctly. If it is, then the new test casewhich improves mutation coveragecan be added to the test suite. If it is not behaving correctly, then our approach has discovered and reproduced a bug.  We demonstrated that the technique can find about a third of historical defects while only bothering the developer with truly novel input.

"Preventing signedness errors in numerical computations in Java", by Christopher Mackie

If a program mixes signed and unsigned values, it will produce meaningless results.  We have developed a verification tool that prevents such errors.  It is built on a type system that segregates signed from unsigned integers.  In a case study, our type system proved easy to use, and it detected a previously-unknown bug. Our type system is implemented as the Signedness Checker and is distributed with the Checker Framework.

"Cozy: Synthesizing collection data structures", by Calvin Loncaric

Many applications require specialized data structures not found in standard libraries. Implementing new data structures by hand is tedious and error-prone.  To alleviate this difficulty, we have built a tool called Cozy that synthesizes data structures using counter-example guided inductive synthesis. We have evaluated Cozy by showing how its synthesized implementations compare to handwritten implementations in terms of correctness and performance across four real-world programs. Cozy's data structures match the performance of the handwritten implementations while avoiding human error.

"Evaluation of fault localization techniques", by Spencer Pearson

Given failing tests, a fault localization tool attempts to identify which lines of source code are responsible for the failures. So far, evaluations of fault localization tools have overwhelmingly relied on artificial faults, generated by mutating correct programs.  Researchers have assumed that whichever tools best localize artificial faults will best localize real-world faults. We tested this by repeating several previous evaluations on both kinds of faults, and found that the assumption was false on our data set: artificial faults were not useful for identifying the best fault localization tools! Since this result calls into question all previous studies of these tools, we examined what makes a tool do well at localizing real faults, and we designed a new technique that outperforms all prior techniques we studied, on real faults.  Our technical report is available at

Friday, November 4, 2016

OOPSLA 2016 papers

OOPSLA 2016 was in Amsterdam this week.  Each day of the conference, one of my students presented his work:

On Wednesday, Pavel Panchekha presented "Automated Reasoning for Web Page Layout".  To create a good-looking HTML webpage, it is necessary to use CSS styling.  Pavel's Cassius system takes as input a picture or mockup of a webpage, and it outputs a CSS stylesheet that will produce the given layout.  This is one application of a formalization of CSS that can be used not just for synthesis of CSS code, but also for automated verification and debugging.

On Thursday, Konstantin Weitz presented "Scalable Verification of Border Gateway Protocol Configurations with an SMT Solver".  BGP is a networking protocol that determines how packets are routed through the internet.  ISPs configure their routers with BGP configurations, with the aim of providing good service, making a profit, satisfying contractual obligations, and not disrupting other parts of the Internet.  The Bagpipe tool verifies these properties, and it found multiple errors in real-world BGP configurations.

On Friday, Calvin Loncaric presented "A Practical Framework for Type Inference Error Explanation". Type inference is famous for being a powerful way to relieve programmers of the burden of writing types, and is equally famous for giving misleading error messages, often in the wrong part of the program, when the program contains an error.  Calvin's work shows a way to improve the error messages that are produced by type inference.

If you missed the conference, then be sure to check out their papers!

Wednesday, October 12, 2016

Nothing is better than the Optional type

JDK 8 introduced the Optional class, a container that is either empty or contains a non-null value.

Optional has numerous problems without countervailing benefits. Optional does not make your code more correct or robust. There is a real problem that Optional tries to solve, and there is a better way to solve it. Therefore, you are better off using a regular possibly-null Java reference, rather than using Optional.

The Web and blogosphere are full of claims that the Optional class solves the problem of null pointer exceptions. This is not true. Changing your code to use the Optional class has these effects:

  • It transforms a NullPointerException into a NoSuchElementException, which still crashes your program.
  • It creates new problems that were not a danger before.
  • It clutters your code.
  • It adds both space and time overheads.

When your code throws a NullPointerException into a NoSuchElementException, the underlying logic bug is that you forgot to check all possibilities when processing data. It's best to use a tool that guarantees you don't forget. That helps you to understand and fix the underlying problem.

(These criticisms aren't specific to Java's implementation of Optional. Other languages that claim to have solved the problem have also merely transformed it into a different manifestation.)

The Optional class isn't all bad: if you have to use Optional, it defines methods for reducing code clutter when dealing with possibly-present data. However, you should still avoid the Optional class.

I have written an article that expands on the above points and offers a better solution. Here is the outline of that article:
Please go to to read the article.

Friday, September 30, 2016

Applying for a graduate fellowship

It is the season to apply for graduate fellowships, with the first deadline for most students being the NSF Graduate Research Fellowship.

I maintain a webpage with advice about applying for a fellowship.  I have recently updated it, and you may find it useful when you apply.  Good luck!

Monday, September 26, 2016

JavaOne 2016 talks

I first spoke at JavaOne in 2008, and I have presented most years since.  It's a great way to communicate a message about practical software development tools.  An academic can even squeeze in a few few nuggets of theory as enrichment, so long as the fundamentals of usefulness to practitioners is covered.

Last week, I gave three talks (together with Werner Dietl), spanning 4 hours, at JavaOne 2016.  The common theme was the use of Java 8's type annotations feature to improve code quality.  This was the most rewarding year yet:  with ever-increasing adoption of Java 8, there was lively interest and the sessions were very interactive with lots of questions and comments.

The slides are now available (though you always get more from being at the talk, hearing the spiel, and seeing the demos):

Tutorial TUT3422:  Preventing Errors Before They Happen

Conference talk CON5739: Disciplined Locking: No More Concurrency Errors

Birds-of-a-feather session BOF3427: Using Type Annotations to Improve Your Code

Thursday, September 22, 2016

Predicting the 2016 presidential election

My "data programming" class, UW CSE 160, is a popular introductory computer science class.  Its examples and assignments are taken from the real world and use datasets from science, engineering, business, etc. -- not from abstract math, puzzles, or programming itself, such as computing the Fibonacci sequence or implementing a linked list.  These real-world examples are more compelling to students, and they better prepare students for the programming they will do in the future.  The class's assignments can also be integrated into an existing class without fully adopting the methodology of real-world examples.

One particularly popular assignment asks students to predict the outcome of the 2012 election, based on polling data.  Preceding the 2012 election, many political pundits, working from their gut feel, predicted a Romney win or said the election was "too close to call".  Nate Silver of the website FiveThirtyEight had been predicting an Obama win for months, and he correctly predicted the outcome of every state.

The key to Silver's approach is to combine different polls using a weighted average. In a normal average, each data point contributes equally to the result. In a weighted average, some data points contribute more than others. Silver examined how well each polling organization had predicted previous elections, and then weighted their polls according to their accuracy: more biased pollsters had less effect on the weighted average.

The concepts are simple enough for beginning programmers to complete successfully after just 3 or 4 weeks of instruction.  The assignment is interesting enough to be assigned later in the term, too.  Since most of the assignment is provided and students just have to implement 10 functions, this assignment also gives students practice in the critical skill of reading code.

When this assignment was first handed out in January 2013, the 2012 election was a fresh memory.  Now it may seem dated to students. Therefore, you could update the assignment to use polling data for the 2016 election.

Doing so requires collecting and cleaning polling data.  You can find information about how we collected and cleaned data for the 2008 and 2012 elections, in file (Students:  this doesn't give any hints about how to solve the assignment.)

If you adapt the Python programs in that zip file and collect polling information about the 2016 election, please share the fruits of your labor by emailing me.  Other instructors and their students will appreciate it!

Friday, September 9, 2016

Python evaluation rules

You cannot program in a programming language if you don't understand how the language works.  Some novices find programming a frustrating, opaque endeavor because they don't understand how the computer executes their programs.  When their program does not work, they make wild guesses about what changes might improve the program, and they try out their guesses by running the program.

If a programmer understands the language, then the programmer can understand what the program is doing and why it is producing the observed output.  The programmer can determine a proper fix or devise meaningful experiments to better understand what values are being manipulated at run time.

Unfortunately, not all students are taught these simple, effective techniques.  I have been horrified to see instructors (even at my own institution!) teach bad habits by telling students, "The only way to know what this program does is to run it."  Many programming textbooks and websites are just as bad:  they don't explain the programming model, or they do so in vague English.

The Python Evaluation Rules document gives precise semantics for much of the Python language.  It presents step-by-step rules for executing a Python program.  Every skilled programmer uses such rules to reason about the effects of Python code.  This document helps beginners to become experts more quickly.  With this knowledge, a programmer finds it easier to write correct code, debug incorrect code, and read unfamiliar code.

The document applies to both Python 2 and Python 3. It does not cover every possible Python expression, but does cover the most frequently-used ones.

This document has been in use for 5 years, with great success.  I recently moved to a new GitHub repository, where source code is available and you can submit bug reports or make pull requests.

Thursday, August 11, 2016

Building robust distributed systems

Distributed systems are notoriously error-prone.  It's hard enough for programmers to reason about multiple actions occurring on different nodes, but they also need to understand the messages that are sent among these nodes.  (The problem becomes murkier yet in the presence of failures.  A distributed system is designed to be faster and/or more reliable than a single-node system, but a distributed system contains more computers and connections, so it suffers component failures more frequently than the single-node system!)

I have been pursing several approaches to aid in the creation of distributed systems.  Here are two of those projects.

To help programmers understand and debug distributed systems, my colleagues and I have created a visualization tool, ShiViz.  Programmers typically debug distributed systems by examining their logs, but the logs are hard to understand.  ShiViz creates a time-space diagram, a standard representation that programmers already use to represent the hosts and the ordering of messages between them.  See the accompanying pictures of a conceptual time-space diagram and how ShiViz represents an actual execution.  A time-space diagram shows when messages were sent and received, and thus it establishes causality:  if event E1 occurred at node N before any effect of event E2 reached node N, then event E1 did not depend on E2.  ShiViz works with existing logs, though it enhances them with causality information between events.  ShiViz takes information that programmers would otherwise have to grep as text, and presents it in a form that programmers can explore, search, and understand.  You can try ShiViz online, without having to install anything on your computer.  Our article "Debugging Distributed Systems" appeared this month in Communications of the ACM.  The article describes approaches to debugging distributed systems and shows when our visualization approach is a good choice.

To help programmers create distributed systems that are correct by construction, my colleagues and I have created Verdi, a framework for proving correctness of distributed systems.  Verdi is a heavyweight approach built upon the Coq proof assistant:  a programmer writes both source code and a proof of correctness for the source code, and thanks to the Curry-Howard isomorphism the program type-checks only if the proof is valid.  This gives strong guarantees.  While it is challenging to write such proofs for sequential code, it is much worse for distributed systems.  Our system, Verdi, enables a programmer to write and prove a sequential program in Coq.  Then Verdi transforms the program into a distributed system, and it transforms the proof into a proof of the distributed system.  This approach, which we call "verified system transformers", abstracts away the difficulty of writing and proving a distributed system.  Using Verdi, we created the first formally-verified implementation of the widely-used Raft consensus protocol.  Our PLDI 2015 paper describes Verdi, and our CPP 2016 paper describes the proof in more detail.  The CPP 2016 paper also explains the methodology that we developed while writing 50,000 lines of proof --- a methodology that significant eased the burden of using Coq and can be adopted by other researchers.