Thursday, March 1, 2018

Generalized data structure synthesis with Cozy

For most of your programming, a few data structures suffice:  lists, sets, and maps/dictionaries.  But sometimes, your program requires a data structure with a more sophisticated interface or a more efficient algorithm.  Then you need to implement it yourself -- often in terms of lists, sets, and maps/dictionaries.

Implementing a data structure is time-consuming, error-prone, and it is difficult to achieve high efficiency.  Our Cozy tool improves this situation by creating the data structure for you.

Cozy's input is small, easy to write, and easy to write correctly.  You provide as input a trivial implementation, which uses an inefficient representation and inefficient algorithms.  For example, you might define a graph data structure as a set of <node, node> pairs (one for each edge).  Every operation, such as determining whether two nodes are connected or determining the out-degree of a node, requires searching the set for the given pair of nodes.  You can also define (inefficient) update operations, such as adding or removing graph edges or nodes.  You write the specification declaratively, by writing a set comprehension rather than an iteration.

Cozy's output is an implementation of the data structure.  For instance, it can automatically generate an implementation of a graph that uses adjacency lists.  The implementation runs fast because it uses an efficient representation.  For each update operation, the implementation automatically updates the representation; programmers often make mistakes in the update code.

We used Cozy to replace the implementations of data structures in 4 open-source programs.  Cozy requires requires an order of magnitude fewer lines of code than manual implementation, makes no mistakes even when human programmers do, and generally matches the performance of hand-written code.

At its heart, Cozy searches over all possible implementations of the user-provided specifications and determines whether each one is a correct implementation of the specification.  (Cozy has to synthesize both a representation and operations over it.)  A key idea in Cozy is to use the technique of "wishful thinking".  When Cozy notices that it would be convenient to have access to the value of a particular expression, then Cozy just uses that expression and then tries to synthesize an efficient implementation of it.  In the end, Cozy outputs the most efficient overall implementation that it has discovered.

A paper on Cozy will appear in ICSE 2018.  Compared to a previous version of Cozy described in PLDI 2016, the current version is more general; it does not need a custom outline language (it handles arbitrary expressions, and its search over implementations is more powerful) nor a tuning step over a user-supplied benchmark (it contains a symbolic cost model).  It can generate more complex data structures that involve multiple collections and aggregation operations.

Cozy was built by a team led by Calvin Loncaric.
Cozy is publicly available at https://github.com/CozySynthesizer/cozy.  Give it a try!