Tuesday, April 28, 2015

Declarative specification of FSM-inference algorithms

The paper “Using declarative specification to improve the understanding, extensibility, and comparison of model-inference algorithms” recently appeared in IEEE Transactions on Software Engineering.  The paper is by Ivan Beschastnikh, Yuriy Brun, Jenny Abrahamson, me, and Arvind Krishnamurthy.  The title is a mouthful, because the paper accomplishes quite a bit, but I'll try to break it down here.

The models that the paper speaks of are finite state machines (FSMs), like this one that expresses some legal sequences of commands that a client program might issue to a mail program:


It is useful to have a FSM that expresses the behaviors of a system.  Unfortunately, the system designers may not have written one down, or the one they wrote may not be accurate.  An alternative is to infer a FSM from observed system executions.  This is such a good idea that many researchers have published such model-inference algorithms.


Computer scientists usually write algorithms as pseudocode, and model-inference algorithms are no exception.  Pseudocode is familiar to programmers and enables the inference algorithm to be implemented, but it is not helpful in understanding the inference algorithm.  The extensive proofs in any algorithm textbook show the limitations of reasoning about pseudocode.

The paper proposes InvariMint, an approach to specify model-inference algorithms declaratively.  The algorithm designer specifies what properties of the input log the algorithm should preserve, rather than how the algorithm works in terms of programming-language statements.  For instance, the algorithm designer might say, "If event A is always followed by event B in the log, then the algorithm should output a model in which event A must always be followed by event B."  This circumscribes what types of generalization that the algorithm is allowed to do.  The algorithm designer doesn't have to figure out how to encode this in a programming language nor to verify that the inference implementation is correct.


Here is how InvariMint works, in a nutshell:

  • At design time, the algorithm designer decides what sorts of properties the inference algorithm should express about a trace.  The designer generalizes these properties to patterns and specifies them as FSMs.
  • At run time, given an execution trace (a log), the algorithm mines matches for each of those patterns from the trace.  The algorithm intersects all the mined FSMs to obtain the final inferred model for the program.


We applied the InvariMint declarative approach to two model-inference algorithms that had previously been specified procedurally.  Specifying them declaratively with InvariMint (1) leads to new fundamental insights and better understanding of existing algorithms, (2) simplifies creation of new algorithms, including hybrids that combine or extend existing algorithms, and (3) makes it easy to compare and contrast previously published algorithms.  Furthermore, the InvariMint-generated algorithms were significantly faster than the original procedural versions of the algorithms.

You can read the paper here.  The InvariMint implementation is distributed along with Synoptic.

No comments: