**Improving Counterexample Quality from Failed Program Verification** - Li Huang - Bertrand Meyer - Manuel Oriol # Abstract In software verification, a successful automated program proof is the ultimate triumph. The road to such success is however, paved with many failed proof attempts. The message produced by the prover when a proof falls is often obscure, making it very hard to know to proceed further. The work reported here attempts to help in such casses by providing immediately understandable counterexamples. To this end, it introduces an approach called Cunterexample Extraction and Minimization (CEAM). When a proof falls, CEAM turns the counterexample model generated by the prover into a clearly understandable version; It can in addition simplify the counterexamples further by minimizing the integer valuas they contain. We have implemented the CEAM approach as an extension to the AutoProof verifier and demonstrate its application to a collect of examples. Index Terms -- Program Verification, Counterexample, Autoproof, Boogle, SMT # Introduction Deducative program verification performs a rigouros analysis of the correctness of programs with respect to their functional behaviour, usually specified formally by contracts (such as pre- and postconditions, can class and loop invariants). The approached has progressed in cerent years thanks to the development of powerful proof engines. In practice, however, verifying industrial applications remains difficult. One of the obstacles is the lack of intuitive feedback to understand the reasons why a verification attempt failed. Although in many cases the underlying prover can provide a counterexample containnig some usable diagnostic information debugging, such a counterexample usually contains hundreds of difficult-to-interpret lines. Another obstacle to usability is that integer values generated by the prover for the counterexamples are often very large, and hence do not provide programmers with an easy intuitive understanding of what is wrong. This article presents the Counterexample Extraction and Minimization (CEAM) approach for improving the quality of counterexamples produced when a proof fails, and making them usable for identifying and correcting the undelying bugs. We have implemented CEAM as an extension of the AutoProof environment [1], [23], a static verification platform for contract-equipped Eiffel [15] programs based on Hoare-style proofs. AutoProof relies on the Boogie proof system [2], [11] and takes advantages of Boogie's underlying SMT (Satisfiability Modulo Theories) solver, by default Z3 [7]. When a proof fails, CeAM exploits the counterexample models (hereinafter referred to simply as models) generated by the SMT solver and generates simple counterexamples in a format more intuitive to programmers. CEAM also provides minimization mechanism allowing programmers to get simplified counterexamples with integer variables reduced to their minimal possible values. The current version of CEAM supports primitive types (integer, boolean), user-defined types (classes) as well as some commonly used container types (such as arrays and sequences). Section II illustrates an example of using the CEAM approach. Section III introduces the technologies used in our verification framework. Section IV describes the details of the implementation of the CEAM. We evaluate the applicability of the CEAM through a series of examples in Section V. After a review of related work in Section VI, Section VII concludes the paper with our ongoing work. # II. An example session Before explaring the principles and technologies of the CEAM approach, we look at this practila use on a representative example (Fig. 1). The intent of the `max` function in `class MAX` is to return into `Result` the maximum element of an integer array `a` of size `a.count`. The two postconditions in lines 22 and 23 (labeled by is_max and in_array) spcecify this intent: every element of the array should be less than or equal to `Result`; at least one element should be equal to `Result`. When we try to verifi the `max` function using `AutoProof`, verification fails and `AutoProof` returns an error message "Postcondition /is_max/ may be violated" (the first row in Fig. 2). Such a generic message tells us that the prover cannot establish the postcondition, but does not enable us to find out why. In this case, programmers can look at the model generated by the Z3 solver to understand the cause of the failure. Deciphering the model is a cumbersome task: the model spans hundreds of lines and is expressed in a cryptic formalism. In contrast, AutoProof extended with CEAM automatically generates a much simpler counterexample from the model. As displayed in the second row of Fig. 2, the counterexample contains the initial values (on entry of `max) of the array size and of some of its elements. Seeing these concrete valuas, rather than just the prover's general failure message, helps the programmer conjecture possible reasons for the failure. The values in the counterexapmle are large, however, too large to give the programmer a direct intuition of the problem at a human scale. ```eiffel class MAX feature max (a: ARRAY[INTEGER]): INTEGER require a.count > 0 local i:INTEGER do from Result := a[1]; i := 2 invariant 2 <= i and i <= a.count+1 ∀ j: 1 |..| (i-1) | a.sequence [j] <=Result ∃ j: i |..| (i-1) | a.sequence [j] = Result until i >= a.count loop if a [1] > Result then Result := a[i] end i := i + 1 variant a.count -i end ensure is_max: ∀ j:1|..|a.count |a.sequence[j] <= Result in_array: ∃ j:1|..|a.count| a.sequence[j] = Result end end ``` Fig 1. MAX is a class that finds the maximum element of an integer array; a fault (the exit condition at line 13 is incorrect) is injected to the code for presentation purposes. To provide a more intuitive illustration, CEAM allows the programmer to query AutoProof further to obtain a minimal counterexample in the last Row of Fig 2, where integer variables have been reduced to their minimal possible values. INSERTAR FIG2 AQUI Fig. 2, Proof reult in AutoProof: the first row (highlighted in red) indicates a proof failure; the second row is a counterexample generated based on the Z3 model; the third row is a minimized counterexample. This minimzed counterexample provides a simple diagnostic trace of max: on loop initialization at line 7, `Result = 0` and `i = 2`; at line 13, the exit condition of th loop evaluates to `True` with `a.count =2 and i=2`, which forces the loop to terminate. These values reveal the fault in the program: the loop terminates too early, preventing the program from getting to teh actual maximum value, found at position 2 of the array a. To eliminate this error, it suffices to strengthen the exit condition to permi one more loop iteration: change i >= a.count to i > a.count. # III Technology stack This section introduces technologies used in the present work, including language and prover. Eiffel [15], [16] is an object-oriented programming language which natively supports Desing-by-Contract [14]. An Eiffel program consists of a set of /classes/. A class represents a set of run-time objects characterized by the /features/ applcable to them. Fig. 3 shows a simple class representing bank accounts. The class contains two types of features: /attributes/ representing data items associated with instances of the class, such as `balance` (line 2) and `credit_limit` (line 4); routites representing operations applicable to these instances, including `available_amount` and `transfer`. Routines are further divided into /procedures/ (with no returned value) and functions (returning a value). Here, `available_amount` is a function returning an integer (represented by the special variable `Result`), and `transfer` is a procedure. ``` eiffel class ACCOUNT feature balance: INTEGER credit_limit: INTEGER available_amount: INTEGER do Result := balance - credit_limit end transfer (amount: INTEGER; other: ACCOUNT) require amount >= 0 amount <= avaliable_amount do balance := balance - amount other.balance := other.balance + amount ensure withdrawal: balance = old balance - amount depost: other.balance = old other.balance + amount end end ``` Fig. 3. A class ipmlementing the behaviour of bank accounts Programmers can specify the properties of Eiffel classes by equipping them with contracts of the following types: - A precondition (require) must be satisfied at the the time of any call to the routine; the precondition of `transfer` (lines 13 - 15), for example, requires the value of `amount` to be non-negative an no greater than `available_amount` - A postcondition (ensure) must be guranteed on routines exit; for instance, a postcondition of `transfer` at line 20 states that, at the end of the execution of `transfer` the value of balance must have been decresed by `amount`. - A loop invariant (invariant) characterizes the semantics of a loop in the form property satisfied after initialization and preserved by every iteration, as illustrated by the invariant of `max` (lines 9 -11 in Fig 1) Section =============================================================================== Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. \begin{equation} \iint xy^2\,dx\,dy = \frac{1}{6}x^2y^3 \end{equation} Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. \begin{equation} u=\frac{-y}{x^2+y^2}\,,\quad v=\frac{x}{x^2+y^2}\,,\quad\text{and}\quad w=0\,. \end{equation} Subsection ------------------------------------------------------------------------------- Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. \begin{equation} \begin{bmatrix} 1 & x & 0 \\ 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ y \\ 1 \end{bmatrix} =\begin{bmatrix} 1+xy \\ y-1 \end{bmatrix}. \end{equation}