We are excited to announce our first engagement with Succinct Labs, in which we demonstrated the feasibility of formally verifying the correctness of SP1 zk chips in the Lean proof assistant. This constitutes an important initial step on the pathway towards establishing broader formal guarantees of the security of the SP1 zk infrastructure.
In particular, we have built a verification infrastructure that allows one to extract Plonky3 constraints from SP1 chips into Lean, formulate Lean specifications of chip behaviour, and prove that the extracted constraints conform to these specifications. We have then used this infrastructure to establish correctness of the SP1 AddSub
chip, proving that the associated constraints satisfy the specification of wrap-around 32-bit addition. Given the modular architecture of SP1, we believe that this approach has great potential for scalability to the most complex SP1 circuits.
The developed infrastructure is publicly available; in particular, the conformance proof for the AddSub chip can be found here.
In the rest of the blog, we discuss these three aspects in more detail, using the AddSub chip as a running example, and outline a number of plans for potential further work. For those without too much time on their hands, the most important technical takeaways/highlights are that:
We start by enabling the extraction of constraints from SP1 zk chips into human- and Lean-readable format. In particular, SP1 already allowed one to obtain a CUDA program that evaluates the constraint polynomial associated with a given chip. We extended this work with a mechanism for evaluating said CUDA programs and extracting Lean-readable constraints from the outputs of this evaluation. This mechanism can be found as part of this repository and is localised to the air/src/constraint_extraction folder.
Assuming an installation of Rust is present on the system, after cloning the repository, the constraints for all chips except KeccakPermute and Global can be extracted by running:
RUST_MIN_STACK=104857600 cargo test -- --nocapture
The extracted constraints will then be placed into the air/chip_constraints folder, with one file generated per chip. Currently, the constraints for the KeccakPermute chip take approximately 20 minutes to extract, whereas constraint extraction for the Global chip fails; understanding this failure was assessed to be beyond the scope of this engagement.
AddSub: Extraction by Example
We present how our constraint extraction mechanism works by examining the following constraints, extracted from the AddSub chip:
-- Basic constraints
Basic((ML17 * (ML17 - 1)) = 0)
Basic((ML18 * (ML18 - 1)) = 0)
Basic(((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
Basic((ML16 * (((ML8 + ML12) - ML1) * (((ML8 + ML12) - ML1) - 256))) = 0)
Basic((ML16 * ((((ML9 + ML13) - ML2) + ML5) * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
Basic((ML16 * ((((ML10 + ML14) - ML3) + ML6) * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
Basic((ML16 * ((((ML11 + ML15) - ML4) + ML7) * ((((ML11 + ML15) - ML4) + ML7) - 256))) = 0)
Basic((ML16 * (ML5 * (((ML8 + ML12) - ML1) - 256))) = 0)
Basic((ML16 * (ML6 * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
Basic((ML16 * (ML7 * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
Basic((ML16 * ((ML5 - 1) * ((ML8 + ML12) - ML1))) = 0)
Basic((ML16 * ((ML6 - 1) * (((ML9 + ML13) - ML2) + ML5))) = 0)
Basic((ML16 * ((ML7 - 1) * (((ML10 + ML14) - ML3) + ML6))) = 0)
Basic((ML16 * (ML5 * (ML5 - 1))) = 0)
Basic((ML16 * (ML6 * (ML6 - 1))) = 0)
Basic((ML16 * (ML7 * (ML7 - 1))) = 0)
Basic((ML16 * (ML16 * (ML16 - 1))) = 0)
Basic((ML16 * ((ML17 + ML18) - 1)) = 0)
-- Lookup-argument constraints
Lookup(ML16, [5, 4, 0, 0, ML8, ML9])
Lookup(ML16, [5, 4, 0, 0, ML10, ML11])
Lookup(ML16, [5, 4, 0, 0, ML12, ML13])
Lookup(ML16, [5, 4, 0, 0, ML14, ML15])
Lookup(ML16, [5, 4, 0, 0, ML1, ML2])
Lookup(ML16, [5, 4, 0, 0, ML3, ML4])
Lookup((ML17 * 2013265920), [3, 0, 0, ML0, (4 + ML0), 0, ML18, ML1, ML2, ML3, ML4, ML8, ML9, ML10, ML11, ML12, ML13, ML14, ML15, (1 + (ML16 * 2013265920)), 0, 0, 0, 0])
Lookup((ML18 * 2013265920), [3, 0, 0, ML0, (4 + ML0), 0, ML18, ML8, ML9, ML10, ML11, ML1, ML2, ML3, ML4, ML12, ML13, ML14, ML15, (1 + (ML16 * 2013265920)), 0, 0, 0, 0])
-- Permutation-argument constraints
Permutation(FirstRow(4))
Permutation(TransitionRow(4))
Permutation(LastRow(4))
Looking at this output, we first observe that there are three types of constraints that can be extracted—basic, lookup, and permutation—and we proceed to discuss them in more detail.
In general, constraints describe relationships between various types of symbolic variables, which are defined in SP1 here and here. These symbolic variables represent values in the BabyBear field, a prime field of order 2013265921 that is commonly used in zk proof systems. They are the building blocks for constructing SP1 constraints, allowing the circuit developer to specify, for example, values in the current and next rows of both preprocessed and main trace columns, boundary conditions, public inputs/outputs, and auxiliary variables used in lookup and permutation arguments.
For the purposes of this blog, the symbolic variables relevant for understanding the reasoning are the MainLocal variables, denoted by ML<X>, which the reader can think of as the X-th column of the current row of the main execution trace. In addition, in the lookup and permutation constraint extraction, discussed below, we encounter PermutationChallenge (PC<X>), PermutationLocal (PL<X>), PermutationNext (PN<X>), and CumulativeSum (CS<X>) symbolic variables, but as their meaning relates only to the proof system, the reader can treat them purely syntactically.
We extract the constraints for a chip by taking its corresponding CUDA program and evaluating it. CUDA instructions, all of which are listed here, perform basic operations, such as addition, subtraction, and multiplication, on constants and symbolic variables via a simple register machine, with a special instruction of the form R<X> == 0 indicating that a new constraint should be extracted. For example, the following sequence of CUDA instructions
R2 := ML17
R3 := 1
R4 := R2 - R3 -- R4 := ML17 - 1
R3 := R2 * R4 -- R3 := ML17 * (ML17 - 1)
R3 == 0 -- Constraint: ML17 * (ML17 - 1) = 0
R2 := ML18
R3 := 1
R4 := R2 - R3 -- R4 := ML18 - 1
R3 := R2 * R4 -- R3 := ML18 * (ML18 - 1)
R3 == 0 -- Constraint: ML18 * (ML18 - 1) = 0
results in the extraction of the first two constraints of the AddSub chip: ML17 * (ML17 - 1) = 0 and ML18 * (ML18 - 1) = 0.
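To make this extraction step concrete, the following minimal sketch (in Python, purely illustrative; the actual extraction tooling lives in the Rust repository mentioned above) evaluates this register-machine format symbolically and emits constraint strings:

```python
def eval_program(instructions):
    """Symbolically evaluate a toy register-machine program of the kind shown
    above. Registers hold symbolic expression strings; an `R<X> == 0` line
    emits a constraint. Illustrative only, not the actual SP1 extraction code."""
    regs, constraints = {}, []
    def val(tok):  # a token is a register, a symbolic variable, or a constant
        return regs.get(tok, tok)
    for ins in instructions:
        lhs, op, *rhs = ins.split()
        if op == "==":                  # e.g. "R3 == 0": emit the constraint
            constraints.append(f"{val(lhs)} = 0")
        elif len(rhs) == 1:             # e.g. "R2 := ML17": copy a value
            regs[lhs] = val(rhs[0])
        else:                           # e.g. "R4 := R2 - R3": binary operation
            a, binop, b = rhs
            regs[lhs] = f"({val(a)} {binop} {val(b)})"
    return constraints

program = [
    "R2 := ML17", "R3 := 1", "R4 := R2 - R3", "R3 := R2 * R4", "R3 == 0",
    "R2 := ML18", "R3 := 1", "R4 := R2 - R3", "R3 := R2 * R4", "R3 == 0",
]
print(eval_program(program))
# → ['(ML17 * (ML17 - 1)) = 0', '(ML18 * (ML18 - 1)) = 0']
```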
Basic constraints are, in essence, any extracted constraints that do not feature any of the symbolic variables related to the proof system mentioned above, and they do not need to be processed further.
Next come the constraints arising from the lookup arguments, which are substantially more interesting to process than the basic ones. In particular, guided by the SP1 Technical Whitepaper and the SP1 implementation, we have abstracted SP1 lookups into the following processed constraint form:
Lookup(M, [C0, C1, ..., CN])
where N ≥ 1, and:
- M is the lookup multiplicity. For the purposes of our proof-of-concept, we consider that multiplicity can only equal 0, 1, or -1: lookups with multiplicity 0 are not performed; lookups with multiplicity 1 retrieve information from other chips; and lookups with multiplicity -1 are sent in response. In a correct execution trace, the non-zero-multiplicity lookups must come in pairs, matching in number and parameters but having opposing multiplicities.
- C0 is the lookup interaction kind, which describes the purpose of the lookup—for example, whether it represents an instruction execution (3, Instruction), an operation on bytes (5, Byte), or a call to an external function (8, Syscall), etc.
- C1, ..., CN are the interaction parameters, which capture, in essence, the inputs and the outputs of the lookup.
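The pairing requirement on non-zero-multiplicity lookups can be sketched as a simple balance check (Python, a hypothetical modelling aid rather than anything in the SP1 codebase): summing multiplicities per parameter tuple across a trace must give zero everywhere.

```python
from collections import Counter

def lookups_balanced(lookups):
    """Model of the pairing invariant described above: each lookup is a
    (multiplicity, parameters) pair, and across a correct trace every send
    (multiplicity 1) must be cancelled by a matching receive (multiplicity -1)."""
    total = Counter()
    for mult, params in lookups:
        total[tuple(params)] += mult
    return all(v == 0 for v in total.values())

# A matched send/receive pair balances out:
assert lookups_balanced([(1, [5, 4, 0, 0, 10, 20]), (-1, [5, 4, 0, 0, 10, 20])])
# An unmatched send does not:
assert not lookups_balanced([(1, [5, 4, 0, 0, 10, 20])])
```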
For example, as we will understand in the next section, the above Byte lookup Lookup(ML16, [5, 4, 0, 0, ML1, ML2]) tells us that if ML16 equals 1 (the meaning of which is discussed shortly), then ML1 and ML2 are both less than 256 (that is, they each fit into one byte).
Finally, when it comes to the permutation arguments, we have observationally identified that SP1 chips produce precisely three related constraints: one for the first row of the trace; one for the transition rows; and one for the last row. These constraints are of the form:
where IsFirstRow, IsTransition, and IsLastRow denote symbolic variables dedicated to stating boundary conditions.
These constraints are identified by the process_permutation_constraints function, under the assumption that they follow the exact syntactic form given in the above equations. As the only variable in these constraints is n, we process them into the format
Permutation(FirstRow(n))
Permutation(TransitionRow(n))
Permutation(LastRow(n))
thereby abstracting their shape while allowing for its recovery if needed further along the verification pipeline.
When writing specifications of chip behaviours and when doing the associated proofs, it is helpful to be able to relate the constraints and the associated variables back to the Rust code that generates them. Luckily, this is fairly straightforward for the basic chips. For example, for the AddSub chip, the constraint-generating code can be found here (for the entire chip) and here (for the addition operation specifically).
To illustrate this correspondence, let us examine the two in parallel, noting that constraints are emitted every time a function on the builder is called. The first batch that we run into is:
let is_real = local.is_add + local.is_sub;
builder.assert_bool(local.is_add);
builder.assert_bool(local.is_sub);
builder.assert_bool(is_real.clone());
which states that the values local.is_add, local.is_sub, and is_real all equal either 0 or 1. The only extracted constraints that match this pattern are the first three:
Basic((ML17 * (ML17 - 1)) = 0)
Basic((ML18 * (ML18 - 1)) = 0)
Basic(((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
and from there we reverse-engineer that local.is_add == ML17, local.is_sub == ML18, and is_real == ML17 + ML18. The other constraints can be traced back to their source-code origin relatively easily as well, noting that instruction interactions correspond to calling builder.send_instruction or builder.receive_instruction.
Lastly, it is important to note that all SP1 chips come with two special Boolean flags: is_real, which indicates that the current row is enabled; and op_a_not_0, which indicates that the current row is not a padding row. For this engagement, we are only interested in cases in which is_real = 1 and op_a_not_0 = 1, as these are the cases in which all constraints are emitted. Therefore, for the AddSub chip, we require that ML17 + ML18 = 1 and ML16 = 1 (understanding from this line and the extracted instruction constraints that op_a_not_0 corresponds to ML16).
With the extracted constraints in place, the next step is to import them into Lean and automatically generate as much scaffolding as possible to ease the conformance proofs to follow. The Lean infrastructure that we created as part of this proof-of-concept engagement can be found here, and is structured as follows:
- Sp1Poc/Wheels.lean contains auxiliary functions on Lean standard types;
- Sp1Poc/Basic.lean contains basic definitions, functions, and facts about the BabyBear field and conformance-theorem templating, noting that the fact that 2013265921 is prime is admitted if run on a macOS machine;
- Sp1Poc/Specs.lean contains manually written specifications of chip behaviours—currently, it contains only the specification of wrap-around 32-bit addition, required for the AddSub chip, as presented in the next section;
- Sp1Poc/Templater.lean generates the conformance theorem templates, described shortly;
- Sp1Poc/Cli.lean defines the command-line arguments for template generation;
- Sp1Poc.lean brings the previous modules together to define the main Sp1Poc module;
- Main.lean defines the main user-facing function; and
- Examples/AddSub.lean contains the proof that the constraints extracted from the AddSub chip conform to the specification of wrap-around 32-bit addition, described in detail in the next section.
Assuming that an installation of Lean is present on the system, the infrastructure can be built by running lake build in the root folder, and can be used to generate a conformance theorem template by running:
lake exe genTemplate path-to-constraints-file template-folder-name [-f]
where:
- path-to-constraints-file points to a file that contains SP1 extracted constraints in the format described in the previous section;
- template-folder-name is the name of the sub-folder within the Generated folder where the generated template will be stored; and
- -f is an optional flag denoting that generated files should be overwritten if they already exist.
For example, assuming the AddSub.constraints file is placed into the root folder of the infrastructure repository and that it contains the constraints extracted from the AddSub chip as shown in the previous section, the corresponding conformance template can be generated by running lake exe genTemplate AddSub.constraints AddSub and will be stored in Generated/AddSub/main.lean.
The conformance templates generated by our verification infrastructure have the following form:
import Sp1Poc.Specs
namespace Sp1
def spec (V0 ... VN : BabyBear) : Prop := True -- Placeholder specification
set_option maxHeartbeats 5000000 in -- Placeholder computational limit
theorem conformance_theorem -- Placeholder conformance theorem
{V0 ... VN : BabyBear}
[Generated hypotheses] : spec V0 ... VN := PROOF -- Placeholder proof
where:
- spec denotes the specification to which the constraints should conform, and should be defined by the user, as illustrated in the next section for the AddSub chip. The specification is parametric on all of the variables found in the constraints, denoted by V0 ... VN, and is pre-filled with the placeholder True.
- conformance_theorem denotes the actual conformance theorem, which is parametric on V0 ... VN just like the specification; its hypotheses (denoted by [Generated hypotheses] and described shortly) are generated from the provided extracted constraints, and its conclusion is the specification. The proof is admitted using the placeholder PROOF, which denotes the Lean sorry.
What remains to be discussed is how the extracted constraints are imported into Lean as hypotheses for the conformance proofs. As per the previous section, there are three types of constraints that can be extracted from SP1 zk chips: basic, lookup, and permutation constraints. Basic constraints are transferred to Lean directly as conformance theorem hypotheses. For example, the basic constraints of the AddSub chip yield the following hypotheses:
(C11 : (ML17 * (ML17 - 1)) = 0)
(C12 : (ML18 * (ML18 - 1)) = 0)
(C13 : ((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
(C14 : (ML16 * (((ML8 + ML12) - ML1) * (((ML8 + ML12) - ML1) - 256))) = 0)
(C15 : (ML16 * ((((ML9 + ML13) - ML2) + ML5) * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
(C16 : (ML16 * ((((ML10 + ML14) - ML3) + ML6) * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
(C17 : (ML16 * ((((ML11 + ML15) - ML4) + ML7) * ((((ML11 + ML15) - ML4) + ML7) - 256))) = 0)
(C18 : (ML16 * (ML5 * (((ML8 + ML12) - ML1) - 256))) = 0)
(C19 : (ML16 * (ML6 * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
(C20 : (ML16 * (ML7 * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
(C21 : (ML16 * ((ML5 - 1) * ((ML8 + ML12) - ML1))) = 0)
(C22 : (ML16 * ((ML6 - 1) * (((ML9 + ML13) - ML2) + ML5))) = 0)
(C23 : (ML16 * ((ML7 - 1) * (((ML10 + ML14) - ML3) + ML6))) = 0)
(C24 : (ML16 * (ML5 * (ML5 - 1))) = 0)
(C25 : (ML16 * (ML6 * (ML6 - 1))) = 0)
(C26 : (ML16 * (ML7 * (ML7 - 1))) = 0)
(C27 : (ML16 * (ML16 * (ML16 - 1))) = 0)
(C28 : (ML16 * ((ML17 + ML18) - 1)) = 0)
and one can see that there is a clear correspondence between the two.
Permutation constraints, on the other hand, do not contain any useful information for the proofs at hand, and are simply interpreted as True, yielding the following three hypotheses:
(C08 : True)
(C09 : True)
(C10 : True)
Finally, the most interesting and the most complex part of the hypothesis-generation process is the treatment of lookup constraints, as this is when the implicit information carried by these constraints needs to be made explicit. In particular, recall that lookup constraints are of the form
Lookup(M, [IK, C1, ..., CN])
where N ≥ 1, and:
- M is the lookup multiplicity;
- IK is the lookup interaction kind; and
- C1, ..., CN are the interaction parameters.
If the multiplicity equals 0, then the lookup does not happen and the emitted hypothesis is simply True. When it comes to interaction kinds, we partially cover instruction- and byte-related interactions, in that:
- for the Instruction interaction kind, which receives 23 parameters, if the multiplicity is non-zero then the emitted hypothesis tells us that all of the input (C11 - C18) and output (C7 - C10) values are bytes (that is, that they are smaller than 256); and
- for the Byte interaction kind, which receives 5 parameters, we support only multiplicities 0 and 1, and the following operations (determined by C1):
  - U8Range (operation 4), the meaning of which is that C4 < 256 and C5 < 256;
  - MSB (operation 7), the meaning of which is that C2 is the most significant bit of C4; and
  - U16Range (operation 8), the meaning of which is that C2 < 65536.
Importantly, these hypotheses cannot in general be generated together with the constraint extraction process, as multiplicity and all lookup parameters except the first can be symbolic.
To illustrate how the hypothesis generation works in practice, recall the following lookup constraint, extracted from the AddSub chip:
-- U8Range check for ML1 and ML2
Lookup(ML16, [5, 4, 0, 0, ML1, ML2])
Its corresponding generated hypothesis, annotated for ease of understanding, is:
-- U8Range check for ML1 and ML2
(C00 : if ML16 = 0 then True else -- Multiplicity 0
if ML16 = 1 -- Multiplicity 1
then (match 4 with -- Byte operation match (4 for this lookup)
| 4 => 0 = 0 ∧ ML1 < 256 ∧ ML2 < 256 -- U8Range
| 7 => ML1 < 256 ∧ -- MSB
(ML1 < 128 → 0 = 0) ∧
(128 ≤ ML1 → 0 = 1)
| 8 => 0 < 65536 -- U16Range
| _ => undefined) ∧ 0 = 0
else undefined)
This is an instance of a more general template:
-- Lookup(M; [5, C1, C2, C3, C4, C5])
if M = 0 then True else -- Multiplicity 0
if M = 1 -- Multiplicity 1
then (match C1 with -- General byte operation match
| 4 => C2 = 0 ∧ C4 < 256 ∧ C5 < 256 -- U8Range
| 7 => C4 < 256 ∧ -- MSB
(C4 < 128 → C2 = 0) ∧
(128 ≤ C4 → C2 = 1)
| 8 => C2 < 65536 -- U16Range
| _ => undefined) ∧ C3 = 0
else undefined
which is implemented in Sp1Poc/Templater.lean. This template can be matched against the above description of the supported multiplicities and operations, and is easily extensible.
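As a sanity check of this template's intended semantics, here is a direct Python model (illustrative only; the real template is the Lean code above) of the supported byte operations:

```python
def byte_lookup_holds(m, c1, c2, c3, c4, c5):
    """Concrete semantics of the supported Byte-lookup hypotheses, mirroring
    the template above: m is the multiplicity, c1 the byte operation, and
    c2-c5 the remaining parameters. A model, not the actual implementation."""
    if m == 0:           # multiplicity 0: the lookup is not performed
        return True
    if m != 1:           # only multiplicities 0 and 1 are supported
        raise ValueError("unsupported multiplicity")
    if c1 == 4:          # U8Range: c4 and c5 fit into one byte
        body = c2 == 0 and c4 < 256 and c5 < 256
    elif c1 == 7:        # MSB: c2 is the most significant bit of c4
        body = c4 < 256 and ((c2 == 0) if c4 < 128 else (c2 == 1))
    elif c1 == 8:        # U16Range: c2 fits into two bytes
        body = c2 < 65536
    else:
        raise ValueError("unsupported byte operation")
    return body and c3 == 0

# The U8Range lookup from the AddSub chip, with concrete in-range bytes:
assert byte_lookup_holds(1, 4, 0, 0, 200, 3)
# An MSB lookup: 200 >= 128, so its most significant bit must be 1:
assert byte_lookup_holds(1, 7, 1, 0, 200, 0)
```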
We note that, in practice, in the conformance proofs, the matches observed in the above byte-related hypotheses will either be simplified immediately, if the operation C1 is concrete, or be branched on otherwise.
AddSub: Conformance Proof
Let us now take a look at how to prove that the AddSub extracted constraints actually capture their intended meaning, which is wrap-around 32-bit addition. To do so, we first have to formulate in Lean what wrap-around 32-bit addition actually means.
In particular, given three 32-bit integers, A, B, and C, with byte representations (from lowest to highest byte), respectively, [A1, A2, A3, A4], [B1, B2, B3, B4], and [C1, C2, C3, C4], we can specify in Lean that A is the result of 32-bit wrap-around addition of B and C as follows:
def spec_32_bit_wrap_add (A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 : BabyBear) : Prop :=
( A1.val < 256 ) ∧ ( A2.val < 256 ) ∧ ( A3.val < 256 ) ∧ ( A4.val < 256 ) ∧
( B1.val < 256 ) ∧ ( B2.val < 256 ) ∧ ( B3.val < 256 ) ∧ ( B4.val < 256 ) ∧
( C1.val < 256 ) ∧ ( C2.val < 256 ) ∧ ( C3.val < 256 ) ∧ ( C4.val < 256 ) ∧
( ( B1.val + 256 * B2.val + 65536 * B3.val + 16777216 * B4.val ) +
( C1.val + 256 * C2.val + 65536 * C3.val + 16777216 * C4.val ) ) % 4294967296 =
( A1.val + 256 * A2.val + 65536 * A3.val + 16777216 * A4.val )
This specification is stored in the Specs.lean file, which can be used as a library of specifications in the future.
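As a sanity check (not part of the repository), the same specification can be mirrored in Python and exercised on concrete values:

```python
def spec_32_bit_wrap_add(a, b, c):
    """Python mirror of the Lean spec above: a, b, c are little-endian 4-byte
    lists; checks the byte-range conditions and wrap-around addition."""
    def weigh(xs):  # combine bytes into an integer, lowest byte first
        return sum(x * 256**i for i, x in enumerate(xs))
    in_range = all(0 <= x < 256 for x in a + b + c)
    return in_range and (weigh(b) + weigh(c)) % 2**32 == weigh(a)

# 0xFFFFFFFF + 1 wraps to 0:
assert spec_32_bit_wrap_add([0,0,0,0], [255,255,255,255], [1,0,0,0])
# ...and certainly does not equal 1:
assert not spec_32_bit_wrap_add([1,0,0,0], [255,255,255,255], [1,0,0,0])
```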
Next, we match the AddSub extracted constraints to this specification, understanding that A = [ML1, ML2, ML3, ML4], B = [ML8, ML9, ML10, ML11], and C = [ML12, ML13, ML14, ML15]. Recalling that for our proof-of-concept we are only interested in the case when ML16 = 1, we now state the specification to prove as:
def spec_ADD (ML0 ML1 ML2 ML3 ML4 ML5 ML6 ML7 ML8 ML9 ML10 ML11 ML12 ML13 ML14 ML15 ML16 ML17 ML18 : BabyBear ) : Prop :=
ML16 = 1 ->
spec_32_bit_wrap_add ML1 ML2 ML3 ML4 ML8 ML9 ML10 ML11 ML12 ML13 ML14 ML15
Finally, what remains to be done is to prove that this specification is implied by the AddSub constraints. This proof, which is available in full here, is captured by the conformance_ADD theorem below, eliding most of the constraints and providing instructive annotation:
theorem conformance_ADD { ML0 ... ML18 : BabyBear }
(C00: ...)
...
(C24 : (ML16 * (ML5 * (ML5 - 1))) = 0)
(C25 : (ML16 * (ML6 * (ML6 - 1))) = 0)
(C26 : (ML16 * (ML7 * (ML7 - 1))) = 0)
...
(C28: ...) : spec_ADD ML0 ... ML18 := by
-- We unfold the specification definitions and introduce the hypotheses.
unfold spec_ADD; intro HML16; unfold spec_32_bit_wrap_add
-- We substitute the value of ML16, and simplify constraints of
-- the form X * (X - 1) = 0 into X = 0 \/ X = 1.
subst_eqs; simp [sub_eq_zero (b := (1 : BabyBear))] at *
-- We perform a case analysis on the three carries (ML5/ML6/ML7).
-- This splits the single goal into eight.
rcases C24 <;> rcases C25 <;> rcases C26 <;>
-- We substitute learned equalities on ML5, ML6, and ML7, and simplify the
-- Fin-related definitions to enforce that all of the arithmetic is in Nat.
subst_eqs <;> simp [BabyBearPrime, Fin.add_def, Fin.sub_def] at * <;>
-- We remove Fin.val for 256 to enable reasoning with omega.
simp only [Fin.lt_iff_val_lt_val] at * <;>
rw [fin_val_simp (show 256 < BabyBearPrime by linarith)] at * <;>
-- Now, the Lean omega tactic is able to discharge all of the goals.
omega
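For intuition, the key simplification step in this proof, turning a constraint of the form X * (X - 1) = 0 into X = 0 ∨ X = 1, relies on BabyBear being a field and hence having no zero divisors. A generic version of this step can be stated in Lean with mathlib (a sketch; lemma names as in current mathlib):

```lean
import Mathlib

-- Sketch: in any field (such as BabyBear), a booleanity constraint
-- x * (x - 1) = 0 forces x to be 0 or 1, since fields have no zero divisors.
example {F : Type} [Field F] {x : F} (h : x * (x - 1) = 0) : x = 0 ∨ x = 1 :=
  (mul_eq_zero.mp h).imp id sub_eq_zero.mp
```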
While this work shows great promise, it is important to keep in mind the broader picture. In particular, the results that we have proven are meaningful under the assumptions that:
- the SP1 lookup and permutation arguments are sound;
- the contents of the U8Range lookup table are correct; and
- the underlying proof system is sound.
When it comes to further work, there are numerous avenues for one to explore. One immediate step would be to verify conformance of more complex chips, starting from the ALU chips and moving on to the control-flow, memory-management, and CPU-related chips. For this, the existing infrastructure would need to be upgraded to take further advantage of the modularity of lookup arguments, but this would not be a difficult task. For the precompiles, however, which output thousands of constraints (for example, the EdDecompress chip emits ~1.5K constraints), the modularity of SP1 itself would have to be improved to abstract frequently used computations that are currently inlined (for example, the various related field operations encoded through the FieldOpCols mechanism).
Beyond that, we could consider verifying correctness of SP1 lookup and permutation arguments, as well as that of the underlying proof system, but these would require both improvements to the overall Rust verification capabilities of the ecosystem and the completion of our EF grant for creating a blueprint for proving batched soundness of FRI in Lean.
Taking a step back, however, there is an important question intrinsically tied to formal verification in general and interactive theorem proving in particular, and this is how to achieve proof maintainability when the system being verified (in this case, SP1) may, or is even likely to, evolve in infrastructure-breaking ways. For example, if SP1 were to switch to a different proof system, then the structure of the constraints and argument representation would also be different, and the infrastructure, and especially the proofs, would require a substantial overhaul.
Fortunately, it would appear that the SP1 development by design has sufficient flexibility to output meaningful information from a number of points in the constraint generation process, and we hope to be able to work with Succinct to understand the correct level of abstraction for achieving modular, scalable, and maintainable verification of SP1.