We are excited to announce our first engagement with Succinct Labs, in which we demonstrated the feasibility of formally verifying the correctness of SP1 zk chips in the Lean proof assistant. This constitutes an important initial step on the pathway towards establishing broader formal guarantees of the security of the SP1 zk infrastructure.
In particular, we have built a verification infrastructure that allows one to extract Plonky3 constraints from SP1 chips into Lean, formulate Lean specifications of chip behaviour, and prove that the extracted constraints conform to these specifications. We have then used this infrastructure to establish correctness of the SP1 AddSub
chip, proving that the associated constraints satisfy the specification of wrap-around 32-bit addition. Given the modular architecture of SP1, we believe that this approach has great potential for scalability to the most complex SP1 circuits.
The developed infrastructure is publicly available; in particular, the conformance proof for the AddSub chip can be found here.
In the rest of the blog, we discuss these three aspects in more detail, using the AddSub chip as a running example, and outline a number of plans for potential further work. For those without too much time on their hands, the most important technical takeaways/highlights are that:
We start by enabling the extraction of constraints from SP1 zk chips into human- and Lean-readable format. In particular, SP1 already allowed one to obtain a CUDA program that evaluates the constraint polynomial associated with a given chip. We extended this work with a mechanism for evaluating said CUDA programs and extracting Lean-readable constraints from the outputs of this evaluation. This mechanism can be found as part of this repository and is localised to the air/src/constraint_extraction folder.
Assuming an installation of Rust is present on the system, after cloning the repository, the constraints for all chips except KeccakPermute and Global can be extracted by running:
RUST_MIN_STACK=104857600 cargo test -- --nocapture
The extracted constraints will then be placed into the air/chip_constraints folder, with one file generated per chip. Currently, the constraints for the KeccakPermute chip take approximately 20 minutes to extract, whereas constraint extraction for the Global chip fails; understanding this failure was assessed to be beyond the scope of this engagement.
AddSub: Extraction by Example
We present how our constraint extraction mechanism works by examining the following constraints, extracted from the AddSub chip:
-- Basic constraints
Basic((ML17 * (ML17 - 1)) = 0)
Basic((ML18 * (ML18 - 1)) = 0)
Basic(((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
Basic((ML16 * (((ML8 + ML12) - ML1) * (((ML8 + ML12) - ML1) - 256))) = 0)
Basic((ML16 * ((((ML9 + ML13) - ML2) + ML5) * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
Basic((ML16 * ((((ML10 + ML14) - ML3) + ML6) * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
Basic((ML16 * ((((ML11 + ML15) - ML4) + ML7) * ((((ML11 + ML15) - ML4) + ML7) - 256))) = 0)
Basic((ML16 * (ML5 * (((ML8 + ML12) - ML1) - 256))) = 0)
Basic((ML16 * (ML6 * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
Basic((ML16 * (ML7 * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
Basic((ML16 * ((ML5 - 1) * ((ML8 + ML12) - ML1))) = 0)
Basic((ML16 * ((ML6 - 1) * (((ML9 + ML13) - ML2) + ML5))) = 0)
Basic((ML16 * ((ML7 - 1) * (((ML10 + ML14) - ML3) + ML6))) = 0)
Basic((ML16 * (ML5 * (ML5 - 1))) = 0)
Basic((ML16 * (ML6 * (ML6 - 1))) = 0)
Basic((ML16 * (ML7 * (ML7 - 1))) = 0)
Basic((ML16 * (ML16 * (ML16 - 1))) = 0)
Basic((ML16 * ((ML17 + ML18) - 1)) = 0)
-- Lookup-argument constraints
Lookup(ML16, [5, 4, 0, 0, ML8, ML9])
Lookup(ML16, [5, 4, 0, 0, ML10, ML11])
Lookup(ML16, [5, 4, 0, 0, ML12, ML13])
Lookup(ML16, [5, 4, 0, 0, ML14, ML15])
Lookup(ML16, [5, 4, 0, 0, ML1, ML2])
Lookup(ML16, [5, 4, 0, 0, ML3, ML4])
Lookup((ML17 * 2013265920), [3, 0, 0, ML0, (4 + ML0), 0, ML18, ML1, ML2, ML3, ML4, ML8, ML9, ML10, ML11, ML12, ML13, ML14, ML15, (1 + (ML16 * 2013265920)), 0, 0, 0, 0])
Lookup((ML18 * 2013265920), [3, 0, 0, ML0, (4 + ML0), 0, ML18, ML8, ML9, ML10, ML11, ML1, ML2, ML3, ML4, ML12, ML13, ML14, ML15, (1 + (ML16 * 2013265920)), 0, 0, 0, 0])
-- Permutation-argument constraints
Permutation(FirstRow(4))
Permutation(TransitionRow(4))
Permutation(LastRow(4))
Looking at this output, we first observe that there are three types of constraints that can be extracted—basic, lookup, and permutation—and we proceed to discuss them in more detail.
In general, constraints describe relationships between various types of symbolic variables, which are defined in SP1 here and here. These symbolic variables represent values in the BabyBear field, a prime field of order 2013265921 that is commonly used in zk proof systems. They are the building blocks for constructing SP1 constraints, allowing the circuit developer to specify, for example, values in the current and next rows of both preprocessed and main trace columns, boundary conditions, public inputs/outputs, and auxiliary variables used in lookup and permutation arguments.
For the purposes of this blog, the symbolic variables relevant for understanding the reasoning are the MainLocal variables, denoted by ML<X>, which the reader can think of as the X-th column of the current row of the main execution trace. In addition, in the lookup and permutation constraint extraction, discussed below, we encounter PermutationChallenge (PC<X>), PermutationLocal (PL<X>), PermutationNext (PN<X>), and CumulativeSum (CS<X>) symbolic variables, but as their meaning relates only to the proof system, the reader can treat them purely syntactically.
We extract the constraints for a chip by taking its corresponding CUDA program and evaluating it. CUDA instructions, all of which are listed here, perform basic operations, such as addition, subtraction, and multiplication, on constants and symbolic variables via a simple register machine, with a special instruction of the form R<X> == 0 indicating that a new constraint should be extracted. For example, the following sequence of CUDA instructions
R2 := ML17
R3 := 1
R4 := R2 - R3 -- R4 := ML17 - 1
R3 := R2 * R4 -- R3 := ML17 * (ML17 - 1)
R3 == 0 -- Constraint: ML17 * (ML17 - 1) = 0
R2 := ML18
R3 := 1
R4 := R2 - R3 -- R4 := ML18 - 1
R3 := R2 * R4 -- R3 := ML18 * (ML18 - 1)
R3 == 0 -- Constraint: ML18 * (ML18 - 1) = 0
results in the extraction of the first two constraints of the AddSub chip: ML17 * (ML17 - 1) = 0 and ML18 * (ML18 - 1) = 0.
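To make this extraction step concrete, the following minimal sketch (in Python, purely illustrative; the actual extraction tooling lives in the Rust repository mentioned above) evaluates this register-machine format symbolically and emits constraint strings:

```python
def eval_program(instructions):
    """Symbolically evaluate a toy register-machine program of the kind shown
    above. Registers hold symbolic expression strings; an `R<X> == 0` line
    emits a constraint. Illustrative only, not the actual SP1 extraction code."""
    regs, constraints = {}, []
    def val(tok):  # a token is a register, a symbolic variable, or a constant
        return regs.get(tok, tok)
    for ins in instructions:
        lhs, op, *rhs = ins.split()
        if op == "==":                  # e.g. "R3 == 0": emit the constraint
            constraints.append(f"{val(lhs)} = 0")
        elif len(rhs) == 1:             # e.g. "R2 := ML17": copy a value
            regs[lhs] = val(rhs[0])
        else:                           # e.g. "R4 := R2 - R3": binary operation
            a, binop, b = rhs
            regs[lhs] = f"({val(a)} {binop} {val(b)})"
    return constraints

program = [
    "R2 := ML17", "R3 := 1", "R4 := R2 - R3", "R3 := R2 * R4", "R3 == 0",
    "R2 := ML18", "R3 := 1", "R4 := R2 - R3", "R3 := R2 * R4", "R3 == 0",
]
print(eval_program(program))
# → ['(ML17 * (ML17 - 1)) = 0', '(ML18 * (ML18 - 1)) = 0']
```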
Basic constraints are, in essence, any extracted constraints that do not feature any of the symbolic variables related to the proof system mentioned above, and they do not need to be processed further.
Next come the constraints arising from the lookup arguments, which are substantially more interesting to process than the basic ones. In particular, guided by the SP1 Technical Whitepaper and the SP1 implementation, we have abstracted SP1 lookups into the following processed constraint form:
Lookup(M, [C0, C1, ..., CN])
where N ≥ 1, and:
- M is the lookup multiplicity. For the purposes of our proof-of-concept, we consider that multiplicity can only equal 0, 1, or -1: lookups with multiplicity 0 are not performed; lookups with multiplicity 1 retrieve information from other chips; and lookups with multiplicity -1 are sent in response. In a correct execution trace, the non-zero-multiplicity lookups must come in pairs, matching in number and parameters but having opposing multiplicities.
- C0 is the lookup interaction kind, which describes the purpose of the lookup—for example, whether it represents an instruction execution (3, Instruction), an operation on bytes (5, Byte), or a call to an external function (8, Syscall), etc.
- C1, ..., CN are the interaction parameters, which capture, in essence, the inputs and the outputs of the lookup.
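The pairing requirement on non-zero-multiplicity lookups can be sketched as a simple balance check (Python, a hypothetical modelling aid rather than anything in the SP1 codebase): summing multiplicities per parameter tuple across a trace must give zero everywhere.

```python
from collections import Counter

def lookups_balanced(lookups):
    """Model of the pairing invariant described above: each lookup is a
    (multiplicity, parameters) pair, and across a correct trace every send
    (multiplicity 1) must be cancelled by a matching receive (multiplicity -1)."""
    total = Counter()
    for mult, params in lookups:
        total[tuple(params)] += mult
    return all(v == 0 for v in total.values())

# A matched send/receive pair balances out:
assert lookups_balanced([(1, [5, 4, 0, 0, 10, 20]), (-1, [5, 4, 0, 0, 10, 20])])
# An unmatched send does not:
assert not lookups_balanced([(1, [5, 4, 0, 0, 10, 20])])
```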
For example, as we will understand in the next section, the above Byte lookup Lookup(ML16, [5, 4, 0, 0, ML1, ML2]) tells us that if ML16 equals 1 (the meaning of which is discussed shortly), then ML1 and ML2 are both less than 256 (that is, they each fit into one byte).
Finally, when it comes to the permutation arguments, we have observationally identified that SP1 chips produce precisely three related constraints: one for the first row of the trace; one for the transition rows; and one for the last row. These constraints are of the form:
where IsFirstRow, IsTransition, and IsLastRow denote symbolic variables dedicated to stating boundary conditions.
These constraints are identified by the process_permutation_constraints function, under the assumption that they follow the exact syntactic form given in the above equations. As the only variable in these constraints is n, we process them into the format
Permutation(FirstRow(n))
Permutation(TransitionRow(n))
Permutation(LastRow(n))
thereby abstracting their shape while allowing for its recovery if needed further along the verification pipeline.
When writing specifications of chip behaviours and when doing the associated proofs, it is helpful to be able to relate the constraints and the associated variables back to the Rust code that generates them. Luckily, this is fairly straightforward for the basic chips. For example, for the AddSub chip, the constraint-generating code can be found here (for the entire chip) and here (for the addition operation specifically).
To illustrate this correspondence, let us examine the two in parallel, noting that constraints are emitted every time a function on the builder is called. The first batch that we run into is:
let is_real = local.is_add + local.is_sub;
builder.assert_bool(local.is_add);
builder.assert_bool(local.is_sub);
builder.assert_bool(is_real.clone());
which states that the values local.is_add, local.is_sub, and is_real all equal either 0 or 1. The only extracted constraints that match this pattern are the first three:
Basic((ML17 * (ML17 - 1)) = 0)
Basic((ML18 * (ML18 - 1)) = 0)
Basic(((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
and from there we reverse-engineer that local.is_add == ML17, local.is_sub == ML18, and is_real == ML17 + ML18. The other constraints can be traced back to their source-code origin relatively easily as well, noting that instruction interactions correspond to calling builder.send_instruction or builder.receive_instruction.
Lastly, it is important to note that all SP1 chips come with two special Boolean flags: is_real, which indicates that the current row is enabled; and op_a_not_0, which indicates that the current row is not a padding row. For this engagement, we are only interested in cases in which is_real = 1 and op_a_not_0 = 1, as these are the cases in which all constraints are emitted. Therefore, for the AddSub chip, we require that ML17 + ML18 = 1 and ML16 = 1 (understanding from this line and the extracted instruction constraints that op_a_not_0 corresponds to ML16).
With the extracted constraints in place, the next step is to import them into Lean and automatically generate as much scaffolding as possible to ease the conformance proofs to follow. The Lean infrastructure that we created as part of this proof-of-concept engagement can be found here, and is structured as follows:
- Sp1Poc/Wheels.lean contains auxiliary functions on Lean standard types;
- Sp1Poc/Basic.lean contains basic definitions, functions, and facts about the BabyBear field and conformance-theorem templating, noting that the fact that 2013265921 is prime is admitted if run on a macOS machine;
- Sp1Poc/Specs.lean contains manually written specifications of chip behaviours—currently, it contains only the specification of wrap-around 32-bit addition, required for the AddSub chip, as presented in the next section;
- Sp1Poc/Templater.lean generates the conformance theorem templates, described shortly;
- Sp1Poc/Cli.lean defines the command-line arguments for template generation;
- Sp1Poc.lean brings the previous modules together to define the main Sp1Poc module;
- Main.lean defines the main user-facing function; and
- Examples/AddSub.lean contains the proof that the constraints extracted from the AddSub chip conform to the specification of wrap-around 32-bit addition, described in detail in the next section.
Assuming that an installation of Lean is present on the system, the infrastructure can be built by running lake build in the root folder, and can be used to generate a conformance theorem template by running:
lake exe genTemplate path-to-constraints-file template-folder-name [-f]
where:
- path-to-constraints-file points to a file that contains SP1 extracted constraints in the format described in the previous section;
- template-folder-name is the name of the sub-folder within the Generated folder where the generated template will be stored; and
- -f is an optional flag denoting that generated files should be overwritten if they already exist.
For example, assuming the AddSub.constraints file is placed into the root folder of the infrastructure repository and that it contains the constraints extracted from the AddSub chip as shown in the previous section, the corresponding conformance template can be generated by running lake exe genTemplate AddSub.constraints AddSub and will be stored in Generated/AddSub/main.lean.
The conformance templates generated by our verification infrastructure have the following form:
import Sp1Poc.Specs
namespace Sp1
def spec (V0 ... VN : BabyBear) : Prop := True -- Placeholder specification
set_option maxHeartbeats 5000000 in -- Placeholder computational limit
theorem conformance_theorem -- Placeholder conformance theorem
{V0 ... VN : BabyBear}
[Generated hypotheses] : spec V0 ... VN := PROOF -- Placeholder proof
where:
- spec denotes the specification to which the constraints should conform, and should be defined by the user, as illustrated in the next section for the AddSub chip. The specification is parametric on all of the variables found in the constraints, denoted by V0 ... VN, and is pre-filled with the placeholder True.
- conformance_theorem denotes the actual conformance theorem, which is parametric on V0 ... VN just like the specification; its hypotheses (denoted by [Generated hypotheses] and described shortly) are generated from the provided extracted constraints, and its conclusion is the specification. The proof is admitted using the placeholder PROOF, which denotes the Lean sorry.
What remains to be discussed is how the extracted constraints are imported into Lean as hypotheses for the conformance proofs. As per the previous section, there are three types of constraints that can be extracted from SP1 zk chips: basic, lookup, and permutation constraints. Basic constraints are transferred to Lean directly as conformance theorem hypotheses. For example, the basic constraints of the AddSub chip yield the following hypotheses:
(C11 : (ML17 * (ML17 - 1)) = 0)
(C12 : (ML18 * (ML18 - 1)) = 0)
(C13 : ((ML17 + ML18) * ((ML17 + ML18) - 1)) = 0)
(C14 : (ML16 * (((ML8 + ML12) - ML1) * (((ML8 + ML12) - ML1) - 256))) = 0)
(C15 : (ML16 * ((((ML9 + ML13) - ML2) + ML5) * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
(C16 : (ML16 * ((((ML10 + ML14) - ML3) + ML6) * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
(C17 : (ML16 * ((((ML11 + ML15) - ML4) + ML7) * ((((ML11 + ML15) - ML4) + ML7) - 256))) = 0)
(C18 : (ML16 * (ML5 * (((ML8 + ML12) - ML1) - 256))) = 0)
(C19 : (ML16 * (ML6 * ((((ML9 + ML13) - ML2) + ML5) - 256))) = 0)
(C20 : (ML16 * (ML7 * ((((ML10 + ML14) - ML3) + ML6) - 256))) = 0)
(C21 : (ML16 * ((ML5 - 1) * ((ML8 + ML12) - ML1))) = 0)
(C22 : (ML16 * ((ML6 - 1) * (((ML9 + ML13) - ML2) + ML5))) = 0)
(C23 : (ML16 * ((ML7 - 1) * (((ML10 + ML14) - ML3) + ML6))) = 0)
(C24 : (ML16 * (ML5 * (ML5 - 1))) = 0)
(C25 : (ML16 * (ML6 * (ML6 - 1))) = 0)
(C26 : (ML16 * (ML7 * (ML7 - 1))) = 0)
(C27 : (ML16 * (ML16 * (ML16 - 1))) = 0)
(C28 : (ML16 * ((ML17 + ML18) - 1)) = 0)
and one can see that there is a clear correspondence between the two.
Permutation constraints, on the other hand, do not contain any useful information for the proofs at hand, and are simply interpreted as True, yielding the following three hypotheses:
(C08 : True)
(C09 : True)
(C10 : True)
Finally, the most interesting and the most complex part of the hypothesis-generation process is the treatment of lookup constraints, as this is when the implicit information carried by these constraints needs to be made explicit. In particular, recall that lookup constraints are of the form
Lookup(M, [IK, C1, ..., CN])
where N ≥ 1, and:
- M is the lookup multiplicity;
- IK is the lookup interaction kind; and
- C1, ..., CN are the interaction parameters.
If the multiplicity equals 0, then the lookup does not happen and the emitted hypothesis is simply True. When it comes to interaction kinds, we partially cover instruction- and byte-related interactions, in that:
- for the Instruction interaction kind, which receives 23 parameters, if the multiplicity is non-zero then the emitted hypothesis tells us that all of the input (C11 - C18) and output (C7 - C10) values are bytes (that is, that they are smaller than 256); and
- for the Byte interaction kind, which receives 5 parameters, we support only multiplicities 0 and 1, and the following operations (determined by C1):
  - U8Range (operation 4), the meaning of which is that C4 < 256 and C5 < 256;
  - MSB (operation 7), the meaning of which is that C2 is the most significant bit of C4; and
  - U16Range (operation 8), the meaning of which is that C2 < 65536.
Importantly, these hypotheses cannot in general be generated together with the constraint extraction process, as multiplicity and all lookup parameters except the first can be symbolic.
To illustrate how the hypothesis generation works in practice, recall the following lookup constraint, extracted from the AddSub chip:
-- U8Range check for ML1 and ML2
Lookup(ML16, [5, 4, 0, 0, ML1, ML2])
Its corresponding generated hypothesis, annotated for ease of understanding, is:
-- U8Range check for ML1 and ML2
(C00 : if ML16 = 0 then True else -- Multiplicity 0
if ML16 = 1 -- Multiplicity 1
then (match 4 with -- Byte operation match (4 for this lookup)
| 4 => 0 = 0 ∧ ML1 < 256 ∧ ML2 < 256 -- U8Range
| 7 => ML1 < 256 ∧ -- MSB
(ML1 < 128 → 0 = 0) ∧
(128 ≤ ML1 → 0 = 1)
| 8 => 0 < 65536 -- U16Range
| _ => undefined) ∧ 0 = 0
else undefined)
This is an instance of a more general template:
-- Lookup(M; [5, C1, C2, C3, C4, C5])
if M = 0 then True else -- Multiplicity 0
if M = 1 -- Multiplicity 1
then (match C1 with -- General byte operation match
| 4 => C2 = 0 ∧ C4 < 256 ∧ C5 < 256 -- U8Range
| 7 => C4 < 256 ∧ -- MSB
(C4 < 128 → C2 = 0) ∧
(128 ≤ C4 → C2 = 1)
| 8 => C2 < 65536 -- U16Range
| _ => undefined) ∧ C3 = 0
else undefined
which is implemented in Sp1Poc/Templater.lean. This template can be matched against the above description of the supported multiplicities and operations, and is easily extensible.
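As a sanity check of this template's intended semantics, here is a direct Python model (illustrative only; the real template is the Lean code above) of the supported byte operations:

```python
def byte_lookup_holds(m, c1, c2, c3, c4, c5):
    """Concrete semantics of the supported Byte-lookup hypotheses, mirroring
    the template above: m is the multiplicity, c1 the byte operation, and
    c2-c5 the remaining parameters. A model, not the actual implementation."""
    if m == 0:           # multiplicity 0: the lookup is not performed
        return True
    if m != 1:           # only multiplicities 0 and 1 are supported
        raise ValueError("unsupported multiplicity")
    if c1 == 4:          # U8Range: c4 and c5 fit into one byte
        body = c2 == 0 and c4 < 256 and c5 < 256
    elif c1 == 7:        # MSB: c2 is the most significant bit of c4
        body = c4 < 256 and ((c2 == 0) if c4 < 128 else (c2 == 1))
    elif c1 == 8:        # U16Range: c2 fits into two bytes
        body = c2 < 65536
    else:
        raise ValueError("unsupported byte operation")
    return body and c3 == 0

# The U8Range lookup from the AddSub chip, with concrete in-range bytes:
assert byte_lookup_holds(1, 4, 0, 0, 200, 3)
# An MSB lookup: 200 >= 128, so its most significant bit must be 1:
assert byte_lookup_holds(1, 7, 1, 0, 200, 0)
```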
We note that, in practice, in the conformance proofs, the matches observed in the above byte-related hypotheses will either be simplified immediately, if the operation C1 is concrete, or be branched on otherwise.
AddSub: Conformance Proof
Let us now take a look at how to prove that the AddSub extracted constraints actually capture their intended meaning, which is wrap-around 32-bit addition. To do so, we first have to formulate in Lean what wrap-around 32-bit addition actually means.
In particular, given three 32-bit integers, A, B, and C, with byte representations (from lowest to highest byte), respectively, [A1, A2, A3, A4], [B1, B2, B3, B4], and [C1, C2, C3, C4], we can specify in Lean that A is the result of 32-bit wrap-around addition of B and C as follows:
def spec_32_bit_wrap_add (A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 : BabyBear) : Prop :=
( A1.val < 256 ) ∧ ( A2.val < 256 ) ∧ ( A3.val < 256 ) ∧ ( A4.val < 256 ) ∧
( B1.val < 256 ) ∧ ( B2.val < 256 ) ∧ ( B3.val < 256 ) ∧ ( B4.val < 256 ) ∧
( C1.val < 256 ) ∧ ( C2.val < 256 ) ∧ ( C3.val < 256 ) ∧ ( C4.val < 256 ) ∧
( ( B1.val + 256 * B2.val + 65536 * B3.val + 16777216 * B4.val ) +
( C1.val + 256 * C2.val + 65536 * C3.val + 16777216 * C4.val ) ) % 4294967296 =
( A1.val + 256 * A2.val + 65536 * A3.val + 16777216 * A4.val )
This specification is stored in the Specs.lean file, which can be used as a library of specifications in the future.
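As a sanity check (not part of the repository), the same specification can be mirrored in Python and exercised on concrete values:

```python
def spec_32_bit_wrap_add(a, b, c):
    """Python mirror of the Lean spec above: a, b, c are little-endian 4-byte
    lists; checks the byte-range conditions and wrap-around addition."""
    def weigh(xs):  # combine bytes into an integer, lowest byte first
        return sum(x * 256**i for i, x in enumerate(xs))
    in_range = all(0 <= x < 256 for x in a + b + c)
    return in_range and (weigh(b) + weigh(c)) % 2**32 == weigh(a)

# 0xFFFFFFFF + 1 wraps to 0:
assert spec_32_bit_wrap_add([0,0,0,0], [255,255,255,255], [1,0,0,0])
# ...and certainly does not equal 1:
assert not spec_32_bit_wrap_add([1,0,0,0], [255,255,255,255], [1,0,0,0])
```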
Next, we match the AddSub extracted constraints to this specification, understanding that A = [ML1, ML2, ML3, ML4], B = [ML8, ML9, ML10, ML11], and C = [ML12, ML13, ML14, ML15]. Recalling that for our proof-of-concept we are only interested in the case when ML16 = 1, we now state the specification to prove as:
def spec_ADD (ML0 ML1 ML2 ML3 ML4 ML5 ML6 ML7 ML8 ML9 ML10 ML11 ML12 ML13 ML14 ML15 ML16 ML17 ML18 : BabyBear ) : Prop :=
ML16 = 1 ->
spec_32_bit_wrap_add ML1 ML2 ML3 ML4 ML8 ML9 ML10 ML11 ML12 ML13 ML14 ML15
Finally, what remains to be done is to prove that this specification is implied by the AddSub constraints. This proof, which is available in full here, is captured by the conformance_ADD theorem below, eliding most of the constraints and providing instructive annotation:
theorem conformance_ADD { ML0 ... ML18 : BabyBear }
(C00: ...)
...
(C24 : (ML16 * (ML5 * (ML5 - 1))) = 0)
(C25 : (ML16 * (ML6 * (ML6 - 1))) = 0)
(C26 : (ML16 * (ML7 * (ML7 - 1))) = 0)
...
(C28: ...) : spec_ADD ML0 ... ML18 := by
-- We unfold the specification definitions and introduce the hypotheses.
unfold spec_ADD; intro HML16; unfold spec_32_bit_wrap_add
-- We substitute the value of ML16, and simplify constraints of
-- the form X * (X - 1) = 0 into X = 0 \/ X = 1.
subst_eqs; simp [sub_eq_zero (b := (1 : BabyBear))] at *
-- We perform a case analysis on the three carries (ML5/ML6/ML7).
-- This splits the single goal into eight.
rcases C24 <;> rcases C25 <;> rcases C26 <;>
-- We substitute learned equalities on ML5, ML6, and ML7, and simplify the
-- Fin-related definitions to enforce that all of the arithmetic is in Nat.
subst_eqs <;> simp [BabyBearPrime, Fin.add_def, Fin.sub_def] at * <;>
-- We remove Fin.val for 256 to enable reasoning with omega.
simp only [Fin.lt_iff_val_lt_val] at * <;>
rw [fin_val_simp (show 256 < BabyBearPrime by linarith)] at * <;>
-- Now, the Lean omega tactic is able to discharge all of the goals.
omega
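For intuition, the key simplification step in this proof, turning a constraint of the form X * (X - 1) = 0 into X = 0 ∨ X = 1, relies on BabyBear being a field and hence having no zero divisors. A generic version of this step can be stated in Lean with mathlib (a sketch; lemma names as in current mathlib):

```lean
import Mathlib

-- Sketch: in any field (such as BabyBear), a booleanity constraint
-- x * (x - 1) = 0 forces x to be 0 or 1, since fields have no zero divisors.
example {F : Type} [Field F] {x : F} (h : x * (x - 1) = 0) : x = 0 ∨ x = 1 :=
  (mul_eq_zero.mp h).imp id sub_eq_zero.mp
```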
While this work shows great promise, it is important to keep in mind the broader picture. In particular, the results that we have proven are meaningful under the assumptions that:
- the SP1 lookup and permutation arguments are sound;
- the contents of the U8Range lookup table are correct; and
- the underlying proof system is sound.
When it comes to further work, there are numerous avenues for one to explore. One immediate step would be to verify conformance of more complex chips, starting from the ALU chips and moving on to the control-flow, memory-management, and CPU-related chips. For this, the existing infrastructure would need to be upgraded to take further advantage of the modularity of lookup arguments, but this would not be a difficult task. For the precompiles, however, which output thousands of constraints (for example, the EdDecompress chip emits ~1.5K constraints), the modularity of SP1 itself would have to be improved to abstract frequently used computations that are currently inlined (for example, the various related field operations encoded through the FieldOpCols mechanism).
Beyond that, we could consider verifying correctness of SP1 lookup and permutation arguments, as well as that of the underlying proof system, but these would require both improvements to the overall Rust verification capabilities of the ecosystem and the completion of our EF grant for creating a blueprint for proving batched soundness of FRI in Lean.
Taking a step back, however, there is an important question intrinsically tied to formal verification in general and interactive theorem proving in particular, and this is how to achieve proof maintainability when the system being verified (in this case, SP1) may, or is even likely to, evolve in infrastructure-breaking ways. For example, if SP1 were to switch to a different proof system, then the structure of the constraints and argument representation would also be different, and the infrastructure, and especially the proofs, would require a substantial overhaul.
Fortunately, it would appear that the SP1 development by design has sufficient flexibility to output meaningful information from a number of points in the constraint generation process, and we hope to be able to work with Succinct to understand the correct level of abstraction for achieving modular, scalable, and maintainable verification of SP1.