Jolt is a zkVM framework built around the Lasso lookup argument.
Jolt powers succinct proofs of execution of programs written in any high-level language. Jolt's sumcheck-based SNARK makes extensive use of multivariate polynomials and commitment schemes. Jolt zkVMs have state-of-the-art prover performance and have substantial room for growth over the coming decades.
Jolt zkVMs have a simple programming model, requiring only 50-100 LOC to implement new VM instructions.
The Jolt codebase currently targets the RISC-V instruction set, which is supported by most high-level language compilers, but the code is intended to be extensible and usable by any ISA.
The only property of the ISA that Jolt requires is that each primitive instruction is "decomposable". This means that evaluating the instruction on one or two 32-bit or 64-bit inputs can be done via a procedure of the following form:

- decompose each input into, say, 8-bit chunks
- apply one or more specified functions to these chunks (one from each input)
- reconstruct the output of the original instruction from the outputs of the functions operating on the chunks
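As a concrete illustration, here is how a 32-bit bitwise OR fits this mold using 8-bit chunks. This is a plain Rust sketch of the decompose/apply/reconstruct procedure, not Jolt code; the function names are made up for illustration.

```rust
// Step 1: decompose a 32-bit input into four 8-bit chunks (big-endian).
fn decompose(x: u32) -> [u8; 4] {
    [(x >> 24) as u8, (x >> 16) as u8, (x >> 8) as u8, x as u8]
}

// Steps 2 and 3: apply the chunk-level function (8-bit OR) to each pair of
// chunks, then reconstruct the 32-bit output by concatenating chunk outputs.
fn or_via_chunks(x: u32, y: u32) -> u32 {
    let (xc, yc) = (decompose(x), decompose(y));
    let chunk_outputs = xc.iter().zip(yc.iter()).map(|(a, b)| a | b);
    chunk_outputs.fold(0u32, |acc, o| (acc << 8) | o as u32)
}
```

The same shape works for any decomposable instruction; only the chunk-level function and the reconstruction step change.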
Related reading
Background reading
Credits
Lasso was written by Srinath Setty, Justin Thaler, and Riad Wahby. Jolt was written by Arasu Arun, Srinath Setty, and Justin Thaler.
Jolt was initially forked from Srinath Setty's work on microsoft/Spartan, specifically the arkworks-rs/Spartan fork, in order to use the excellent arkworks-rs prime field arithmetic library. Jolt's R1CS is also proven using a version of Spartan (forked from the microsoft/Spartan2 codebase) optimized to the case of uniform R1CS constraints.
Using Jolt
Using Jolt is simple thanks to our tooling and macros. To get started, head over to our quickstart guide. For more in-depth information, check out the guests and hosts section.
Quickstart
Installing
Start by installing the `jolt` command line tool.

```
cargo +nightly install --git https://github.com/a16z/jolt --force --bins jolt
```
Creating a Project
To create a project, use the `jolt new` command.

```
jolt new <PROJECT_NAME>
```
This will create a template Jolt project in the specified location. Make sure to cd into the project before running the next step.
```
cd <PROJECT_NAME>
```
Project Tour
The main folder contains the host, which is the code that can generate and verify proofs. Within this main folder, there is another package called `guest`, which contains the Rust functions that we can prove from the host. For more information about these concepts refer to the guests and hosts section.
We'll start by taking a look at our guest. We can view the guest code in `guest/src/lib.rs`.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable]
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }

    b
}
```
As we can see, this implements a simple Fibonacci function called `fib`. All we need to do to make our function provable is add the `jolt::provable` macro above it.
Next let's take a look at the host code in `src/main.rs`.
```rust
pub fn main() {
    let (prove_fib, verify_fib) = guest::build_fib();

    let (output, proof) = prove_fib(50);
    let is_valid = verify_fib(proof);

    println!("output: {}", output);
    println!("valid: {}", is_valid);
}
```
This section simply imports `guest::build_fib`, which is automatically generated by the `jolt::provable` macro and returns functions for proving and verifying our function. The prove function takes the same inputs as the original `fib` function, but modifies the outputs to additionally return a proof. The verify function can then be used to check this proof, returning a boolean indicating its validity.
Running
Let's now run the host with `cargo`.

```
cargo run --release
```
This will compile the guest, perform some required preprocessing, and execute the host code which proves and verifies the 50th Fibonacci number. This preprocessing is run within the `build_fib` function and adds significant time to running the host, but only needs to be performed once. This means that we could use the prove function many times without rerunning `build_fib`. In the future we will support caching this across runs of the host.
Guests and Hosts
Guests and hosts are the main concepts that we need to understand in order to use Jolt. Guests contain the functions that Jolt will prove, while hosts are where we write more general code and invoke the prover and verifier for our guest functions.
Guests
Guests contain functions for Jolt to prove. Currently, these functions must be written in `no_std` Rust. If you have a function that does not require the standard library, making it provable is as easy as ensuring it is inside the `guest` package and adding the `jolt::provable` macro above it.
Let's take a look at a simple guest program to better understand it.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable]
fn add(x: u32, y: u32) -> u32 {
    x + y
}
```
As we can see, the guest looks like a normal `no_std` Rust library. The only major change is the addition of the `jolt::provable` macro, which lets Jolt know of the function's existence. Other than `no_std`, the only requirement of these functions is that their inputs are serializable and outputs are deserializable with `serde`. Fortunately `serde` is prevalent throughout the Rust ecosystem, so most types will support it by default.
There is no requirement that just a single function lives within the guest, and we are free to add as many as we need. Additionally, we can import any `no_std` compatible library just as we normally would in Rust.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

use sha2::{Digest, Sha256};
use sha3::Keccak256;

#[jolt::provable]
fn sha2(input: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}

#[jolt::provable]
fn sha3(input: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}
```
Hosts
Hosts are where we can invoke the Jolt prover to prove functions defined within the guest. Hosts do not have the `no_std` requirement, and are free to use the Rust standard library.
The host imports the guest package, and will have automatically generated functions to build each of the Jolt functions. For the sha2 and sha3 example guest we looked at in the guest section, these functions would be called `build_sha2` and `build_sha3` respectively. Each returns two results, a prover function and a verifier function. The prover function takes in the same input types as the original function and modifies the output to additionally include a proof. The verifier can then take this proof and verify it.
```rust
pub fn main() {
    let (prove_sha2, verify_sha2) = guest::build_sha2();
    let (prove_sha3, verify_sha3) = guest::build_sha3();

    let input = &[5u8; 32];

    let (output, proof) = prove_sha2(input);
    let is_valid = verify_sha2(proof);
    println!("sha2 output: {:?}", output);
    println!("sha2 valid: {}", is_valid);

    let (output, proof) = prove_sha3(input);
    let is_valid = verify_sha3(proof);
    println!("sha3 output: {:?}", output);
    println!("sha3 valid: {}", is_valid);
}
```
Allocators
Jolt provides an allocator which supports most containers such as `Vec` and `Box`. This is useful for Jolt users who would like to write `no_std` code rather than using Jolt's standard library support. To use these containers, they must be explicitly imported from `alloc`. The `alloc` crate is automatically provided by Rust and does not need to be added to the `Cargo.toml` file.
Example
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable]
fn alloc(n: u32) -> u32 {
    let mut v = Vec::<u32>::new();
    for i in 0..100 {
        v.push(i);
    }

    v[n as usize]
}
```
Standard Library
Jolt supports the full Rust standard library. To enable support, simply add the `guest-std` feature to the Jolt import in the guest's `Cargo.toml` file and remove the `#![cfg_attr(feature = "guest", no_std)]` directive from the guest code.
Example
```toml
[package]
name = "guest"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "guest"
path = "./src/lib.rs"

[features]
guest = []

[dependencies]
jolt = { package = "jolt-sdk", git = "https://github.com/a16z/jolt", features = ["guest-std"] }
```
```rust
#![no_main]

#[jolt::provable]
fn int_to_string(n: i32) -> String {
    n.to_string()
}
```
WASM Support
Jolt supports WebAssembly compatibility for the verification side of provable functions. This allows developers to run proof verification in WASM environments such as web browsers. Note that currently, only the verification process is supported in WASM; the proving process is not WASM-compatible at this time.
Creating a WASM-Compatible Project
To create a new project with WASM verification support, use the following command:
```
jolt new <PROJECT_NAME> --wasm
```
This creates a new project with the necessary structure and configuration for WASM-compatible verification.
Marking Functions for WASM Verification
For functions whose verification process should be WASM-compatible, extend the `#[jolt::provable]` macro with the `wasm` attribute:

```rust
#[jolt::provable(wasm)]
fn my_wasm_verifiable_function() {
    // function implementation
}
```
Building for WASM Verification
To compile your project for WASM verification, use the following command:
```
jolt build-wasm
```
This command performs several actions:

- It extracts all functions marked with `#[jolt::provable(wasm)]` from your `guest/src/lib.rs` file.
- For each WASM-verifiable function, it preprocesses the function and saves the necessary verification data.
- It creates an `index.html` file as an example of how to use your WASM-compiled verification functions in a web environment.
- It uses `wasm-pack` to build your project, targeting web environments.
Important: The build process only compiles the verification functions for WASM. The proving process must still be performed outside of the WASM environment.
Adding Dependencies
When adding new dependencies for WASM-compatible projects, note that they must be added to both `guest/Cargo.toml` and the root `Cargo.toml`. The `build-wasm` process will automatically add necessary WASM-related dependencies to your project.
Switching Between WASM and Normal Compilation
The build process modifies the `Cargo.toml` files to enable WASM compilation. If you need to switch back to normal compilation, you may need to manually adjust these files. In the root `Cargo.toml`, you can remove the following lines:

```toml
[lib]
crate-type = ["cdylib"]
path = "guest/src/lib.rs"
```

Remember to restore these lines before attempting another WASM build.
Example Project Structure
After running `jolt build-wasm`, your project will include:

- An `index.html` file in the root directory, providing a basic interface to verify proofs for each of your WASM-verifiable functions.
- A `pkg` directory containing the WASM-compiled version of your project's verification functions.
- Preprocessed data files for each WASM-verifiable function in the `target/wasm32-unknown-unknown/release/` directory.
You can use this example as a starting point and customize it to fit your specific requirements.
Example: Modifying the Quickstart Project
You can use the example from the Quickstart chapter and modify the `/src/main.rs` file as follows:
```rust
pub fn main() {
    let (prove_fib, _verify_fib) = guest::build_fib();
    let (_output, proof) = prove_fib(50);
    proof
        .save_to_file("proof.bin")
        .expect("Failed to save proof to file");
}
```
Remember to modify the `Cargo.toml` as described before for normal compilation, and then you can generate and save the proof with the following command:

```
cargo run -r
```
This will create a `proof.bin` file in the root directory. You can then verify this proof using the example `index.html`. Before doing this, change the crate type back to `cdylib` in the `Cargo.toml` and ensure that your `/guest/src/lib.rs` looks like this:
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable(wasm)]
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }

    b
}
```
Then run `jolt build-wasm`. After successful compilation, you can verify the proof using the `index.html` file. Start a local server from the project directory, for example with:

```
npx http-server .
```

Note: Make sure you have `npx` installed to use the `http-server` command.
Troubleshooting
Insufficient Memory or Stack Size
Jolt provides reasonable defaults for the total allocated memory and stack size. It is however possible that the defaults are not sufficient, leading to unpredictable errors within our tracer. To fix this we can try to increase these sizes. We suggest starting with the stack size first as this is much more likely to run out.
Below is an example of manually specifying both the total memory and stack size.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable(stack_size = 10000, memory_size = 10000000)]
fn waste_memory(size: u32, _n: u32) {
    let mut v = Vec::new();
    for i in 0..size {
        v.push(i);
    }
}
```
Maximum Input or Output Size Exceeded
Jolt restricts the size of the inputs and outputs to 4096 bytes by default. Using inputs and outputs that exceed this size will lead to errors. These values can be configured via the macro.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable(max_input_size = 10000, max_output_size = 10000)]
fn sum(input: &[u8]) -> u32 {
    let mut sum = 0u32;
    for value in input {
        sum += *value as u32;
    }
    sum
}
```
Guest Attempts to Compile Standard Library
Sometimes after installing the toolchain, the guest still tries to compile with the standard library, which fails with a large number of errors saying that certain items such as `Result` are referenced and not available. This generally happens when one tries to run Jolt before installing the toolchain. To address this, try rerunning `jolt install-toolchain`, restarting your terminal, and deleting both your Rust target directory and any files under `/tmp` that begin with `jolt`.
Getting Help
If none of the above help, please serialize your program and send it along with a detailed bug report.
Serializing a call to the "fib" function in the Jolt guest:
```rust
// let (prove_fib, verify_fib) = guest::build_fib();
let program_summary = guest::analyze_fib(10);
program_summary
    .write_to_file("fib_10.txt".into())
    .expect("should write");
```
Contributors
How it works
Architecture overview
This section gives an overview of the core components of Jolt.
Jolt's four components
A VM does two things:
- Repeatedly execute the fetch-decode-execute logic of its instruction set architecture.
- Perform reads and writes to Random Access Memory (RAM).
The Jolt paper depicts these two tasks mapped to three components in the final Jolt proof:
The Jolt codebase is similarly organized, but instead separates read-write memory (comprising registers and RAM) from program code (aka bytecode, which is read-only), for a total of four components:
Read-write memory
To handle reads/writes to RAM (and registers), Jolt uses a memory checking argument from Spice, which is closely related to Lasso itself. Both are based on "offline memory checking" techniques; the main difference is that Lasso supports read-only memory while Spice supports read-write memory, making it slightly more expensive. This is implemented in `read_write_memory.rs`.
For more details: Read-write memory
R1CS constraints
To handle the "fetch" part of the fetch-decode-execute loop, there is a minimal R1CS instance (about 60 constraints per cycle of the RISC-V VM). These constraints handle program counter (PC) updates and serve as the "glue" enforcing consistency between polynomials used in the components below. Jolt uses Spartan, optimized for the highly structured nature of the constraint system (e.g., the R1CS constraint matrices are block-diagonal with blocks of size only about 60 x 80). This is implemented in `jolt-core/src/r1cs`.
For more details: R1CS constraints
Instruction lookups
To handle the "execute" part of the fetch-decode-execute loop, Jolt invokes the Lasso lookup argument. The lookup argument maps every instruction (including its operands) to its output. This is implemented in `instruction_lookups.rs`.
For more details: Instruction lookups
Bytecode
To handle the "decode" part of the fetch-decode-execute loop, Jolt uses another instance of (read-only) offline memory-checking, similar to Lasso. The bytecode of the guest program is "decoded" in preprocessing, and the prover subsequently invokes offline memory-checking on the sequence of reads from this decoded bytecode corresponding to the execution trace being proven. This is implemented in `bytecode.rs`.
For more details: Bytecode
Instruction lookups
In Jolt, the "execute" part of the "fetch-decode-execute" loop is handled using a lookup argument (Lasso). At a high level, the lookup queries are the instruction operands, and the lookup outputs are the instruction outputs.
Lasso
Lasso is a lookup argument (equivalent to a SNARK for reads into a readonly memory). Lookup arguments allow the prover to convince the verifier that for a committed vector of values $v$, committed vector of indices $a$, and lookup table $T$, $T[a_{i}]=v_{i}$ for all $i$.
Lasso is a special lookup argument with highly desirable asymptotic costs largely correlated to the number of lookups (the length of the vectors $a$ and $v$), rather than the length of the table $T$.
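Stated naively (ignoring commitments and succinctness entirely), the relation a lookup argument proves is just index-wise agreement with the table. A minimal sketch, with hypothetical names:

```rust
// The relation T[a_i] == v_i for all i, checked directly. A lookup argument
// like Lasso convinces the verifier of exactly this, but without the verifier
// ever scanning the (possibly enormous) table.
fn lookup_relation_holds(table: &[u64], a: &[usize], v: &[u64]) -> bool {
    a.len() == v.len() && a.iter().zip(v.iter()).all(|(&ai, &vi)| table[ai] == vi)
}
```

The whole point of Lasso is that the prover's cost scales with the number of lookups rather than with `table.len()`.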
A conversational background on lookups can be found here. In short: lookups are great for zkVMs as they allow constant cost / developer complexity for the prover algorithm per VM instruction.
A detailed engineering overview of Lasso (as a standalone lookup argument) can be found here.
Using Lasso to handle instruction execution
Lasso requires that each primitive instruction satisfies a decomposability property. The property needed is that the input(s) to the instruction can be broken into "chunks" (say, with each chunk consisting of 16 bits), such that one can obtain the answer to the original instruction by evaluating a simple function on each chunk and then "collating" the results together. For example, the bitwise-OR of two 32-bit inputs x and y can be computed by breaking each input up into 8-bit chunks, ORing each 8-bit chunk of x with the associated chunk of y, and concatenating the results together.
In Lasso, we call the task of evaluating a simple function on each chunk a "subtable lookup" (the relevant lookup table being the table storing all $2^{16}$ evaluations of the simple function). And the "collating" of the results of the subtable lookups into the result of the original lookup (instruction execution on the unchunked inputs) is handled via an invocation of the sumcheck protocol. We call this the "primary sumcheck" instance in Lasso.
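To make the subtable idea concrete, here is a sketch of a materialized subtable for 8-bit OR, holding all $2^{16}$ evaluations indexed by the concatenated chunk pair. This is purely illustrative (the names are made up, and Lasso's efficiency comes precisely from the verifier never materializing such tables):

```rust
// A materialized subtable for 8-bit OR: entry at index (x_chunk || y_chunk)
// is x_chunk | y_chunk, for all 2^16 chunk pairs.
fn build_or_subtable() -> Vec<u8> {
    (0u32..(1 << 16)).map(|idx| ((idx >> 8) as u8) | (idx as u8)).collect()
}

// A "subtable lookup" is then just an index into this table.
fn subtable_lookup(table: &[u8], x_chunk: u8, y_chunk: u8) -> u8 {
    table[((x_chunk as usize) << 8) | y_chunk as usize]
}
```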
The "primary sumcheck" collation looks as follows for a trace of length $m$ and a VM with $F$ unique instructions.
$\sum_{x \in \{0,1\}^{\log(m)}} \left[ eq(r, x) \cdot \sum_{f \in \{0,1\}^{\log(F)}} flags_f(x) \cdot g_f(terms_f(x)) \right]$

$flags_f(x)$ is an indicator function for the $f$-th instruction: $flags_f(x) = 1$ if instruction $f$ is used during the $x$-th step of the trace, for $x \in \{0,1\}^{\log_2(m)}$.

$g_f(\dots)$ is the collation function used by the $f$-th instruction.

$terms_f(x) = [E_1(x), \dots, E_\alpha(x)]$ where $\alpha$ is the number of independent memories used by an instruction. For simple instructions like the EQ instruction, $\alpha = C$, where $C$ is the number of chunks that the inputs are broken into, so $terms_f(x) = [E_1(x), \dots, E_C(x)]$. More complicated instructions such as LT might have $terms_f(x) = [E_{EQ}(x), E_{LT1}(x), E_{LT2}(x)]$. The exact layout depends on the number of subtables required by the decomposition of the instruction. The mappings can be found in the `JoltInstruction::subtable` method implementations.
Mental Model
For a given $r \in \{0,1\}^{\log(m)}$ (think: the integer index of an instruction within the trace), $eq(r, x) = 0$ for all but one term $x$ of the outer sum. Similarly, $flags_f(x) = 0$ for all but one term of the inner sum. This leaves just the collation function of a single instruction, evaluating to the collated lookup output of that instruction. In reality, $r$ is a random point $r \in \mathbb{F}^{\log(m)}$ selected by the verifier over the course of the protocol. The evaluation at this point provides a distance-amplified encoding of the entire trace of instructions.
To illustrate more concretely, imagine a two-instruction VM for the LT and EQ instructions with $C = 1$.

$\sum_{x \in \{0,1\}^{\log(m)}} eq(r, x) \cdot \left[ flags_{LT}(x) \cdot g_{LT}(E_{LT}(x)) + flags_{EQ}(x) \cdot g_{EQ}(E_{EQ}(x)) \right]$
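The selector behavior of $eq$ can be sketched over the boolean hypercube, where (unlike the verifier's random $r \in \mathbb{F}^{\log(m)}$) everything is 0/1. This is a toy sketch with made-up names, not Jolt code:

```rust
// eq(r, x) over boolean points: 1 exactly when r == x, bit by bit.
fn eq_boolean(r: &[u64], x: &[u64]) -> u64 {
    r.iter().zip(x.iter()).map(|(ri, xi)| ri * xi + (1 - ri) * (1 - xi)).product()
}

// Sum over all boolean x of eq(r, x) * v[x]: only the term x == r survives,
// so for boolean r this just selects v at the index encoded by r.
fn select(r: &[u64], values: &[u64]) -> u64 {
    (0..values.len())
        .map(|i| {
            let x: Vec<u64> = (0..r.len()).map(|b| ((i as u64) >> b) & 1).collect();
            eq_boolean(r, &x) * values[i]
        })
        .sum()
}
```

The flags polynomials act as a second layer of the same selection, picking out the one instruction active at that step.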
Subtable Flags
- We then use memory checking to determine that each of the memories $E_i$ is well-formed.
- At a given step of the CPU, only a single instruction is used, which means that only that instruction's subtables are used. For the rest of the memories we insert a no_op with $(a, v) = 0$. In order to make the GKR trees cheaper to compute and sumcheck, we add a single additional layer to the GKR tree. During this layer we "toggle" each of the GKR leaves to 1 in the case that it is an unused step of the CPU. This makes the binary tree of multiplication gates cheaper. We toggle based on a new flags polynomial called $subtable\_flags_f$, which is the sum of all of the $instruction\_flags_f$ used in the instruction collation above.
- The function to compute each of the leaves becomes $leaf[i] = subtable\_flags[i] \cdot fingerprint[i] + (1 - subtable\_flags[i])$.
See the documentation on how Jolt leverages sparse constraint systems for further details on subtable flags and how they are used in Jolt.
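For flags in $\{0,1\}$, the toggled-leaf formula above can be sketched directly:

```rust
// subtable_flag must be 0 or 1. Unused steps (flag = 0) contribute the
// multiplicative identity 1 to the GKR product tree rather than their
// fingerprint; used steps (flag = 1) contribute the fingerprint itself.
fn toggled_leaf(subtable_flag: u64, fingerprint: u64) -> u64 {
    subtable_flag * fingerprint + (1 - subtable_flag)
}
```

(In Jolt these values live in a large prime field, not `u64`; the arithmetic identity is the same.)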
Read-write memory (registers and RAM)
Jolt proves the validity of registers and RAM using offline memory checking. In contrast to our usage of offline memory checking in other modules, registers and RAM are writable memory.
Memory layout
For the purpose of offline memory checking, Jolt treats registers, program inputs/outputs, and RAM as occupying one unified address space. This remapped address space is laid out as follows:
The zero-padding depicted above is sized so that RAM starts at a power-of-two offset (this is explained below). As noted in the diagram, the size of the witness scales with the highest memory address accessed over the course of the program's execution. In addition to the zero-padding between the "Program I/O" and "RAM" sections, the end of the witness is zero-padded to a power of two.
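The power-of-two sizing can be sketched as follows; the function names and layout parameters here are hypothetical, chosen only to illustrate the padding rule, not Jolt's actual constants:

```rust
// Pad past the registers and program I/O region up to the next power of two,
// so RAM begins at a power-of-two offset in the remapped address space.
fn ram_witness_offset(num_registers: usize, io_region_size: usize) -> usize {
    (num_registers + io_region_size).next_power_of_two()
}

// The witness itself is also zero-padded at the end to a power-of-two length.
fn padded_witness_size(highest_address_accessed: usize) -> usize {
    highest_address_accessed.next_power_of_two()
}
```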
Handling program I/O
Inputs
Program inputs and outputs (and the panic bit, which indicates whether the program panicked) live in the same memory address space as RAM. Program inputs populate the designated input space upon initialization:
The verifier can efficiently compute the MLE of this initial memory state on its own (i.e. in time proportional to the IO size, not the total memory size).
Outputs and panic
On the other hand, the verifier cannot compute the MLE of the final memory state on its own: though the program I/O is known to the verifier, the final memory state contains values written to registers/RAM over the course of program execution, which are not known to the verifier.
The verifier is, however, able to compute the MLE of the program I/O values (padded on both sides with zeros); this is denoted `v_io` below.
If the prover is honest, then the final memory state (`v_final` below) should agree with `v_io` at the indices corresponding to program I/O.

To enforce this, we invoke the sumcheck protocol to perform a "zerocheck" on the difference between `v_final` and `v_io`:
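Stated naively (ignoring the sumcheck machinery), the property being enforced is that `v_final` equals `v_io` everywhere in the I/O region. A minimal sketch with a hypothetical helper:

```rust
// The I/O-consistency property the zerocheck enforces, checked directly:
// on every index in the I/O region, the final memory state must agree with
// the verifier-known I/O values. Outside that region, v_io is zero-padded
// and imposes no constraint.
fn zerocheck_io(v_final: &[u64], v_io: &[u64], mut io_range: std::ops::Range<usize>) -> bool {
    io_range.all(|i| v_final[i] == v_io[i])
}
```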
This also motivates the zero-padding between the "Program I/O" and "RAM" sections of the witness. The zero-padding ensures that both `input_start` and `ram_witness_offset` are powers of two, which makes it easier for the verifier to compute the MLEs of `v_init` and `v_io`.
Timestamp range check
Registers and RAM are writable memory, which introduces additional requirements compared to offline memory checking in a readonly context.
The multiset equality check for readonly memory, typically expressed as $I⋅W=R⋅F$, is not adequate for ensuring the accuracy of read values. It is essential to also verify that each read operation retrieves a value that was written in a previous step (not a future step). (Here, $I$ denotes the tuples capturing initialization of memory and $W$ the tuples capturing all of the writes to memory following initialization. $R$ denotes the tuples capturing all read operations, and $F$ denotes tuples capturing a final read pass over all memory cells).
To formalize this, we assert that the timestamp of each read operation, denoted $\text{read\_timestamp}$, must not exceed the global timestamp at that particular step. The global timestamp starts at 0 and is incremented once per step.

The verification of $\text{read\_timestamp} \leq \text{global\_timestamp}$ is equivalent to confirming that $\text{read\_timestamp}$ falls within the range $[0, \text{TRACE\_LENGTH})$ and that the difference $(\text{global\_timestamp} - \text{read\_timestamp})$ is also within the same range.

The process of ensuring that both $\text{read\_timestamp}$ and $(\text{global\_timestamp} - \text{read\_timestamp})$ lie within the specified range is known as range-checking. This is the procedure implemented in `timestamp_range_check.rs`, using a modified version of Lasso.
Intuitively, checking that each read timestamp does not exceed the global timestamp prevents an attacker from answering all read operations to a given cell with "the right set of values, but out of order". Such an attack requires the attacker to "jump forward and backward in time". That is, for this attack to succeed, at some timestamp t when the cell is read, the attacker would have to return a value that will be written to that cell in the future (and at some later timestamp t' when the same cell is read the attacker would have to return a value that was written to that cell much earlier). This attack is prevented by confirming that all values returned have a timestamp that does not exceed the current global timestamp.
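The two range checks can be sketched as ordinary integer checks (a sketch only; Jolt proves them via a modified Lasso rather than evaluating them directly):

```rust
// read_ts <= global_ts is equivalent to these two range checks, assuming
// global_ts < trace_length always holds for an honest trace.
fn timestamps_valid(read_ts: u64, global_ts: u64, trace_length: u64) -> bool {
    // Check 1: read_ts is in [0, TRACE_LENGTH).
    let read_in_range = read_ts < trace_length;
    // Check 2: global_ts - read_ts is in [0, TRACE_LENGTH). If read_ts is in
    // the "future" (read_ts > global_ts), checked_sub yields None and the
    // check fails, which is exactly the attack being prevented.
    let diff_in_range = global_ts
        .checked_sub(read_ts)
        .map_or(false, |d| d < trace_length);
    read_in_range && diff_in_range
}
```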
Bytecode
Jolt proves the validity of reads from the program bytecode using offline memory checking.
Decoding an ELF file
The tracer iterates over the `.text` sections of the program's ELF file and decodes RISC-V instructions. Each instruction gets mapped to a `BytecodeRow` struct:
```rust
pub struct BytecodeRow {
    /// Memory address as read from the ELF.
    address: usize,
    /// Packed instruction/circuit flags, used for R1CS.
    bitflags: u64,
    /// Index of the destination register for this instruction (0 if register is unused).
    rd: u64,
    /// Index of the first source register for this instruction (0 if register is unused).
    rs1: u64,
    /// Index of the second source register for this instruction (0 if register is unused).
    rs2: u64,
    /// "Immediate" value for this instruction (0 if unused).
    imm: u64,
}
```
The registers (`rd`, `rs1`, `rs2`) and `imm` are described in the RISC-V spec. The `bitflags` field is described in greater detail below.
The preprocessed bytecode serves as the (read-only) "memory" over which we perform offline memory checking.
We currently assume the program bytecode is sufficiently short and known by the verifier, so the prover doesn't need to provide a commitment to it –– the verifier can compute the MLE of the bytecode on its own. This guarantees that the bytecode trace is consistent with the agreedupon program.
Instead of $(a,v,t)$ tuples, each step in the bytecode trace is mapped to a tuple $(a,v_{bitflags},v_{rd},v_{rs1},v_{rs2},v_{imm},t)$. Otherwise, the offline memory checking proceeds as described in Offline Memory Checking.
Bitflags
The `bitflags` of a given instruction is the concatenation of its circuit flags and instruction flags.

This concatenation is enforced by R1CS constraints.
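Such a concatenation can be sketched as plain bit-packing (the flag ordering here is hypothetical, chosen for illustration; in Jolt the real packing is enforced by the R1CS constraints, not by code like this):

```rust
// Pack circuit flags followed by instruction flags into one word,
// most-significant flag first.
fn pack_bitflags(circuit_flags: &[bool], instruction_flags: &[bool]) -> u64 {
    circuit_flags
        .iter()
        .chain(instruction_flags.iter())
        .fold(0u64, |bits, &flag| (bits << 1) | flag as u64)
}
```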
R1CS constraints
Jolt uses R1CS constraints to enforce certain rules of the RISC-V fetch-decode-execute loop and to ensure consistency between the proofs for the different modules of Jolt (instruction lookups, read-write memory, and bytecode).
Uniformity
Jolt's R1CS is uniform, which means that the constraint matrices for an entire program are just repeated copies of the constraint matrices for a single CPU step. Each step is conceptually simple and involves around 60 constraints and 80 variables.
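What uniformity buys can be seen in a naive (non-succinct) checker: the same single-step matrices verify every step of the trace. A sketch over plain integers rather than field elements, with made-up names:

```rust
// Check Az * Bz == Cz (entrywise) for one step's witness z against the
// single-step constraint matrices.
fn step_satisfied(a: &[Vec<i64>], b: &[Vec<i64>], c: &[Vec<i64>], z: &[i64]) -> bool {
    let dot = |row: &[i64]| -> i64 { row.iter().zip(z.iter()).map(|(x, y)| x * y).sum() };
    a.iter()
        .zip(b.iter())
        .zip(c.iter())
        .all(|((ra, rb), rc)| dot(ra) * dot(rb) == dot(rc))
}

// Uniformity: the full program's constraint matrices are just this one small
// (A, B, C) triple repeated once per CPU step.
fn trace_satisfied(a: &[Vec<i64>], b: &[Vec<i64>], c: &[Vec<i64>], steps: &[Vec<i64>]) -> bool {
    steps.iter().all(|z| step_satisfied(a, b, c, z))
}
```

With witness layout `z = [1, x, y, xy]`, a single constraint `x * y = xy` already demonstrates the repeated-block structure.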
Input Variables and constraints
The inputs required for the constraint system for a single CPU step are:
Pertaining to bytecode
- Program counters (PCs): this is the only state passed between CPU steps.
- Bytecode read address: the address in the program code read at this step.
- The preprocessed ("5-tuple") representation of the instruction: (`bitflags`, `rs1`, `rs2`, `rd`, `imm`).
Pertaining to readwrite memory
- The (starting) RAM address read by the instruction: if the instruction is not a load/store, this is 0.
- The bytes written to or read from memory.
Pertaining to instruction lookups
- The chunks of the instruction's operands `x` and `y`.
- The chunks of the lookup query. These are typically some combination of the operand chunks (e.g. the i-th chunk of the lookup query is often the concatenation of `x_i` and `y_i`).
- The lookup output.
Circuit and instruction flags:
- There are nine circuit flags used to guide the constraints; they depend only on the opcode of the instruction, and are thus stored as part of the preprocessed bytecode in Jolt.
  - `operand_x_flag`: 0 if the first operand is the value in `rs1` or the `PC`.
  - `operand_y_flag`: 0 if the second operand is the value in `rs2` or the `imm`.
  - `is_load_instr`
  - `is_store_instr`
  - `is_jump_instr`
  - `is_branch_instr`
  - `if_update_rd_with_lookup_output`: 1 if the lookup output is to be stored in `rd` at the end of the step.
  - `sign_imm_flag`: used in load/store and branch instructions where the instruction is added as constraints.
  - `is_concat`: indicates whether the instruction performs a concat-type lookup.
- Instruction flags: these are the unary bits used to indicate which instruction is executed at a given step. There are as many per step as the number of unique instruction lookup tables in Jolt, which is 19.
Constraint system
The constraints for a CPU step are detailed in the `get_jolt_matrices()` function in `constraints.rs`.
Reusing commitments
As with most SNARK backends, Spartan requires computing a commitment to the inputs to the constraint system. A catch (and an optimization feature) in Jolt is that most of the inputs are also used as inputs to proofs in the other modules. For example, the address and values pertaining to the bytecode are used in the bytecode memory-checking proof, and the lookup chunks, output, and flags are used in the instruction lookup proof.

For Jolt to be sound, it must be ensured that the same inputs are fed to all relevant proofs. We do this by reusing the commitments themselves. This can be seen in the `format_commitments()` function in the `r1cs/snark` module. Spartan is adapted to take pre-committed witness variables.
Exploiting uniformity
The uniformity of the constraint system allows us to heavily optimize both the prover and verifier. The main changes involved in making this happen are:
- Spartan is modified to only take in the constraint matrices for a single step, along with the total number of steps. Using these, the prover and verifier can efficiently calculate the multilinear extensions of the full R1CS matrices.
- The commitment format of the witness values is changed to reflect uniformity. All versions of a variable corresponding to each time step are committed together. This affects nearly all variables committed to in Jolt.
- The inputs and witnesses are provided to the constraint system as segments.
- Additional constraints are used to enforce consistency of the state transferred between CPU steps.

These changes and their impact on the code are visible in `spartan.rs`.
M extension
Jolt supports the RISC-V "M" extension for integer multiplication and division.
The instructions included in this extension are described here.
For RV32, the M extension includes 8 instructions: MUL
, MULH
, MULHSU
, MULU
, DIV
, DIVU
, REM
, and REMU
.
The Jolt paper describes how to handle the M extension instructions in Section 6, but our implementation deviates from the paper in a couple of ways (described below).
Virtual sequences
Section 6.1 of the Jolt paper introduces virtual instructions and registers: some of the M extension
instructions cannot be implemented as a single subtable decomposition, but must instead be split into
a sequence of instructions that together compute the output and place it in the destination register.
In our implementation, these sequences are captured by the `VirtualInstructionSequence` trait.
The instructions that comprise such a sequence can be a combination of "real" RISC-V instructions and "virtual" instructions that only appear in the context of virtual sequences. We also introduce 32 virtual registers as "scratch space" where instructions in a virtual sequence can write intermediate values.
Deviations from the Jolt paper
There are three inconsistencies between the virtual sequences provided in Section 6.3 of the Jolt paper and the RISC-V specification. Namely:
1. The Jolt prover (as described in the paper) would fail to produce a valid proof if it encountered a division by zero: since the divisor $y$ is 0, the `ASSERT_LTU`/`ASSERT_LT_ABS` would always fail (for `DIVU` and `DIV`, respectively).
2. The MLE provided for `ASSERT_LT_ABS` in Section 6.1.1 doesn't account for two's complement.
3. The `ASSERT_EQ_SIGNS` instruction should always return true if the remainder is 0.
To address these issues, our implementations of `DIVU`, `DIV`, `REMU`, and `REM` deviate from the Jolt paper in the following ways.
DIVU virtual sequence
```
ADVICE                                , ,     , v_q   // store non-deterministic advice q into v_q
ADVICE                                , ,     , v_r   // store non-deterministic advice r into v_r
MUL                              v_q  , r_y , , v_qy  // compute q * y
ASSERT_VALID_UNSIGNED_REMAINDER  v_r  , r_y , ,       // assert that y == 0 || r < y
ASSERT_LTE                       v_qy , r_x , ,       // assert q * y <= x
ASSERT_VALID_DIV0                r_y  , v_q , ,       // assert that y != 0 || q == 2 ** WORD_SIZE - 1
ADD                              v_qy , v_r , , v_0   // compute q * y + r
ASSERT_EQ                        v_0  , x   , ,
MOVE                             v_q  ,     , , rd
```
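The invariants enforced by the sequence above can be sanity-checked in plain Rust. This is an illustrative model, not Jolt's prover code; `divu_advice` is a hypothetical helper computing the advice an honest prover would supply:

```rust
// Advice an honest prover supplies for DIVU: the RISC-V spec defines
// x / 0 == 2^WORD_SIZE - 1 and x % 0 == x.
fn divu_advice(x: u32, y: u32) -> (u32, u32) {
    if y == 0 { (u32::MAX, x) } else { (x / y, x % y) }
}

// Check the conditions enforced by the sequence's assert instructions.
fn divu_invariants_hold(x: u32, y: u32, q: u32, r: u32) -> bool {
    let qy = q as u64 * y as u64;
    let valid_remainder = y == 0 || r < y; // ASSERT_VALID_UNSIGNED_REMAINDER
    let lte = qy <= x as u64;              // ASSERT_LTE
    let div0 = y != 0 || q == u32::MAX;    // ASSERT_VALID_DIV0
    let sum = qy + r as u64 == x as u64;   // ADD then ASSERT_EQ
    valid_remainder && lte && div0 && sum
}
```

Note that the division-by-zero case passes: with `q = u32::MAX` and `r = x`, we have `q * y == 0 <= x` and `q * y + r == x`.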
REMU virtual sequence
```
ADVICE                                , ,     , v_q   // store non-deterministic advice q into v_q
ADVICE                                , ,     , v_r   // store non-deterministic advice r into v_r
MUL                              v_q  , r_y , , v_qy  // compute q * y
ASSERT_VALID_UNSIGNED_REMAINDER  v_r  , r_y , ,       // assert that y == 0 || r < y
ASSERT_LTE                       v_qy , r_x , ,       // assert q * y <= x
ADD                              v_qy , v_r , , v_0   // compute q * y + r
ASSERT_EQ                        v_0  , x   , ,
MOVE                             v_r  ,     , , rd
```
DIV virtual sequence
```
ADVICE                                , ,     , v_q   // store non-deterministic advice q into v_q
ADVICE                                , ,     , v_r   // store non-deterministic advice r into v_r
ASSERT_VALID_SIGNED_REMAINDER    v_r  , r_y , ,       // assert that r == 0 || y == 0 || (|r| < |y| && sign(r) == sign(y))
ASSERT_VALID_DIV0                r_y  , v_q , ,       // assert that y != 0 || q == 2 ** WORD_SIZE - 1
MUL                              v_q  , r_y , , v_qy  // compute q * y
ADD                              v_qy , v_r , , v_0   // compute q * y + r
ASSERT_EQ                        v_0  , x   , ,
MOVE                             v_q  ,     , , rd
```
REM virtual sequence
```
ADVICE                                , ,     , v_q   // store non-deterministic advice q into v_q
ADVICE                                , ,     , v_r   // store non-deterministic advice r into v_r
ASSERT_VALID_SIGNED_REMAINDER    v_r  , r_y , ,       // assert that r == 0 || y == 0 || (|r| < |y| && sign(r) == sign(y))
MUL                              v_q  , r_y , , v_qy  // compute q * y
ADD                              v_qy , v_r , , v_0   // compute q * y + r
ASSERT_EQ                        v_0  , x   , ,
MOVE                             v_r  ,     , , rd
```
R1CS constraints
Circuit flags
With the M extension we introduce the following circuit flags:
- `is_virtual`: Is this instruction part of a virtual sequence?
- `is_assert`: Is this instruction an `ASSERT_*` instruction?
- `do_not_update_pc`: If this instruction is virtual and not the last one in its sequence, then we should not update the PC. This is because all instructions in virtual sequences are mapped to the same ELF address.
Uniform constraints
The following constraints are enforced for every step of the execution trace:
- If the instruction is a `MUL`, `MULU`, or `MULHU`, the lookup query is the product of the two operands `x * y` (field multiplication of two 32-bit values).
- If the instruction is a `MOVE` or `MOVSIGN`, the lookup query is a single operand `x` (read from the first source register `rs1`).
- If the instruction is an assert, the lookup output must be true.
Program counter constraints
Each instruction in the preprocessed bytecode contains its (compressed) memory address as given by the ELF file. This is used to compute the expected program counter for each step in the program trace.
If the `do_not_update_pc` flag is set, we constrain the next PC value to be equal to the current one.
This handles the fact that all instructions in virtual sequences are mapped to the same ELF address.
This also means we need some other mechanism to ensure that virtual sequences are executed in order and in full. If the current instruction is virtual, we can constrain the next instruction in the trace to be the next instruction in the bytecode. We observe that the virtual sequences used in the M extension don't involve jumps or branches, so this constraint should always hold, except when a virtual instruction is followed by a padding instruction. But that should never happen, because an execution trace should always end with some return handling, which shouldn't involve a virtual sequence.
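As a toy illustration of the rule above (a sketch, not Jolt's constraint code: it assumes 4-byte instructions and ignores jumps and branches), the expected next PC for one step is:

```rust
// If do_not_update_pc is set (a virtual, non-final instruction of a
// sequence), the PC stays put, since all instructions of the sequence share
// one ELF address; otherwise advance past one 4-byte instruction.
fn expected_next_pc(pc: u64, do_not_update_pc: bool) -> u64 {
    if do_not_update_pc { pc } else { pc + 4 }
}
```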
The design of Jolt (the implemented system) leans heavily on a simple but powerful technique that is not described in the Jolt paper.
Jolt uses the technique to achieve what is sometimes called "a la carte" prover costs: for each cycle of the VM, the Jolt prover only "pays" for the primitive instruction that was actually executed at that cycle, regardless of how many instructions are in the Instruction Set Architecture (in Jolt's case, RISC-V). This design will allow Jolt to be extended to richer instruction sets (which is equivalent to supporting different precompiles) without increases in per-cycle prover cost.
The technique is already incorporated into Jolt in the context of lookups, but it's simplest to explain in the context of precompiles. Here is the rough idea. Suppose a precompile is implemented via some constraint system (say, R1CS for concreteness). As a first attempt at supporting the precompile, we can include into Jolt a single, data-parallel constraint system that "assumes" that the precompile is executed at every single cycle that the VM ran for. There are two problems with this. First, the precompile will not actually be executed at every cycle, so we need a way to "turn off" all the constraints for every cycle at which the precompile is not executed. Second, we do not want the prover to "pay" in runtime for the constraints that are turned off.
To address both of these issues, for each cycle $j$ we have the prover commit to a binary flag $b_{j}$ indicating whether or not that precompile was actually executed at that cycle.
For each cycle $j$, consider the $i$'th constraint that is part of the precompile, say: $\langle a_i, z \rangle \cdot \langle b_i, z \rangle - \langle c_i, z \rangle = 0.$
Here, one should think of $z$ as an execution trace for the VM, i.e., a list of everything that happened at each and every cycle of the VM. In Jolt, there are under 100 entries of $z$ per cycle (though as we add precompiles this number might grow).
We modify each constraint by multiplying the left hand side by the binary flag $b_j$, i.e., $b_j \cdot \left( \langle a_i, z \rangle \cdot \langle b_i, z \rangle - \langle c_i, z \rangle \right) = 0.$
Now, for any cycle where this precompile is not executed, the prover can simply assign any variables in $z$ that are only "used" by the precompile to 0. This makes these values totally "free" to commit to when using any commitment scheme based on elliptic curves (after we switch the commitment scheme to Binius, committing to 0s will no longer be "free", but we still expect the commitment costs to be so low that it won't matter a huge amount). These variables might not satisfy the original constraint but will satisfy the modified one, simply because $b_j$ will be set to 0. We say that $b_j$ "turned off" the constraint.
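A toy model of a flagged constraint makes the "turn off" behavior concrete. The small prime modulus here is an assumption chosen only for the example (it is not Jolt's field):

```rust
const P: u64 = 97; // toy prime modulus, for illustration only

// <row, z> mod P
fn dot(row: &[u64], z: &[u64]) -> u64 {
    row.iter().zip(z).fold(0, |acc, (a, b)| (acc + a * b) % P)
}

// b_j * (<a_i, z> * <b_i, z> - <c_i, z>) mod P; a result of 0 means the
// (modified) constraint is satisfied.
fn flagged_constraint(flag: u64, a: &[u64], b: &[u64], c: &[u64], z: &[u64]) -> u64 {
    let lhs = (dot(a, z) * dot(b, z) + P - dot(c, z)) % P;
    flag * lhs % P
}
```

With `flag = 0`, any assignment to the precompile-only entries of `z` satisfies the modified constraint, which is exactly why those entries can be set to 0 and become free to commit to.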
Moreover, in sumcheck-based SNARKs for R1CS like Spartan, there are prover algorithms dating to CTY11 and CMT12 where the prover's runtime grows only with the number of constraints where the associated flag $b_j$ is not zero. So the prover effectively pays nothing at all (no commitment costs, no field work) for constraints at any cycle where the precompile is not actually executed. We call these algorithms "streaming/sparse sumcheck proving." In a happy coincidence, these are the same algorithms we expect to use to achieve a streaming prover (i.e., to control prover space without recursion).
More precisely, the prover time in these algorithms is about $m$ field operations per round of sumcheck, where $m$ is the number of constraints whose associated binary flag $b_j$ is not equal to $0$. But if $n$ is the total number of constraints (including those that are turned off), we can "switch over" to the standard "dense" linear-time sumcheck proving algorithm after enough rounds $i$ pass so that $n/2^i \approx m$. In Jolt, we expect this "switchover" to happen by round $4$ or $5$. In the end, the amount of extra field work done by the prover owing to the sparsity will only be a factor of $2$ or so.
Jolt uses this approach within Lasso as well. Across all of the primitive RISC-V instructions, there are about 80 "subtables" that get used. Any particular primitive instruction only needs to access between 4 and 10 of these subtables. We "pretend" that every primitive instruction actually accesses all 80 of the subtables, but use binary flags to "turn off" any subtable lookups that do not actually occur (that is, we tell the grand product in the guts of the Lasso lookup argument for that subtable to "ignore" that lookup). This is why about 93% of the factors multiplied in Jolt's various grand product arguments are 1. The same algorithmic approach ensures that the "turned off" lookups do not increase commitment time, and that our grand product prover does not pay any field work for the "turned off" subtable lookups.
There are alternative approaches we could take to achieve "a la carte" prover costs, e.g., vRAM's approach of having the prover sort all cycles by which primitive operation or precompile was executed at that cycle (see also the much more recent work Ceno). But the above approach is compatible with a streaming prover, avoids committing to the same data multiple times, and has other benefits.
We call this technique (fast proving for) "sparse constraint systems". Note that the term sparse here does not refer to the sparsity of the R1CS constraint matrices themselves, but rather to the fact that almost all of the left hand sides of the constraints are $0$ when the constraints are evaluated at the valid witness vector $z$.
Background
If you're interested in learning the nitty-gritty details of SNARKs, including questions like 'why' and 'is it sound', you should read Proofs and Args ZK by Thaler. If you're simply interested in 'how', this section is for you.
The Sumcheck Protocol
Adapted from the Thaler13 exposition. For a detailed introduction to sumcheck see Chapter 4.1 of the textbook.
Suppose we are given a $v$-variate polynomial $g$ defined over a finite field $\mathbb{F}$. The purpose of the sumcheck protocol is to compute the sum:
$$H := \sum_{b_1 \in \{0,1\}} \sum_{b_2 \in \{0,1\}} \cdots \sum_{b_v \in \{0,1\}} g(b_1, \ldots, b_v).$$
In order to execute the protocol, the verifier needs to be able to evaluate $g(r_1, \ldots, r_v)$ for a randomly chosen vector $(r_1, \ldots, r_v) \in \mathbb{F}^v$. Hence, from the verifier's perspective, the sumcheck protocol reduces the task of summing $g$'s evaluations over $2^v$ inputs (namely, all inputs in $\{0,1\}^v$) to the task of evaluating $g$ at the single input $(r_1, \ldots, r_v) \in \mathbb{F}^v$.
The protocol proceeds in $v$ rounds as follows. In the first round, the prover sends a polynomial $g_1(X_1)$, and claims that
$$g_1(X_1) = \sum_{x_2, \ldots, x_v \in \{0,1\}^{v-1}} g(X_1, x_2, \ldots, x_v).$$
Observe that if $g_1$ is as claimed, then $H = g_1(0) + g_1(1)$. Also observe that the polynomial $g_1(X_1)$ has degree $\deg_1(g)$, the degree of variable $x_1$ in $g$. Hence $g_1$ can be specified with $\deg_1(g) + 1$ field elements. In our implementation, $P$ will specify $g_1$ by sending the evaluation of $g_1$ at each point in the set $\{0, 1, \ldots, \deg_1(g)\}$. (Actually, the prover does not need to send $g_1(1)$, since the verifier can infer that $g_1(1) = H - g_1(0)$; if this were not the case, the verifier would reject.)
Then, in round $j > 1$, $V$ chooses a value $r_{j-1}$ uniformly at random from $\mathbb{F}$ and sends $r_{j-1}$ to $P$. We will often refer to this step by saying that variable $j-1$ gets bound to value $r_{j-1}$. In return, the prover sends a polynomial $g_j(X_j)$, and claims that
$$g_j(X_j) = \sum_{(x_{j+1}, \ldots, x_v) \in \{0,1\}^{v-j}} g(r_1, \ldots, r_{j-1}, X_j, x_{j+1}, \ldots, x_v). \tag{1}$$
The verifier compares the two most recent polynomials by checking that $g_{j-1}(r_{j-1}) = g_j(0) + g_j(1)$, rejecting otherwise. The verifier also rejects if the degree of $g_j$ is too high: each $g_j$ should have degree $\deg_j(g)$, the degree of variable $x_j$ in $g$.
In the final round, the prover has sent $g_{v}(X_{v})$ which is claimed to be $g(r_{1},…,r_{v−1},X_{v})$. $V$ now checks that $g_{v}(r_{v})=g(r_{1},…,r_{v})$ (recall that we assumed $V$ can evaluate $g$ at this point). If this test succeeds, and so do all previous tests, then the verifier accepts, and is convinced that $H=g_{1}(0)+g_{1}(1)$.
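The full protocol, for the special case of a *multilinear* $g$ (so each round polynomial has degree 1 and can be sent as its evaluations at 0 and 1), can be sketched in a few lines of Rust. The toy field modulus and the flat evaluation-vector representation are assumptions made for this example:

```rust
const P: u64 = (1 << 31) - 1; // toy Mersenne prime modulus, for illustration

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

// Bind the first variable of a multilinear polynomial (represented by its
// evaluations over {0,1}^v, first variable = high-order bit of the index)
// to the field element r.
fn bind(evals: &mut Vec<u64>, r: u64) {
    let half = evals.len() / 2;
    for i in 0..half {
        let (low, high) = (evals[i], evals[half + i]);
        evals[i] = add(low, mul(r, sub(high, low)));
    }
    evals.truncate(half);
}

// Run the whole protocol with a fixed list of verifier challenges.
// Returns (H, final claim); panics if any round check fails.
fn sumcheck(mut g: Vec<u64>, challenges: &[u64]) -> (u64, u64) {
    let h = g.iter().fold(0, |acc, &x| add(acc, x));
    let mut claim = h;
    for &r in challenges {
        let half = g.len() / 2;
        // Prover's round message: (g_j(0), g_j(1)).
        let g0 = g[..half].iter().fold(0, |acc, &x| add(acc, x));
        let g1 = g[half..].iter().fold(0, |acc, &x| add(acc, x));
        assert_eq!(claim, add(g0, g1)); // verifier's consistency check
        claim = add(g0, mul(r, sub(g1, g0))); // next claim: g_j(r)
        bind(&mut g, r);
    }
    assert_eq!(claim, g[0]); // final check: claim == g(r_1, ..., r_v)
    (h, claim)
}
```

For example, `sumcheck(vec![1, 2, 3, 4], &[5, 7])` sums the four hypercube evaluations (here $H = 10$) and reduces that claim to a single evaluation of the multilinear extension at $(5, 7)$.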
Multilinear Extensions
For any $v$-variate polynomial $g(x_1, \ldots, x_v)$, its multilinear extension $\tilde{f}(x_1, \ldots, x_v)$ is the unique multilinear polynomial that agrees with $g$ at all $2^v$ points $x \in \{0,1\}^v$: $g(x_1, \ldots, x_v) = \tilde{f}(x_1, \ldots, x_v) \;\; \forall x \in \{0,1\}^v$. By the Schwartz-Zippel lemma, if two multilinear polynomials $g$ and $\tilde{f}$ disagree at even a single input, then they must disagree almost everywhere.
For more precise details please read Section 3.5 of Proofs and Args ZK.
Engineering
In practice, MLEs are stored as the vector of evaluations over the $v$-variate boolean hypercube $\{0,1\}^v$. There are two important algorithms over multilinear extensions: single variable binding, and evaluation.
Single Variable Binding
With a single streaming pass over all $n = 2^v$ evaluations we can "bind" a single variable of the $v$-variate multilinear extension to a point $r$. This is a critical sub-algorithm in sumcheck. During the binding, the number of evaluation points used to represent the MLE gets reduced by a factor of 2:
- Before: $\tilde{f}(x_1, \ldots, x_v): \{0,1\}^v \to \mathbb{F}$
- After: $\tilde{f'}(x_1, \ldots, x_{v-1}) = \tilde{f}(r, x_1, \ldots, x_{v-1}): \{0,1\}^{v-1} \to \mathbb{F}$
Assume your MLE is represented as a 1-D vector $E$ of $2^v$ evaluations over the $v$-variate boolean hypercube $\{0,1\}^v$, indexed big-endian (the first variable is the high-order bit):
- $E[1] = \tilde{f}(0,0,0,1)$
- $E[5] = \tilde{f}(0,1,0,1)$
- $E[8] = \tilde{f}(1,0,0,0)$
Then we can transform the vector $E \to E'$, representing the transformation $\tilde{f}(x_1, \ldots, x_v) \to \tilde{f'}(x_1, \ldots, x_{v-1}) = \tilde{f}(r, x_1, \ldots, x_{v-1})$, by "binding" the evaluations vector to the point $r$:
```rust
let n = 1 << v; // 2^v evaluations
let half = n / 2;
for i in 0..half {
    let low = E[i];
    let high = E[half + i];
    E[i] = low + r * (high - low);
}
E.truncate(half);
```
Multi Variable Binding
Another common algorithm is to take the MLE $\tilde{f}(x_1, \ldots, x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube, $x \in \mathbb{F}^v$. This can be done in $O(n)$ time, where $n = 2^v$, by performing the single variable binding algorithm $\log(n) = v$ times. The time spent on the $i$'th variable binding is $O(n/2^i)$, so the total time across all $\log(n)$ bindings is proportional to $\sum_{i=1}^{\log(n)} n/2^i = O(n)$.
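A sketch of this evaluation algorithm in Rust (the toy prime modulus and the big-endian evaluation-vector representation are assumptions of the example):

```rust
const P: u64 = (1 << 31) - 1; // toy prime modulus, for illustration

// Evaluate an MLE, given by its 2^v evaluations over {0,1}^v (first variable
// = high-order bit of the index), at an arbitrary point in F^v by binding
// one variable per pass. Pass i touches n / 2^i entries, so total work is O(n).
fn evaluate_mle(mut evals: Vec<u64>, point: &[u64]) -> u64 {
    for &r in point {
        let half = evals.len() / 2;
        for i in 0..half {
            let (low, high) = (evals[i], evals[half + i]);
            evals[i] = (low + r * ((high + P - low) % P)) % P;
        }
        evals.truncate(half);
    }
    evals[0]
}
```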
Eq Extension
The $\widetilde{eq}$ multilinear extension (MLE) is useful throughout the protocol. It is the MLE of the function $eq: \{0,1\}^{\log(m)} \times \{0,1\}^{\log(m)} \to \mathbb{F}$ defined as follows:
$$eq(r, x) = \begin{cases} 1 & \text{if } r = x \\ 0 & \text{otherwise} \end{cases}$$
$\widetilde{eq}$ has the following explicit formula:
$$\widetilde{eq}(r, x) = \prod_{i=1}^{\log(m)} \left( r_i \cdot x_i + (1 - r_i) \cdot (1 - x_i) \right), \quad \text{where } r, x \in \{0,1\}^{\log(m)}$$
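The product formula translates directly to code; here is a sketch over a toy prime field (the modulus is chosen only for this example):

```rust
const P: u64 = (1 << 31) - 1; // toy prime modulus, for illustration

// Evaluate eq~(r, x) = prod_i (r_i * x_i + (1 - r_i) * (1 - x_i)) mod P.
// On boolean inputs this is 1 iff r == x, and 0 otherwise.
fn eq_eval(r: &[u64], x: &[u64]) -> u64 {
    r.iter().zip(x).fold(1, |acc, (&ri, &xi)| {
        let term = (ri * xi % P + (1 + P - ri) % P * ((1 + P - xi) % P) % P) % P;
        acc * term % P
    })
}
```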
Batched Openings
At the end of the sumcheck protocol, the verifier must evaluate many multilinear polynomials at a single point. For polynomials the verifier cannot compute on its own, it depends on the verification of a PCS opening proof provided by the prover. To save on verifier costs, all polynomials opened at the same point can be combined into a single opening proof.
The best reading on the subject can be found in Section 16.1 of the Textbook.
Offline Memory Checking
Offline memory checking is a method that enables a prover to demonstrate to a verifier that a read/write memory was used correctly. In such a memory system, values $v$ can be written to addresses $a$ and subsequently retrieved. This technique allows the verifier to efficiently confirm that the prover adhered to the memory's rules (i.e., that the value returned by any read operation is indeed the most recent value that was written to that memory cell).
Jolt utilizes offline memory checking in the Bytecode prover, Lookup prover (for VM instruction execution), and RAM prover. The RAM prover must support a read-write memory. The Bytecode prover and Lookup prover need only support read-only memories. In the case of the Bytecode prover, the memory is initialized to contain the bytecode of the RISC-V program, and the memory is never modified (it's read-only). And the lookup tables used for VM instruction execution are determined entirely by the RISC-V instruction set.
(The term "offline memory checking" refers to techniques that check the correctness of all read operations "all at once", after the reads have all occurred, or, in SNARK settings, after the purported values returned by the reads have been committed. Offline checking techniques do not determine, as a read happens, whether or not it was correct. They only ascertain, when all the reads are checked at once, whether or not all of the reads were correct.
This is in contrast to "online memory checking" techniques like Merkle hashing, which immediately confirm that a memory read was done correctly by insisting that each read includes an authentication path. Merkle hashing is much more expensive on a per-read basis for SNARK provers, and offline memory checking suffices for SNARK design. This is why Lasso and Jolt use offline memory checking techniques rather than online ones.)
Initialization Algorithm
TODO:
 Initialize four timestamp counters
 Implicitly assume a read operation before each write
Multiset Check
Define $read$ and $write$ as multisets of tuples, and likewise $init$ and $final$:
$$read, write \subseteq \{(a_i, v_i, t_i) \mid i \in [0, m]\}$$
$$init, final \subseteq \{(a_j, v_j, t_j) \mid j \in [0, M]\}$$
Here, $a_i$, $v_i$, and $t_i$ represent the address, value, and timestamp respectively, with $m$ being the total number of memory operations and $M$ the size of the RAM.
The verifier checks that the combination of $read$ and $final$ matches $write$ and $init$, disregarding the sequence of elements; this is known as a permutation check:
$$read \cup final = write \cup init$$
Jolt conducts this check using a homomorphic hash function applied to each set:
$$H(read) = \prod_{(a_i, v_i, t_i) \in read} h(a_i, v_i, t_i)$$
$$H(write) = \prod_{(a_i, v_i, t_i) \in write} h(a_i, v_i, t_i)$$
$$H(init) = \prod_{(a_j, v_j, t_j) \in init} h(a_j, v_j, t_j)$$
$$H(final) = \prod_{(a_j, v_j, t_j) \in final} h(a_j, v_j, t_j)$$
The hash function $h$ is defined as:
$$h_{\gamma,\tau}(a, v, t) = a \cdot \gamma^2 + v \cdot \gamma + t - \tau$$
This multiset hashing process is represented by a binary tree of multiplication gates and is computed using an optimized GKR protocol.
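The permutation check via fingerprints can be illustrated in a few lines. The prime modulus and the one-cell example trace are assumptions of this sketch, not Jolt's field or data:

```rust
const P: u64 = (1 << 31) - 1; // toy prime modulus, for illustration

// h_{gamma,tau}(a, v, t) = a * gamma^2 + v * gamma + t - tau  (mod P)
fn h(a: u64, v: u64, t: u64, gamma: u64, tau: u64) -> u64 {
    ((a * gamma % P) * gamma % P + v * gamma % P + t % P + P - tau % P) % P
}

// H(S) = product of h over all tuples (a, v, t) in the multiset S.
fn fingerprint(s: &[(u64, u64, u64)], gamma: u64, tau: u64) -> u64 {
    s.iter().fold(1, |acc, &(a, v, t)| acc * h(a, v, t, gamma, tau) % P)
}
```

For a one-cell trace with `init = {(0,0,0)}`, a read returning `(0,0,0)`, a write of `(0,5,1)`, and `final = {(0,5,1)}`, the multisets `read ∪ final` and `write ∪ init` coincide, so their fingerprints match for any choice of `gamma` and `tau`.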
References
- Original BEGKN paper on offline memory checking, forming the technical underpinnings of Lasso
- Spice Protocol for read-write memories (see Fig. 2)
- Spartan Protocol
- Lasso Protocol for read-only memories, a.k.a. lookups
- Thaler13 Grand Product Protocol (see Prop. 2)
- Quarks Grand Product Protocol (see Section 6)
GKR
GKR is a SNARK protocol for small-depth circuits of multiplication and addition gates. The standard form allows combinations of both. The protocol involves three MLEs: $V_i$ is the MLE of a function capturing the values of the gates at layer $i$, and $add_i$ and $mult_i$ are MLEs of functions capturing the circuit's wiring.
Specifically, $V_{i}(j)$ evaluates to the value of the circuit at the $i$th layer in the $j$th gate. For example $V_{1}(0)$ corresponds to the output gate.
$add_{i}(j,a,b)$ evaluates to 1 if the $j$th gate of the $i$th layer is an addition gate whose inputs are the $a$'th and $b$'th gates at the preceding layer.
$mult_{i}(j,a,b)$ evaluates to 1 if the $j$th gate of the $i$th layer is a multiplication gate whose inputs are the $a$'th and $b$'th gates at the preceding layer.
The sumcheck protocol is applied to the following:
$$V_i(z) = \sum_{(p, \omega_1, \omega_2) \in \{0,1\}^{s_i + 2s_{i+1}}} f_{i,z}(p, \omega_1, \omega_2),$$
where
$$f_{i,z}(p, \omega_1, \omega_2) = \widetilde{eq}_{s_i}(z, p) \cdot \Big( add_i(p, \omega_1, \omega_2)\big(V_{i+1}(\omega_1) + V_{i+1}(\omega_2)\big) + mult_i(p, \omega_1, \omega_2) \cdot V_{i+1}(\omega_1) \cdot V_{i+1}(\omega_2) \Big)$$
$$\widetilde{eq}_{s_i}(z, p) = \prod_{j=1}^{s_i} \big( (1 - z_j)(1 - p_j) + z_j p_j \big).$$
(See the eq polynomial page for details.)
Lasso and Jolt currently use the variant of GKR described in Thaler13, which is optimized for the far simpler case of a binary tree of multiplication gates. This simplifies each sumcheck to:
$$V_i(z) = \sum_{p \in \{0,1\}^{s_i}} g_z(p),$$
where
$$g_z(p) = \widetilde{eq}_{s_i}(z, p) \cdot V_{i+1}(p, 0) \cdot V_{i+1}(p, 1)$$
GKR is utilized in memory-checking for the multiset permutation check.
Binius
Binius was written by Ben Diamond and Jim Posen of Irreducible (fka Ulvetanna). It's a new commitment scheme that allows Jolt to use smaller fields more efficiently.
The Binius paper also gives a number of sumcheck-based polynomial IOPs to be combined with the commitment scheme. For Jolt's purposes, these will yield extremely efficient protocols for proving hash evaluations with various standard hash functions like Keccak (important for recursing Jolt-with-Binius-commitment), and for the RISC-V addition and multiplication operations.
Here is a brief summary of changes that will occur in Jolt in order to incorporate the Binius commitment scheme:
- Addition and multiplication instructions will be handled via gadgets/constraint-systems instead of lookups. We may want to handle XOR and LT via gadgets too for performance reasons.
- Booleanity constraints in the R1CS can be omitted, as booleanity is guaranteed by the binding properties of the Binius commitment scheme.
- In offline memory checking (both Lasso for lookups, a.k.a. read-only memory, and Spice for read-write memory), access counts need to be incremented. The Binius paper (Section 4.4) proposes to do this by sticking counts in the exponent of a generator of an appropriate multiplicative subgroup of $GF(2^{128})$, but it seems more efficient to plan to do the count increments via the addition gadget.
  For Spice, the count written during a write operation is set to the maximum of a global counter and the count returned by the preceding read operation, plus 1. If the prover is honest, this written count will always equal the global counter plus 1. This max can be implemented by having the prover commit to the binary representation of $v = \text{global\_count} - \text{returned\_count}$ and using the addition gadget to confirm that $\text{returned\_count} + v = \text{global\_count}$.
- Constraints that involve addition by IMM, the immediate value, will need to implement addition via the gadget.
- In the "collation" sumcheck of Lasso that collates the results of subtable lookups into the result of the original big-table lookup, the algebraic implementation of "concatenate the results into one field element" changes. Over a large prime-order field, concatenating two $8$-bit values $a$ and $b$ corresponds to $2^8 \cdot a + b$. Over $GF(2^{128})$ constructed via towering, $2^8$ gets replaced with an appropriate element of the tower basis. The same change affects various R1CS constraints ensuring that certain committed values are a limb decomposition of another committed value.
- Many performance optimizations become available within the various invocations of the sumcheck protocol, leveraging the fact that $GF(2^{128})$ has small characteristic and that most values being summed are in $GF(2^k)$ for small $k$.
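The Spice "max via addition gadget" trick above can be sketched as follows. This is illustrative only: in the real protocol $v$ is committed bit-by-bit and the addition is a field gadget, not native `u64` arithmetic:

```rust
// The prover supplies v = global_count - returned_count as advice; committing
// to v's binary representation enforces v >= 0, i.e. global >= returned.
// The addition gadget then checks returned_count + v == global_count, and the
// written count is global_count + 1, which equals max(global, returned) + 1
// for an honest prover.
fn written_count(global_count: u64, returned_count: u64) -> Option<u64> {
    let v = global_count.checked_sub(returned_count)?; // advice; None models a failed range check
    assert_eq!(returned_count + v, global_count); // addition-gadget check
    Some(global_count + 1)
}
```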
Multiplicative Generator
Finding a Multiplicative Generator for $GF(2^k)$
To find a generator of the multiplicative group $GF(2^k)^*$, which consists of all nonzero elements of the field $GF(2^k)$, a probabilistic method is used. This method iteratively tests random elements to determine if they are generators of the group. The process is summarized below.
Probabilistic Method Algo
```
loop:
    sample a random nonzero element x of GF(2^k)
    if x^d != 1 for every maximal proper divisor d of |GF(2^k)*| = 2^k - 1:
        return x
```
This algorithm ensures that the returned element $x$ does not reside in any proper subgroup of $GF(2^k)^*$. The condition checked within the loop, $x^d \neq 1$ for every maximal proper divisor $d$ of the group order, ensures that $x$ has the maximal possible order, which is necessary and sufficient for it to be a generator of the group.
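The same test is easy to demonstrate over the multiplicative group $\mathbb{Z}_p^*$ for a prime $p$ (an assumption made for the example; Binius would run it over $GF(2^k)^*$ with field multiplication in place of modular multiplication):

```rust
// Square-and-multiply exponentiation mod p (p assumed < 2^32 to avoid overflow).
fn pow_mod(mut base: u64, mut exp: u64, p: u64) -> u64 {
    let mut acc = 1;
    base %= p;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % p;
        }
        base = base * base % p;
        exp >>= 1;
    }
    acc
}

// x generates Z_p^* iff x^d != 1 for every maximal proper divisor d of p - 1,
// i.e. d = (p - 1) / q for each distinct prime factor q of p - 1.
fn is_generator(x: u64, p: u64, prime_factors_of_order: &[u64]) -> bool {
    x % p != 0 && prime_factors_of_order.iter().all(|&q| pow_mod(x, (p - 1) / q, p) != 1)
}
```

For $p = 257$, the group order is $256 = 2^8$, whose only maximal proper divisor is $128$; the element $3$ passes the test, while $4 = 2^2$ does not.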
Background:
- Group Structure: $GF(2^k)^*$ is the multiplicative group of the field $GF(2^k)$, containing all field elements except 0. Its order is $2^k - 1$, which is odd.
- Maximal Proper Divisors: These divisors are crucial, as testing $x$ raised to each of them ensures that $x$ is not a member of any proper cyclic subgroup. The maximal proper divisors of $2^k - 1$ are used to verify the generator property.
- Probabilistic Success Rate: The likelihood of a random element being a generator is significant, given the structure of the group: a random element fails the test only if it lands in one of the few maximal proper subgroups (one per prime factor of the group order, by the Chinese remainder theorem).
Example Divisors:
- For $GF(2^{16})^*$: $|\mathbb{F}^*| = 2^{16} - 1 = 3 \times 5 \times 17 \times 257$
- For $GF(2^{32})^*$: $|\mathbb{F}^*| = 2^{32} - 1 = 3 \times 5 \times 17 \times 257 \times 65537$
- For $GF(2^{64})^*$: $|\mathbb{F}^*| = 2^{64} - 1 = 3 \times 5 \times 17 \times 257 \times 641 \times 65537 \times 6700417$
Maximal Proper Divisors
A maximal proper divisor of a number $n$ is a divisor $d$ of $n$, other than $1$ and $n$ itself, such that no other proper divisor of $n$ is a multiple of $d$. Equivalently, $d$ is maximal among proper divisors under divisibility: $d = n/p$ for some prime factor $p$ of $n$.
Algorithm for Finding
The algorithm to find the maximal proper divisors of a given number $n$ involves identifying all divisors of $n$ and then selecting those that no other proper divisor is built on top of. The steps are as follows:
- Find All Divisors: First, list all divisors of $n$ by checking, for every integer $i$ from $1$ to $\sqrt{n}$, whether $i$ divides $n$. If $i$ divides $n$, then both $i$ and $n/i$ are divisors.
- Filter Maximal Divisors: From this list of divisors, exclude $1$ and $n$. For each remaining divisor $d$, check whether any other divisor in the list is a multiple of $d$. If none is, then $d$ is a maximal proper divisor.
```
function findMaximalProperDivisors(n):
    divisors = []
    for i from 1 to sqrt(n):
        if n % i == 0:
            divisors.append(i)
            if i != n / i:
                divisors.append(n / i)
    divisors.remove(1)
    divisors.remove(n)
    maximal_proper_divisors = []
    for d in divisors:
        is_maximal = true
        for e in divisors:
            # d is not maximal if some other proper divisor e is a multiple of d
            if d != e and e % d == 0:
                is_maximal = false
                break
        if is_maximal:
            maximal_proper_divisors.append(d)
    return maximal_proper_divisors
```
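The same procedure translates directly to Rust (a sketch; trial division is fine for the word-size values involved here):

```rust
// A proper divisor d of n is maximal exactly when no other proper divisor
// of n is a multiple of d (equivalently, when n / d is prime).
fn maximal_proper_divisors(n: u64) -> Vec<u64> {
    let mut divisors = Vec::new();
    let mut i = 1;
    while i * i <= n {
        if n % i == 0 {
            divisors.push(i);
            if i != n / i {
                divisors.push(n / i);
            }
        }
        i += 1;
    }
    divisors.retain(|&d| d != 1 && d != n);
    let mut maximal: Vec<u64> = divisors
        .iter()
        .copied()
        .filter(|&d| !divisors.iter().any(|&e| e != d && e % d == 0))
        .collect();
    maximal.sort();
    maximal
}
```

For example, the maximal proper divisors of $12$ are $4$ and $6$ ($12/4 = 3$ and $12/6 = 2$ are prime), while $2$ and $3$ are excluded because $4$ and $6$ are multiples of them.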
Dev
For more information on setting up your development environment, see Installation Guide.
For details on the tools used in the Jolt development process, refer to Development Tools.
Development Installation
Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Rustup should automatically install the Rust toolchain and necessary targets on the first `cargo` invocation. If you need to add the RISC-V target for building guest programs manually, use `rustup target add riscv32im-unknown-none-elf`.
mdBook
For building this book:
cargo install mdbook mdbook-katex
For watching the changes in your local browser:
mdbook watch book --open
Development Tools
Tracing
Jolt is instrumented using tokio-rs/tracing. These traces can be displayed using the `--format chrome` flag, for example:
```
cargo run -p jolt-core --release trace --name sha2-chain --format chrome
```
After tracing, files can be found in the workspace root with the name `trace-<timestamp>.json`. Load these traces into perfetto.
Often it's easiest to debug performance for a particular segment by adding granular tracing, adjusting code, re-running the `sha2-chain` end-to-end benchmark and looking through the Chrome traces.
Objdump
Debugging the emulator / tracer can be hard. Use `riscv64-unknown-elf-objdump` to compare the actual ELF to the `.bytecode` / `.jolttrace` files.
All of the open tasks are tracked via GitHub Issues.
Improvements complete or in progress since the initial release of Jolt in April 2024
There are many tasks described in the various files in the "roadmap/future" section of the wiki. Let's use this file as the primary means of tracking what is done or in progress. Anything not mentioned in this file is presumably not yet started (or barely started).
Functionality improvements
- Support for stdlib
- Support for the M extension
- In progress: on-chain verifier (Solidity)
Verifier cost improvements
- Add support for HyperKZG commitment scheme.
- Add support for the Quarks/Spartan grand product argument and a hybrid grand product (which achieves most of the verifier benefits of Quarks without a significant hit to prover time).
  - Still to do here: optimize how the prover commits to partial products.
- In progress: change how we are batching grand products, treating them all, laid side-by-side, as one giant circuit. This will reduce proof size by up to 200KB.
- In progress: reduce the number of polynomial evaluation proofs from 7-10 down to 1.
Prover cost improvements (all in progress)
- Eliminate the cost of precomputed tables of eq evaluations for each sumcheck, as per Dao-Thaler.
- (Nearly) eliminate the second sumcheck instance in Spartan.
- Implement the sumcheck prover optimization from Section 3 of Angus Gruen's paper.
- AVX-512 speedups.
- GPU integration.
Below are known optimizations that we will be implementing in the coming weeks, and the anticipated reduction in prover costs (all percentages are relative to the Jolt prover's speed on initial public release in April 2024).

The way we implemented Lasso leads to grand product arguments for which about 90% of the factors being multiplied together are equal to 1. The current implementation explicitly stores all these values (assigning 256 bits to each value). Instead, we can store a densified representation of them (i.e., list only the values that are not 1, and assume all the others are 1). This will speed up the prover, but more importantly it will reduce total prover space usage by a factor of about 8x.
Anticipated speedup: 8% of prover time (and a significant space reduction).

When Spartan is applied to the R1CS arising in Jolt, the prover’s work in the first round involves computation over 64bit integers, not arbitrary field elements. The implementation does not yet take advantage of this: each multiplication in the first round is currently implemented via a 256bit Montgomery multiplication rather than a primitive 64bit multiplication. This is about half of the prover’s total work in this sumcheck invocation. (The same optimization applies for computing the three matrixvector multiplications needed to compute Az, Bz, and Cz before beginning this first sumcheck).
Anticipated speedup: 4% of overall prover time.

The optimization described in Section 3.2 of this paper by Angus Gruen applies to a subset of the invocations of the sumcheck protocol in Jolt, including the first of two sumchecks in Spartan and all sumchecks in Grand Product arguments within memorychecking procedures (Lasso, Spice).
Anticipated speedup: 3% of total prover time.

Switching the commitment scheme from Hyrax to one with much smaller commitments (e.g., HyperKZG, Zeromorph) will not only shorten the proofs, but also save the prover the time of serializing and hashing the commitments for FiatShamir. See this github issue.
Anticipated speedup: 3% of total prover time.

Make committing to slightly negative values as fast as committing to small positive values (one group operation per value).
Anticipated speedup: 2% of total prover time.

In the first sumcheck in Spartan, the prover precomputes a table of evaluations of (the multilinear extension of) the equality function $eq(a, b)$, with the first vector $a$ fixed to a random value. Leaving a few variables of $b$ out of the table and handling them separately will reduce the cost of building this table to a negligible level.
Anticipated speedup: 1%-2% of total prover time.
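A minimal sketch of the table-building procedure, over a mock field (mod a small prime) rather than a real 256-bit field: the table over all $2^m$ values of $b$ is built with one multiplication per entry via iterative doubling. The optimization described above would instead build only a table of size $2^{m-k}$, handling the remaining $k$ variables of $b$ on the fly:

```rust
// Mock field: arithmetic mod a small prime, for illustration only.
const P: u64 = 101;
fn mul(a: u64, b: u64) -> u64 { a * b % P }
fn sub(a: u64, b: u64) -> u64 { (P + a - b) % P }

/// Table of eq(a, b) over all b in {0,1}^m for fixed a, built with
/// one multiplication per entry by iterative doubling.
fn eq_table(a: &[u64]) -> Vec<u64> {
    let mut table = vec![1u64];
    for &ai in a {
        let mut next = Vec::with_capacity(table.len() * 2);
        for &t in &table {
            next.push(mul(t, sub(1, ai))); // branch b_i = 0
            next.push(mul(t, ai));         // branch b_i = 1
        }
        table = next;
    }
    table
}
```

Two sanity properties: for Boolean $a$ the table is an indicator vector, and for any $a$ the table entries sum to 1 (since $\sum_b eq(a,b) = \prod_i (a_i + (1-a_i)) = 1$).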

On reads to registers or RAM, the value written back to the memory cell by the memory-checking procedure is committed separately from the value returned by the read, and an R1CS constraint is included to force equality. In fact, a single value can be committed and the constraint omitted.
Anticipated speedup: 1% of total prover time.

SP1 implements word-addressable memory (the CPU has to read an entire 64-bit word of memory at once). Jolt currently implements byte-addressable memory (the RISC-V CPU is allowed to read one byte at a time, as required by the RISC-V specification).
Most benchmarks get compiled into RISC-V programs that mostly read entire words at once. Switching to word-addressable memory will improve Jolt's speed on these benchmarks by about 5%.
Total anticipated prover time reduction: 20%-30%.
One way to achieve zeroknowledge is to simply compose Jolt with a zeroknowledge SNARK like Groth16. That is, use Groth16 (which is ZK) to prove that one knows a Jolt proof. Composing with Groth16 or Plonk is how most zkVMs get low onchain verification costs anyway, and it also "adds" ZK. This approach is on Jolt's roadmap, although it will take some time to complete (as it requires representing the Jolt verifier in R1CS or Plonkish, which is a pain).
There are also ways to make Jolt zero-knowledge without invoking SNARK composition. For example, rendering sumcheck-based SNARKs zero-knowledge without using composition was exactly the motivation for Zeromorph, which introduces a very efficient zero-knowledge variant of KZG commitments for multilinear polynomials.
If we use the Zeromorph polynomial commitment scheme, the commitment and any evaluation proof are hiding (they reveal nothing about the committed polynomial, and still give the verifier a commitment to the requested evaluation of the committed polynomial). One still needs to ensure that the various applications of the sumcheck protocol in Jolt also do not leak any information about the witness. Here, techniques based on masking polynomials apply (see Section 13.3 of Proofs, Arguments, and Zero-Knowledge for a high-level overview). However, masking polynomials require the prover to commit to non-multilinear polynomials, which introduces significant (but surmountable) issues.
A similar approach to achieving ZK applies when using a hashing-based polynomial commitment scheme like Brakedown. Roughly, to randomize all values sent by the prover during the Brakedown evaluation proof, it suffices to pad the committed polynomial with sufficiently many random values. One does need to lightly modify the Jolt polynomial IOP to "ignore" these extra, random committed values.
A final technique to render all of the sumchecks ZK without SNARK composition is given in Hyrax (based on earlier work of Cramer and Damgård). Roughly, rather than sending field elements "in the clear", the prover sends (blinded, hence hiding) Pedersen commitments to those field elements, and the verifier exploits homomorphism properties to confirm that the committed field elements would have passed all of the sumcheck verifier's checks. See Section 13.2 of Proofs, Arguments, and Zero-Knowledge for additional discussion.
Estimated costs of onchain verifier with no proof composition
We expect proof storage and verification to cost about 2 million gas, with proofs that are a couple of dozen KBs in size. A crude and speculative explanation of where this estimate comes from is below.
Details
Costs due to processing commitments
Once we fully implement batching of polynomial evaluation proofs, Jolt will only require one HyperKZG opening. That's something like $7 + \log(N) \approx 40$ (conservatively) scalar multiplications and two pairings, where $N$ is the number of RISC-V cycles being proven.
On top of that, the verifier is going to do about 80 scalar multiplications for homomorphically combining various commitments.
Let's conservatively ballpark this at 150 scalar multiplications and two pairings in total for the verifier. Each scalar multiplication costs 6,000 gas and each pairing costs 45,000 gas. That's about 1 million gas total due to scalar multiplications and pairings.
Costs due to sumcheck verification and multilinear extension evaluations
The Jolt verifier will invoke the sumcheck protocol about ten times. (This could be as few as five if we fully batch the sumchecks, meaning run as many as possible in parallel, combined with standard random-linear-combination techniques; but maximally batching the sumchecks will take a fair amount of engineering effort, so ten is a conservative estimate.)
One of these sumchecks comes from applying Spartan for R1CS; five or so come from the hybrid grand product argument (a hybrid between Thaler '13/GKR and Quarks/Spartan, which gives a nice balance between prover time and verifier costs); one from batching polynomial evaluation queries; one from an input/output-consistency check; and one from the "collation" sumcheck in the Lasso lookup argument.
The total amount of data sent in these sumchecks (plus HyperKZG commitments and evaluation proof) should be a few dozen KBs. Indeed, each sumcheck should run for roughly 40 rounds, with at most three field elements sent per round (except for the collation sumcheck, as the polynomial involved there has degree 6 or so in each variable). That's $40 \cdot 3 \cdot 32 = 3840$ bytes of data per sumcheck invocation, and another 7 KBs or so for HyperKZG commitments and the evaluation proof. In total, that's perhaps 40 KBs of data.
Sumcheck verification is purely field work: say ~100 field operations per sumcheck (a very crude estimate), plus the associated Fiat-Shamir hashing. On top of all of that, the Jolt verifier performs MLE evaluations for all the "subtables" used in Lasso.
All told, we can ballpark the total number of field operations done by the verifier at about $2,000$, with Fiat-Shamir hashing of at most 40 KBs of data.
The gas costs of this hashing, plus the field work, plus storing the entire proof in CALLDATA, should be under 1 million.
Combined with the ~1 million gas to perform scalar multiplications and pairings, we obtain an estimate of between 1.3 million and 2 million gas in total.
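The arithmetic behind these ballparks can be checked with a couple of helper functions (all inputs are the document's estimates, not measured costs):

```rust
// Gas for the group operations, using the per-op costs cited above:
// 6,000 gas per scalar multiplication, 45,000 gas per pairing.
fn group_op_gas(scalar_muls: u64, pairings: u64) -> u64 {
    scalar_muls * 6_000 + pairings * 45_000
}

// Proof size: 3 field elements of 32 bytes per sumcheck round, plus
// ~7 KB of HyperKZG commitments and evaluation proof.
fn proof_size_bytes(sumchecks: u64, rounds_per_sumcheck: u64) -> u64 {
    sumchecks * rounds_per_sumcheck * 3 * 32 + 7_000
}
```

With 150 scalar multiplications and two pairings, the group-operation gas comes to 990,000 (the "~1 million" above); ten 40-round sumchecks plus HyperKZG material come to roughly 45 KB.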
Groth16 Recursion
Jolt's verifier with no proof composition is expensive: we estimate about 2 million gas to verify within the EVM. Further, the short-term continuations plan causes a blowup linear in the number of monolithic trace chunks proven.
To solve these two issues, we aim to add a configuration option to the Jolt prover with a post-processing step that creates a Groth16 proof of the Jolt verifier, for constant proof size / cost (~280k gas on the EVM) regardless of continuation chunk count or opening proof cost. This technique is industry standard. (An added benefit of this proof composition is that it will render Jolt proofs zero-knowledge.)
We call directly representing the Jolt verifier (with the HyperKZG polynomial commitment scheme) as constraints and then feeding those constraints into Groth16 "naive composition". Unfortunately, this naive procedure will result in over a hundred million constraints. Applying Groth16 to such a large constraint system will result in far more latency than we'd like (and may even be impossible over the BN254 scalar field, because that field only supports FFTs of length up to $2^{27}$). Below, we describe alternate ways forward.
Cost accounting for naive composition
In naive composition, the bulk of the constraints are devoted to implementing the ~150 scalar multiplications done by the Jolt verifier (more accurately, ~150 scalar multiplications will be the dominant verifier cost once we finish the optimizations now in progress).
Each scalar multiplication costs about 400 group operations, each of which costs about 10 field multiplications, so that's about $150 \cdot 400 \cdot 10 = 600k$ field multiplications. The real killer is that these field multiplications must be done non-natively in constraints, because BN254 does not have a pairing-friendly "sister curve" (i.e., a curve whose scalar field matches the BN254 base field). This means that each of the $600k$ field multiplications costs thousands of constraints.
On top of the above, the two pairings done by the HyperKZG verifier, implemented non-natively in constraints, should cost about 6 million constraints.
We do expect advances in representing non-native field arithmetic in constraints to bring the above numbers down somewhat, but not nearly enough to make naive composition sensible.
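The accounting above, as arithmetic (the per-multiplication constraint count is a stand-in parameter, since the text only says "thousands"):

```rust
// Field-multiplication count for naive composition, from the numbers above:
// ~150 scalar muls, ~400 group ops each, ~10 field muls per group op.
fn naive_composition_field_muls() -> u64 {
    let scalar_muls = 150;
    let group_ops_per_scalar_mul = 400;
    let field_muls_per_group_op = 10;
    scalar_muls * group_ops_per_scalar_mul * field_muls_per_group_op
}

// Each non-native field multiplication costs "thousands" of constraints;
// constraints_per_nonnative_mul is a hypothetical stand-in for that figure.
fn naive_composition_constraints(constraints_per_nonnative_mul: u64) -> u64 {
    naive_composition_field_muls() * constraints_per_nonnative_mul
}
```

Even at a charitable 1,000 constraints per non-native multiplication, the total already exceeds the $2^{27}$ FFT bound mentioned above.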
A first step to reducing gas costs
A simple first approach to reducing gas costs is to leave all scalar multiplications and HyperKZG evaluation-proof checking done by the Jolt verifier out of the composition, performing those directly onchain.
This would reduce gas costs to slightly over 1 million, and ensure that Groth16 is run on only a few million constraints (perhaps even fewer, given upcoming advances in representing non-native field arithmetic in constraints).
A potential way forward
Here is a first cut at a plan to bring onchain Jolt verification down to a couple hundred thousand gas.
First, represent the Jolt verifier (with HyperKZG-over-BN254 as the polynomial commitment scheme) as an R1CS instance over the Grumpkin scalar field, and apply Spartan (with Hyrax-over-Grumpkin as the polynomial commitment) to this R1CS. This R1CS should only have a few million constraints, because all scalar multiplications and pairings done by the Jolt verifier are native over Grumpkin's scalar field. (The constraint system is also highly uniform, which can benefit both the Spartan prover and verifier.)
The field operations done by the Jolt verifier in the various invocations of the sumcheck protocol need to be represented non-natively in these R1CS constraints, but since there are only about 2,000 such field operations, they should still only cost perhaps 5 million constraints in total. See the Testudo paper for a similar approach and associated calculations.
We expect upcoming advances in methods for handling non-native field arithmetic (and/or more careful optimizations of the Jolt verifier) to bring this down to under 2 million constraints.
But the Spartan proof is still too big to post onchain. So, second, represent the Spartan verifier as an R1CS instance over the BN254 scalar field, and apply Groth16 to this R1CS. This is the proof that gets posted and verified onchain.
The Spartan verifier does at most a couple hundred field operations (since there are only two sumcheck invocations in Spartan) and $2 \cdot \sqrt{n}$ scalar multiplications, where $n$ is the number of columns (i.e., witness variables) in the R1CS instance. Here $2\sqrt{n}$ will be on the order of 3,000. Each scalar multiplication (done natively in this R1CS) yields about $4,000$ constraints. So that's 12 million constraints being fed to Groth16.
The calculation above is quite delicate, because running Groth16 on 12 million constraints would be okay for many settings, but 50 million constraints would not be. We do think 12 million is about the right estimate for this approach, especially given that 2,000 field operations for the Jolt verifier is a conservative estimate.
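The estimate can be reproduced with the document's ballpark figures; the key point is that Hyrax opening verification scales with the square root of the witness size:

```rust
// Hyrax opening verification costs ~2*sqrt(n) scalar multiplications for a
// witness with n columns; each native scalar multiplication costs ~4,000
// constraints (both figures are the document's ballparks).
fn spartan_verifier_constraints(n_columns: u64, constraints_per_scalar_mul: u64) -> u64 {
    let scalar_muls = 2 * ((n_columns as f64).sqrt() as u64);
    scalar_muls * constraints_per_scalar_mul
}
```

With $n \approx 2.25$ million columns, $2\sqrt{n} = 3{,}000$ scalar multiplications, giving the 12 million constraints quoted above.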
Details of the Jolt-with-HyperKZG verifier
The easiest way to understand the workload of the verifier circuit is to jump through the codebase starting at Jolt::verify(...) in vm/mod.rs. Verification can be split into four logical modules: instruction lookups, read-write memory, bytecode, and R1CS.
Each of these modules performs some combination of the following:
 Sumcheck verification
 Polynomial opening proof verification
 Multilinear extension evaluations
After verifying a sumcheck, the verifier needs to compare the claimed evaluation $S(r)$ of the sumcheck operand at a random point $r$ to its own evaluation of the polynomial at $r$. Jolt does this with a combination of opening proofs over the constituent polynomials of $S$, and direct evaluations of the multilinear extensions of those polynomials when they have sufficient structure.
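To make the shape of this check concrete, here is a minimal sumcheck-verifier sketch over a mock field (mod a small prime); real code uses 256-bit field elements and derives the challenges $r_j$ via Fiat-Shamir rather than taking them as arguments:

```rust
// Mock field: arithmetic mod a small prime, for illustration only.
const P: u64 = 101;

// Evaluate a univariate polynomial (coefficients in ascending order) at x,
// via Horner's rule mod P.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Per-round check: the prover's univariate g_j must satisfy
/// g_j(0) + g_j(1) == previous claim; the claim then becomes g_j(r_j).
/// Returns the final claim, which must still be compared against S(r)
/// via opening proofs and/or direct MLE evaluations.
fn verify_sumcheck(mut claim: u64, rounds: &[Vec<u64>], r: &[u64]) -> Option<u64> {
    for (g, &rj) in rounds.iter().zip(r) {
        if (eval(g, 0) + eval(g, 1)) % P != claim {
            return None; // round check failed
        }
        claim = eval(g, rj);
    }
    Some(claim)
}
```

For example, for $S(x_1,x_2) = x_1 x_2$ with total sum 1 over $\{0,1\}^2$, the honest round polynomials are $g_1(x) = x$ and $g_2(x) = r_1 x$, and the final claim equals $S(r_1, r_2)$.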
Specifics
Polynomial opening proof verification
HyperKZG is currently the optimal commitment scheme for recursion because it requires only two pairing operations per opening proof. Unfortunately, non-native field arithmetic will always be expensive within a circuit.
There are two options:
 Sumcheck and MLE evaluations using native arithmetic, pairing operations using non-native arithmetic
 Sumcheck and MLE evaluations using non-native arithmetic, pairing operations using native arithmetic
We believe the latter is more efficient, albeit less ergonomic. Some of the details are worked out in this paper.
Polynomial opening proof batching
Jolt requires tens of opening proofs across all constituent polynomials in all sumchecks. If we did these independently the verifier would be prohibitively expensive. Instead we batch all opening proofs for polynomials which share an evaluation point $r$.
verify_instruction_lookups
The instruction lookups module performs two sumchecks, described in more detail here. The first involves some complexity: the verifier directly computes the evaluation of each instruction's MLE at the point $r$ produced by sumcheck. The verifier can do this thanks to the property from Lasso that each table is SOS (decomposable).
The LassoSubtable trait is implemented for all subtables, and LassoSubtable::evaluate_mle(r) computes the MLE of each subtable. The JoltInstruction trait combines a series of underlying subtables: the MLEs of these subtables are combined into an instruction MLE via JoltInstruction::combine_lookups(vals: &[F]). Finally, each of the instruction MLEs is combined into a VM-wide lookup MLE via InstructionLookupsProof::combine_lookups(...).
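A skeletal Rust sketch of this trait pattern (the trait names mirror the real ones, but the signatures and the example implementations are simplified stand-ins over a mock field):

```rust
// Mock field: values mod a small prime, for illustration only.
const P: u64 = 101;

trait LassoSubtable {
    /// Evaluate this subtable's multilinear extension at point r.
    fn evaluate_mle(&self, r: &[u64]) -> u64;
}

// Hypothetical 1-bit AND chunk table: the MLE of AND on a bit-pair is x*y.
struct AndSubtable;
impl LassoSubtable for AndSubtable {
    fn evaluate_mle(&self, r: &[u64]) -> u64 {
        r[0] * r[1] % P
    }
}

trait JoltInstruction {
    /// Combine subtable MLE evaluations into the instruction's MLE.
    fn combine_lookups(&self, vals: &[u64]) -> u64;
}

// Hypothetical combiner: concatenate chunk results with powers of 2
// (little-endian), as a bitwise instruction would.
struct AndInstruction;
impl JoltInstruction for AndInstruction {
    fn combine_lookups(&self, vals: &[u64]) -> u64 {
        vals.iter().rev().fold(0, |acc, &v| (acc * 2 + v) % P)
    }
}
```

The real traits are generic over the field and carry additional methods (subtable indices, degree bounds, etc.); the structural point is the two-level composition of subtable MLEs into instruction MLEs.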
The Groth16 verifier circuit would have to mimic this pattern: implementing the MLE evaluation logic for each of the subtables, the combination logic for each of the instructions, and the combination logic to aggregate all instructions. Note that subtables and instructions may be added or removed in the future.
verify_r1cs
R1CS verification uses a modified Spartan instance, which runs two sumchecks and a single opening proof.
There are two difficult MLEs to evaluate:
 $A,B,C$ – evaluations of the R1CS coefficient matrices
 $z$ – evaluation of the witness vector
The sections below are under-described in the wiki; we'll flesh them out shortly. Assume this step comes last.
For $A,B,C$ we must leverage the uniformity of the constraint system to evaluate efficiently.
The witness vector $z$ comprises all of the inputs to the R1CS circuit, concatenated together in trace_length-sized chunks. All of these are committed independently and checked via a batched opening proof.
Engineering Suggestions
The Jolt codebase is rapidly undergoing improvements to reduce prover and verifier costs as well as simplify abstractions. As a result, it's recommended that each section above be built in modules that are convenient to rewire. Each part should be incrementally testable and adjustable.
A concrete example of changes to expect: we currently require 5-10 opening proofs per Jolt execution. Even for HyperKZG this requires 10-20 pairings, which is prohibitively expensive. We are working on fixing this via better batching.
Suggested plan of attack:
 Circuit for HyperKZG verification
 Circuit for batched HyperKZG verification
 Circuit for sumcheck verification
 Circuit for instruction/subtable MLE evaluations
 Circuit for Spartan verification
Precompiles
Precompiles are highly optimized SNARK gadgets that can be invoked from the high-level programming language of the VM user. These gadgets can be much more efficient for the prover than compiling down to the underlying ISA, because they exploit the structure of the workload. In practice, zkVMs use these for heavy cryptographic operations such as hash functions, signatures, and other elliptic curve arithmetic.
By popular demand, Jolt will support these gadgets as well. The short-term plan is to minimize Jolt-core development resources rather than to achieve optimal prover speed.
Precompile support plan:
 RV32 library wrapping syscalls of supported libraries
 Tracer picks up syscalls, sets relevant flag bits and loads memory accordingly
 Individual (uniform) Spartan instance for each precompile, repeated over trace_length steps
 Jolt config includes which precompiles are supported (there is some nonzero prover / verifier cost to including an unused precompile)
 Survey existing hash / elliptic curve arithmetic R1CS arithmetizations. Prioritize efficiency and audits.
 Use circom-scotia to convert the $A,B,C$ matrices into static files in the Jolt codebase
 Write a converter to uniformly repeat the constraints for trace_length steps
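The converter step can be sketched as follows (hypothetical shapes: the sparse (row, column, coefficient) triples stand in for what a circom-scotia export would provide). Repeating one step's constraint template block-diagonally produces the uniform constraint system:

```rust
// Sparse matrix as (row, col, coefficient) triples.
type Sparse = Vec<(usize, usize, i64)>;

/// Repeat a single step's constraint template block-diagonally for
/// `steps` steps: step i's constraints occupy row block i and act on
/// witness-column block i.
fn repeat_uniform(template: &Sparse, rows: usize, cols: usize, steps: usize) -> Sparse {
    let mut out = Vec::with_capacity(template.len() * steps);
    for step in 0..steps {
        for &(r, c, v) in template {
            // shift each step's constraints into its own row/column block
            out.push((r + step * rows, c + step * cols, v));
        }
    }
    out
}
```

A real converter would also handle columns shared across steps (the constant 1, cross-step state wiring); the block-diagonal shift is the core of the uniform repetition.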
See the documentation on Jolt's use/handling of sparse constraint systems for a detailed overview of how the Jolt proof machinery will incorporate these precompiled constraint systems.
TODO(sragss): How do we deal with memory and loading more than 64 bits of inputs to precompiles?
Prover space control
There is a major impediment to Jolt's usefulness today: it consumes large amounts of memory, which limits the length of RISC-V executions that it can prove. Proving about 16 million RISC-V cycles consumes hundreds of GBs of space.
There are two known techniques for reducing prover space in zkVMs. One is called "continuations" and inherently involves proof recursion; this is the approach taken by almost all zkVM projects today. The other technique is non-recursive: it is well-known that sumcheck-based SNARKs in particular do not require recursion to keep the prover's space under control. Nonetheless, very few projects are pursuing this non-recursive approach (the only one we are aware of is Ligetron, which is not sumcheck-based). We call these non-recursive techniques "the streaming prover".
Jolt will pursue both approaches to prover space control. Below, we provide more details on the two approaches.
More detail on continuations
Continuations work by breaking a large computation into “chunks”, proving each chunk (almost) independently, and recursively aggregating the proofs (i.e., proving one knows the proofs for each chunk).
Continuations come in two flavors: "brute-force recursion" and folding. In brute-force recursion, the proofs for different chunks are aggregated by having the prover prove that it knows each proof. Roughly speaking, the verifier of each proof is represented as a circuit, and a SNARK is used to prove that the prover knows a satisfying assignment for each circuit.
In folding schemes, the "proof" for each chunk is actually only a "partial proof", in particular omitting any evaluation proofs for any committed polynomials. The partial proofs for each chunk are not explicitly checked by anyone. Rather, they are "aggregated" into a single partial proof, and that partial proof is then "completed" into a full SNARK proof. In folding schemes, the prover winds up recursively proving that it correctly aggregated the partial proofs. This has major performance benefits relative to "brute force recursion", because aggregating proofs is much simpler than checking them. Hence, proving aggregation was done correctly is much cheaper than proving fullfledged proof verification was done correctly.
Folding schemes only work with elliptic-curve-based commitment schemes (there have been some efforts to extend folding techniques to hashing-based commitment schemes, but they introduce large overheads). So most zkVMs do not pursue folding today, as they use hashing-based commitment schemes. Fortunately, Jolt today does use elliptic curves, and hence folding is available to it as a means of prover space control.
More detail on non-recursive space control
Brute-force recursion has the following downsides.
First, recursive proof aggregation imposes a lower bound of several GBs on prover space, because this is how much space is required to represent a SNARK verifier as a circuit and store that circuit. Second, the chunks are not completely independent, because one has to make sure that the state of the VM’s memory at the end of chunk $i$ equals the state of the VM’s memory at the start of chunk $i+1$. Ensuring this is an engineering headache and adds some performance overhead. Third, recursion actually changes the statement being verified. No longer is the statement “I correctly ran the computer program on a witness and it produced a claimed output $y$”. Rather it is: “I know several proofs, one per chunk, each attesting that the relevant chunk was run correctly, and with the final chunk outputting $y$.” Consequently, once proof recursion is used, it can actually be quite challenging for the public to tell that the right statement is being verified.
Accordingly, we expect the non-recursive approach to space control to lead to a prover with smaller space than the continuations approach, and to a much simpler system overall.
The non-recursive approach will also place the security of the SNARK on firmer footing: the security of recursive SNARKs is heuristic, while non-recursive SNARKs like Jolt itself have unconditional security in idealized models such as the random oracle model (perhaps with additional assumptions, such as the hardness of discrete logarithms, depending on the commitment scheme used in Jolt).
Crucial to non-recursive space control is that Lasso, the lookup argument used in Jolt, does not require any sorting.
Planned partial steps toward continuations
As discussed above, today Jolt is a monolithic SNARK: RISC-V traces cannot be broken up; they must be proven monolithically or not at all. As a result, Jolt has a fixed maximum trace length that can be proven, which is a function of the available RAM on the prover machine. Long term, we'd like to solve this by implementing a streaming version of Jolt's prover, such that prover RAM usage is tunable with minimal performance loss.
Short term, we're going to address this via a very primitive form of continuations we call monolithic chunking. The plan: take a trace of length $N$, split it into $M$ chunks of size $N/M$, and prove each chunk independently. $N/M$ is a function of the maximum RAM available to the prover.
For the direct onchain verifier, there will be a verification cost linear in $M$. We believe this short-term tradeoff is worthwhile for usability until we can implement the (more complicated) streaming algorithms.
Specifics
A generic config parameter will be added to the Jolt struct called ContinuationConfig. At the highest level, before calling Jolt::prove, the trace will be split into M chunks. Jolt::prove will be called on each and return RAM_final, which can be fed into RAM_init during the next iteration of Jolt::prove. The output zerocheck will only be run for the final chunk.
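The flow just described can be sketched with hypothetical types (the real Jolt::prove has a much richer signature; here chunk "execution" is a trivial stand-in so that the memory state visibly evolves):

```rust
/// RAM contents at a chunk boundary.
#[derive(Clone, PartialEq, Debug)]
struct MemoryState(Vec<u64>);

struct ChunkProof {
    ram_init: MemoryState,
    ram_final: MemoryState,
}

// Stand-in for Jolt::prove on one chunk: "execution" just accumulates
// each trace value into RAM slot 0.
fn prove_chunk(trace: &[u64], ram_init: MemoryState) -> ChunkProof {
    let mut ram = ram_init.clone();
    for &op in trace {
        ram.0[0] = ram.0[0].wrapping_add(op);
    }
    ChunkProof { ram_init, ram_final: ram }
}

/// Split a trace of length N into M chunks and prove each, threading
/// RAM_final of chunk i into RAM_init of chunk i+1.
fn prove_chunked(trace: &[u64], m: usize) -> Vec<ChunkProof> {
    let chunk_len = (trace.len() + m - 1) / m; // ceil(N / M)
    let mut proofs = Vec::new();
    let mut ram = MemoryState(vec![0]);
    for chunk in trace.chunks(chunk_len) {
        let proof = prove_chunk(chunk, ram.clone());
        ram = proof.ram_final.clone();
        proofs.push(proof);
    }
    proofs
}
```

The invariant the verifier must enforce across chunks is exactly that each proof's ram_init matches the previous proof's ram_final.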
References on non-recursive prover space control
* Verifiable Computation Using Multiple Provers. Andrew J. Blumberg, Justin Thaler, Victor Vu, and Michael Walfish. https://eprint.iacr.org/2014/846.
* Public-Coin Zero-Knowledge Arguments with (Almost) Minimal Time and Space Overheads. Alexander R. Block, Justin Holmgren, Alon Rosen, Ron D. Rothblum, and Pratik Soni. Theory of Cryptography Conference (TCC), pp. 168-197.
* Gemini: Elastic SNARKs for Diverse Environments. Jonathan Bootle, Alessandro Chiesa, Yuncong Hu, and Michele Orrù. EUROCRYPT 2022.
* On Black-Box Constructions of Time and Space Efficient Sublinear Arguments from Symmetric-Key Primitives. Laasya Bangalore, Rishabh Bhadauria, Carmit Hazay, and Muthuramakrishnan Venkitasubramaniam. TCC 2022.
* Ligetron: Lightweight Scalable End-to-End Zero-Knowledge Proofs. Post-Quantum ZK-SNARKs on a Browser. Ruihan Wang, Carmit Hazay, and Muthuramakrishnan Venkitasubramaniam. 2024 IEEE Symposium on Security and Privacy (SP).