Jolt is a zkVM framework built around the Lasso lookup argument.
Jolt powers succinct proofs of execution of programs written in any high-level language. Jolt's sumcheck-based SNARK makes extensive use of multivariate polynomials and commitment schemes. Jolt zkVMs have state-of-the-art prover performance and have substantial room for growth over the coming decades.
Jolt zkVMs have a simple programming model, requiring only 50-100 LOC to implement new VM instructions.
The Jolt codebase currently targets the RISC-V instruction set, which is supported by most high-level language compilers, but the code is intended to be extensible and usable for any ISA.
The only property of the ISA that Jolt requires is that each primitive instruction is "decomposable". This means that evaluating the instruction on one or two 32-bit or 64-bit inputs can be done via a procedure of the following form:

- decompose each input into, say, 8-bit chunks
- apply one or more specified functions to these chunks (one from each input)
- reconstruct the output of the original instruction from the outputs of the functions operating on the chunks
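To make this concrete, here is a minimal Rust sketch of the decompose/evaluate/reconstruct procedure for the bitwise-OR of two 32-bit inputs. The function names (`decompose`, `chunk_or`, `collate`, `decomposable_or`) are illustrative only and not part of Jolt's API; in Jolt, the per-chunk function corresponds to a table lookup rather than a direct computation.

```rust
// Hypothetical sketch, not Jolt's API: a "decomposable" evaluation of
// 32-bit bitwise OR via 8-bit chunks.

/// Split a 32-bit input into four 8-bit chunks, most significant first.
fn decompose(x: u32) -> [u8; 4] {
    [(x >> 24) as u8, (x >> 16) as u8, (x >> 8) as u8, x as u8]
}

/// The per-chunk function. In Jolt/Lasso this corresponds to a lookup
/// into a table of all 2^16 evaluations, indexed by the chunk pair.
fn chunk_or(a: u8, b: u8) -> u8 {
    a | b
}

/// Reconstruct ("collate") the output from the chunk outputs.
fn collate(chunks: [u8; 4]) -> u32 {
    chunks.iter().fold(0u32, |acc, &c| (acc << 8) | c as u32)
}

fn decomposable_or(x: u32, y: u32) -> u32 {
    let (xs, ys) = (decompose(x), decompose(y));
    let mut out = [0u8; 4];
    for i in 0..4 {
        out[i] = chunk_or(xs[i], ys[i]);
    }
    collate(out)
}
```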
Related reading
Background reading
Credits
Lasso was written by Srinath Setty, Justin Thaler and Riad Wahby. Jolt was written by Arasu Arun, Srinath Setty, and Justin Thaler.
Jolt was initially forked from Srinath Setty's work on microsoft/Spartan, specifically the arkworks-rs/Spartan fork, in order to use the excellent arkworks-rs prime field arithmetic library. Jolt's R1CS is also proven using a version of Spartan (forked from the microsoft/Spartan2 codebase) optimized for the case of uniform R1CS constraints.
Using Jolt
Using Jolt is simple thanks to our tooling and macros. To get started, head over to our quickstart guide. For more in-depth information, check out the guests and hosts section.
Quickstart
Installing
Start by installing the `jolt` command line tool.

```bash
cargo +nightly install --git https://github.com/a16z/jolt --force --bins jolt
```
Creating a Project
To create a project, use the `jolt new` command.

```bash
jolt new <PROJECT_NAME>
```
This will create a template Jolt project in the specified location. Make sure to `cd` into the project before running the next step.

```bash
cd <PROJECT_NAME>
```
Project Tour
The main folder contains the host, which is the code that can generate and verify proofs. Within this main folder, there is another package called `guest`, which contains the Rust functions that we can prove from the host. For more information about these concepts, refer to the guests and hosts section.
We'll start by taking a look at our guest. We can view the guest code in `guest/src/lib.rs`.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable]
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }
    b
}
```
As we can see, this implements a simple Fibonacci function called `fib`. All we need to do to make our function provable is add the `jolt::provable` macro above it.
Next, let's take a look at the host code in `src/main.rs`.
```rust
pub fn main() {
    let (prove_fib, verify_fib) = guest::build_fib();

    let (output, proof) = prove_fib(50);
    let is_valid = verify_fib(proof);

    println!("output: {}", output);
    println!("valid: {}", is_valid);
}
```
The host imports `guest::build_fib`, which is automatically generated by the `jolt::provable` macro and returns functions for proving and verifying our function. The prove function takes the same inputs as the original `fib` function, but modifies the outputs to additionally return a proof. The verify function can then be used to check this proof and returns a boolean indicating its validity.
Running
Let's now run the host with `cargo`.

```bash
cargo run --release
```
This will compile the guest, perform some required preprocessing, and execute the host code, which proves and verifies the 50th Fibonacci number. This preprocessing is run within the `build_fib` function and adds significant time to running the host, but only needs to be performed once. This means that we could call the prove function many times without rerunning `build_fib`. In the future we will support caching this preprocessing across runs of the host.
Guests and Hosts
Guests and hosts are the main concepts that we need to understand in order to use Jolt. Guests contain the functions that Jolt will prove, while hosts are where we write more general code and invoke the prover and verifier for our guest functions.
Guests
Guests contain functions for Jolt to prove. Currently, these functions must be written in `no_std` Rust. If you have a function that does not require the standard library, making it provable is as easy as ensuring it is inside the `guest` package and adding the `jolt::provable` macro above it.
Let's take a look at a simple guest program to better understand it.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable]
fn add(x: u32, y: u32) -> u32 {
    x + y
}
```
As we can see, the guest looks like a normal `no_std` Rust library. The only major change is the addition of the `jolt::provable` macro, which lets Jolt know of the function's existence. Other than `no_std`, the only requirement of these functions is that their inputs are serializable and their outputs are deserializable with `serde`. Fortunately, `serde` is prevalent throughout the Rust ecosystem, so most types will support it by default.
There is no requirement that just a single function lives within the guest, and we are free to add as many as we need. Additionally, we can import any `no_std`-compatible library just as we normally would in Rust.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

use sha2::{Digest, Sha256};
use sha3::Keccak256;

#[jolt::provable]
fn sha2(input: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}

#[jolt::provable]
fn sha3(input: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}
```
Hosts
Hosts are where we can invoke the Jolt prover to prove functions defined within the guest. Hosts do not have the `no_std` requirement, and are free to use the Rust standard library.
The host imports the guest package, and will have automatically generated functions to build each of the Jolt functions. For the sha2 and sha3 example guest we looked at in the guest section, these functions would be called `build_sha2` and `build_sha3` respectively. Each returns two results: a prover function and a verifier function. The prover function takes in the same input types as the original function and modifies the output to additionally include a proof. The verifier can then take this proof and verify it.
```rust
pub fn main() {
    let (prove_sha2, verify_sha2) = guest::build_sha2();
    let (prove_sha3, verify_sha3) = guest::build_sha3();

    let input = &[5u8; 32];

    let (output, proof) = prove_sha2(input);
    let is_valid = verify_sha2(proof);
    println!("sha2 output: {:?}", output);
    println!("sha2 valid: {}", is_valid);

    let (output, proof) = prove_sha3(input);
    let is_valid = verify_sha3(proof);
    println!("sha3 output: {:?}", output);
    println!("sha3 valid: {}", is_valid);
}
```
Allocators
Jolt provides an allocator which supports most containers, such as `Vec` and `Box`. This is useful for Jolt users who would like to write `no_std` code rather than using Jolt's standard library support. To use these containers, they must be explicitly imported from `alloc`. The `alloc` crate is automatically provided by Rust and does not need to be added to the `Cargo.toml` file.
Example
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable]
fn alloc(n: u32) -> u32 {
    let mut v = Vec::<u32>::new();
    for i in 0..100 {
        v.push(i);
    }
    v[n as usize]
}
```
Standard Library
Jolt supports the full Rust standard library. To enable support, simply add the `guest-std` feature to the Jolt import in the guest's `Cargo.toml` file and remove the `#![cfg_attr(feature = "guest", no_std)]` directive from the guest code.
Example
```toml
[package]
name = "guest"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "guest"
path = "./src/lib.rs"

[features]
guest = []

[dependencies]
jolt = { package = "jolt-sdk", git = "https://github.com/a16z/jolt", features = ["guest-std"] }
```
```rust
#![no_main]

#[jolt::provable]
fn int_to_string(n: i32) -> String {
    n.to_string()
}
```
Troubleshooting
Insufficient Memory or Stack Size
Jolt provides reasonable defaults for the total allocated memory and stack size. It is, however, possible that the defaults are not sufficient, leading to unpredictable errors within the tracer. To fix this, we can try increasing these sizes. We suggest starting with the stack size, as it is much more likely to run out.
Below is an example of manually specifying both the total memory and stack size.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable(stack_size = 10000, memory_size = 10000000)]
fn waste_memory(size: u32) {
    let mut v = Vec::new();
    for i in 0..size {
        v.push(i);
    }
}
```
Maximum Input or Output Size Exceeded
Jolt restricts the size of the inputs and outputs to 4096 bytes by default. Using inputs and outputs that exceed this size will lead to errors. These values can be configured via the macro.
```rust
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

#[jolt::provable(max_input_size = 10000, max_output_size = 10000)]
fn sum(input: &[u8]) -> u32 {
    let mut sum = 0u32;
    for value in input {
        sum += *value as u32;
    }
    sum
}
```
Guest Attempts to Compile Standard Library
Sometimes after installing the toolchain, the guest still tries to compile with the standard library, which will fail with a large number of errors stating that certain items, such as `Result`, are referenced but not available. This generally happens when one tries to run jolt before installing the toolchain. To address this, try rerunning `jolt install-toolchain`, restarting your terminal, and deleting both your Rust target directory and any files under `/tmp` that begin with `jolt`.
Getting Help
If none of the above help, please serialize your program and send it along with a detailed bug report.
Serializing a call to the "fib" function in the Jolt guest:
```rust
// let (prove_fib, verify_fib) = guest::build_fib();
let program_summary = guest::analyze_fib(10);
program_summary.write_to_file("fib_10.txt".into()).expect("should write");
```
Contributors
How it works
Jolt
This section gives an overview of the core components of Jolt.
Jolt's four components
A VM does two things:
- Repeatedly execute the fetch-decode-execute logic of its instruction set architecture.
- Perform reads and writes to Random Access Memory (RAM).

The Jolt paper depicts these two tasks mapped to three components in the final Jolt proof:

The Jolt codebase is similarly organized, but instead separates read-write memory (comprising registers and RAM) from program code (aka bytecode, which is read-only), for a total of four components:
Read-write memory
To handle reads/writes to RAM (and registers), Jolt uses a memory checking argument from Spice, which is closely related to Lasso itself. They are both based on "offline memory checking" techniques, the main difference being that Lasso supports read-only memory while Spice supports read-write memory, making it slightly more expensive. This is implemented in `read_write_memory.rs`.

For more details: Read-write memory
R1CS constraints
To handle the "fetch" part of the fetch-decode-execute loop, there is a minimal R1CS instance (about 60 constraints per cycle of the RISC-V VM). These constraints handle program counter (PC) updates and serve as the "glue" enforcing consistency between the polynomials used in the components below. Jolt uses Spartan, optimized for the highly structured nature of the constraint system (e.g., the R1CS constraint matrices are block-diagonal with blocks of size only about 60 x 80). This is implemented in `jolt-core/src/r1cs`.
For more details: R1CS constraints
Instruction lookups
To handle the "execute" part of the fetch-decode-execute loop, Jolt invokes the Lasso lookup argument. The lookup argument maps every instruction (including its operands) to its output. This is implemented in `instruction_lookups.rs`.
For more details: Instruction lookups
Bytecode
To handle the "decode" part of the fetch-decode-execute loop, Jolt uses another instance of (read-only) offline memory-checking, similar to Lasso. The bytecode of the guest program is "decoded" in preprocessing, and the prover subsequently invokes offline memory-checking on the sequence of reads from this decoded bytecode corresponding to the execution trace being proven. This is implemented in `bytecode.rs`.
For more details: Bytecode
Instruction lookups
In Jolt, the "execute" part of the "fetch-decode-execute" loop is handled using a lookup argument (Lasso). At a high level, the lookup queries are the instruction operands, and the lookup outputs are the instruction outputs.
Lasso
Lasso is a lookup argument (equivalent to a SNARK for reads into a read-only memory). Lookup arguments allow the prover to convince the verifier that for a committed vector of values $v$, committed vector of indices $a$, and lookup table $T$, $T[a_{i}]=v_{i}$ for all $i$.
Lasso is a special lookup argument with highly desirable asymptotic costs largely correlated to the number of lookups (the length of the vectors $a$ and $v$), rather than the length of the table $T$.
A conversational background on lookups can be found here. In short: lookups are great for zkVMs as they allow constant cost / developer complexity for the prover algorithm per VM instruction.
A detailed engineering overview of Lasso (as a standalone lookup argument) can be found here.
Using Lasso to handle instruction execution
Lasso requires that each primitive instruction satisfies a decomposability property. The property needed is that the input(s) to the instruction can be broken into "chunks" (say, with each chunk consisting of 16 bits), such that one can obtain the answer to the original instruction by evaluating a simple function on each chunk and then "collating" the results together. For example, the bitwise-OR of two 32-bit inputs x and y can be computed by breaking each input up into 8-bit chunks, ORing each 8-bit chunk of x with the associated chunk of y, and concatenating the results together.
In Lasso, we call the task of evaluating a simple function on each chunk a "subtable lookup" (the relevant lookup table being the table storing all $2^{16}$ evaluations of the simple function). And the "collating" of the results of the subtable lookups into the result of the original lookup (instruction execution on the unchunked inputs) is handled via an invocation of the sumcheck protocol. We call this the "primary sumcheck" instance in Lasso.
The "primary sumcheck" collation looks as follows for a trace of length $m$ and a VM with $F$ unique instructions.
$\sum_{x \in \{0,1\}^{\log m}} \left[ \widetilde{eq}(r,x) \cdot \sum_{f \in \{0,1\}^{\log F}} \widetilde{flags}_f(x) \cdot g_f(\mathrm{terms}_f(x)) \right]$
$\widetilde{flags}_f(x)$ is an indicator function for the $f$-th instruction: $\widetilde{flags}_f(x) = 1$ if instruction $f$ is used during the $x$-th step of the trace, for $x \in \{0,1\}^{\log_2 m}$.

$g_f(\ldots)$ is the collation function used by the $f$-th instruction.

$\mathrm{terms}_f(x) = [E_1(x), \ldots, E_{\alpha}(x)]$ where $\alpha$ is the number of independent memories used by an instruction. For simple instructions like the EQ instruction, $\alpha = C$, where $C$ is the number of chunks that the inputs are broken into, so $\mathrm{terms}_f(x) = [E_1(x), \ldots, E_C(x)]$. More complicated instructions such as LT might have $\mathrm{terms}_f(x) = [E_{EQ}(x), E_{LT_1}(x), E_{LT_2}(x)]$. The exact layout depends on the number of subtables required by the decomposition of the instruction. The mappings can be found in the `JoltInstruction::subtables` method implementations.
Mental Model
For a given $r \in \{0,1\}^{\log m}$ (think of $r$ as the integer index of a single step within the trace), $\widetilde{eq}(r,x) = 0$ for all but one term $x$ of the outer sum. Similarly, $\widetilde{flags}_f(x) = 0$ for all but one term of the inner sum. This leaves just the collation function of a single instruction, evaluating to the collated lookup output of that instruction. In reality, $r$ is a random point $r \in F^{\log m}$ selected by the verifier over the course of the protocol, and the resulting evaluation is a distance-amplified encoding of the entire trace of instructions.
To illustrate more concretely, imagine a two-instruction VM for the LT and EQ instructions with $C=1$:

$\sum_{x \in \{0,1\}^{\log m}} \widetilde{eq}(r,x) \cdot \left[ \widetilde{flags}_{LT}(x) \cdot g_{LT}(E_{LT}(x)) + \widetilde{flags}_{EQ}(x) \cdot g_{EQ}(E_{EQ}(x)) \right]$
Subtable Flags
TODO
- We then use memory checking to determine that each of the memories $E_i$ is well-formed.
- At a given step of the CPU, only a single instruction will be used, which means that only that instruction's subtables will be used. For the rest of the memories, we insert a no-op with $(a, v) = 0$. In order to make the GKR trees cheaper to compute and sumcheck, we add a single additional layer to the GKR tree. During this layer, we "toggle" each of the GKR leaves to 1 in the case that it is an unused step of the CPU. This makes the binary tree of multiplication gates cheaper. We toggle based on a new flags polynomial called $subtable\_flags_f$, which is the sum of all of the $instruction\_flags_f$ used in the instruction collation above.
- The function to compute each of the leaves becomes $leaf[i] = subtable\_flags[i] \cdot fingerprint[i] + (1 - subtable\_flags[i])$.
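The toggled leaf formula can be sketched as follows, using `u64` arithmetic modulo a toy prime in place of real field elements (the constant and function names are illustrative, not Jolt's):

```rust
// Hypothetical sketch: `u64` arithmetic mod a toy prime stands in for
// real field elements.
const P: u64 = 97;

/// leaf[i] = subtable_flags[i] * fingerprint[i] + (1 - subtable_flags[i]) mod P:
/// an active step contributes its fingerprint, while an unused step
/// contributes 1 (the multiplicative identity), so it is free in the
/// binary tree of multiplication gates.
fn leaf(subtable_flag: u64, fingerprint: u64) -> u64 {
    (subtable_flag * fingerprint + (1 + P - subtable_flag)) % P
}
```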
Read-write memory (registers and RAM)
Jolt proves the validity of registers and RAM using offline memory checking. In contrast to our usage of offline memory checking in other modules, registers and RAM are writable memory.
Memory layout
For the purpose of offline memory checking, Jolt treats registers, program inputs/outputs, and RAM as occupying one unified address space. This remapped address space is laid out as follows:
The zero-padding depicted above is sized so that RAM starts at a power-of-two offset (this is explained below). As noted in the diagram, the size of the witness scales with the highest memory address accessed over the course of the program's execution. In addition to the zero-padding between the "Program I/O" and "RAM" sections, the end of the witness is zero-padded to a power of two.
Handling program I/O
Inputs
Program inputs and outputs (and the panic bit, which indicates whether the program panicked) live in the same memory address space as RAM. Program inputs populate the designated input space upon initialization:
The verifier can efficiently compute the MLE of this initial memory state on its own (i.e. in time proportional to the IO size, not the total memory size).
Outputs and panic
On the other hand, the verifier cannot compute the MLE of the final memory state on its own –– though the program I/O is known to the verifier, the final memory state contains values written to registers/RAM over the course of program execution, which are not known to the verifier.
The verifier is, however, able to compute the MLE of the program I/O values (padded on both sides with zeros) –– this is denoted `v_io` below.
If the prover is honest, then the final memory state (`v_final` below) should agree with `v_io` at the indices corresponding to program I/O.

To enforce this, we invoke the sumcheck protocol to perform a "zero-check" on the difference between `v_final` and `v_io`:
This also motivates the zero-padding between the "Program I/O" and "RAM" sections of the witness. The zero-padding ensures that both `input_start` and `ram_witness_offset` are powers of two, which makes it easier for the verifier to compute the MLEs of `v_init` and `v_io`.
Timestamp range check
Registers and RAM are writable memory, which introduces additional requirements compared to offline memory checking in a read-only context.

The multiset equality check for read-only memory, typically expressed as $I \cdot W = R \cdot F$, is not adequate for ensuring the accuracy of read values. It is essential to also verify that each read operation retrieves a value that was written in a previous step (not a future step). (Here, $I$ denotes the tuples capturing initialization of memory and $W$ the tuples capturing all of the writes to memory following initialization. $R$ denotes the tuples capturing all read operations, and $F$ denotes tuples capturing a final read pass over all memory cells.)
To formalize this, we assert that the timestamp of each read operation, denoted as $read_timestamp$, must not exceed the global timestamp at that particular step. The global timestamp starts at 0 and is incremented once per step.
The verification of $read_timestamp≤global_timestamp$ is equivalent to confirming that $read_timestamp$ falls within the range $[0,TRACE_LENGTH)$ and that the difference $(global_timestamp−read_timestamp)$ is also within the same range.
The process of ensuring that both $read\_timestamp$ and $(global\_timestamp - read\_timestamp)$ lie within the specified range is known as range-checking. This is the procedure implemented in `timestamp_range_check.rs`, using a modified version of Lasso.
Intuitively, checking that each read timestamp does not exceed the global timestamp prevents an attacker from answering all read operations to a given cell with "the right set of values, but out of order". Such an attack requires the attacker to "jump forward and backward in time". That is, for this attack to succeed, at some timestamp t when the cell is read, the attacker would have to return a value that will be written to that cell in the future (and at some later timestamp t' when the same cell is read the attacker would have to return a value that was written to that cell much earlier). This attack is prevented by confirming that all values returned have a timestamp that does not exceed the current global timestamp.
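The invariant being enforced can be sketched as a plain recomputation (for illustration only; Jolt proves it with the Lasso-style range check described above rather than checking it directly, and the function name here is hypothetical):

```rust
/// Illustrative recomputation of the range-check invariant: at step t
/// (the global timestamp), the read timestamp must satisfy both
/// read_ts < TRACE_LENGTH and (t - read_ts) < TRACE_LENGTH.
fn timestamps_valid(read_timestamps: &[u64]) -> bool {
    let trace_length = read_timestamps.len() as u64;
    read_timestamps.iter().enumerate().all(|(step, &read_ts)| {
        let global_ts = step as u64;
        // Together these two range checks imply read_ts <= global_ts:
        // if read_ts > global_ts, the subtraction wraps to a huge value.
        read_ts < trace_length && global_ts.wrapping_sub(read_ts) < trace_length
    })
}
```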
Bytecode
Jolt proves the validity of reads from the program bytecode using offline memory checking.
Decoding an ELF file
The tracer iterates over the `.text` sections of the program's ELF file and decodes the RISC-V instructions. Each instruction gets mapped to a `BytecodeRow` struct:
```rust
pub struct BytecodeRow {
    /// Memory address as read from the ELF.
    address: usize,
    /// Packed instruction/circuit flags, used for R1CS.
    bitflags: u64,
    /// Index of the destination register for this instruction (0 if register is unused).
    rd: u64,
    /// Index of the first source register for this instruction (0 if register is unused).
    rs1: u64,
    /// Index of the second source register for this instruction (0 if register is unused).
    rs2: u64,
    /// "Immediate" value for this instruction (0 if unused).
    imm: u64,
}
```
The registers (`rd`, `rs1`, `rs2`) and `imm` are described in the RISC-V spec.

The `bitflags` field is described in greater detail below.
The preprocessed bytecode serves as the (readonly) "memory" over which we perform offline memory checking.
We currently assume the program bytecode is sufficiently short and known by the verifier, so the prover doesn't need to provide a commitment to it –– the verifier can compute the MLE of the bytecode on its own. This guarantees that the bytecode trace is consistent with the agreedupon program.
Instead of $(a,v,t)$ tuples, each step in the bytecode trace is mapped to a tuple $(a,v_{bitflags},v_{rd},v_{rs1},v_{rs2},v_{imm},t)$. Otherwise, the offline memory checking proceeds as described in Offline Memory Checking.
Bitflags
The `bitflags` of a given instruction is the concatenation of its circuit flags and instruction flags.
This concatenation is enforced by R1CS constraints.
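A hypothetical sketch of such a packing is below; the actual bit layout is fixed by Jolt's R1CS constraints and may differ, and the function name is illustrative only.

```rust
/// Hypothetical packing: concatenate 9 circuit flags and 19 instruction
/// flags into one u64, first flag landing in the highest of the 28 used
/// bits. The real layout is fixed by Jolt's R1CS constraints.
fn pack_bitflags(circuit_flags: [bool; 9], instruction_flags: [bool; 19]) -> u64 {
    let mut bits = 0u64;
    for &flag in circuit_flags.iter().chain(instruction_flags.iter()) {
        bits = (bits << 1) | flag as u64;
    }
    bits
}
```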
R1CS constraints
Jolt uses R1CS constraints to enforce certain rules of the RISC-V fetch-decode-execute loop and to ensure consistency between the proofs for the different modules of Jolt (instruction lookups, read-write memory, and bytecode).
Uniformity
Jolt's R1CS is uniform, which means that the constraint matrices for an entire program are just repeated copies of the constraint matrices for a single CPU step. Each step is conceptually simple and involves around 60 constraints and 80 variables.
Input Variables and constraints
The inputs required for the constraint system for a single CPU step are:
Pertaining to bytecode
- Program counters (PCs): this is the only state passed between CPU steps.
- Bytecode read address: the address in the program code read at this step.
- The preprocessed ("5-tuple") representation of the instruction: (`bitflags`, `rs1`, `rs2`, `rd`, `imm`).
Pertaining to readwrite memory
- The (starting) RAM address read by the instruction: if the instruction is not a load/store, this is 0.
- The bytes written to or read from memory.
Pertaining to instruction lookups
- The chunks of the instruction's operands `x` and `y`.
- The chunks of the lookup query. These are typically some combination of the operand chunks (e.g. the i-th chunk of the lookup query is often the concatenation of `x_i` and `y_i`).
- The lookup output.
Circuit and instruction flags:
- There are nine circuit flags, which are used to guide the constraints and depend only on the opcode of the instruction. These are thus stored as part of the preprocessed bytecode in Jolt.
  - `operand_x_flag`: 0 if the first operand is the value in `rs1` or the `PC`.
  - `operand_y_flag`: 0 if the second operand is the value in `rs2` or the `imm`.
  - `is_load_instr`
  - `is_store_instr`
  - `is_jump_instr`
  - `is_branch_instr`
  - `if_update_rd_with_lookup_output`: 1 if the lookup output is to be stored in `rd` at the end of the step.
  - `sign_imm_flag`: used in load/store and branch instructions where the instruction is added as constraints.
  - `is_concat`: indicates whether the instruction performs a concat-type lookup.
- Instruction flags: these are the unary bits used to indicate which instruction is executed at a given step. There are as many per step as the number of unique instruction lookup tables in Jolt, which is 19.
Constraint system
The constraints for a CPU step are detailed in the `get_jolt_matrices()` function in `constraints.rs`.
Reusing commitments
As with most SNARK backends, Spartan requires computing a commitment to the inputs to the constraint system. A catch (and an optimization feature) in Jolt is that most of the inputs are also used as inputs to proofs in the other modules. For example, the address and values pertaining to the bytecode are used in the bytecode memory-checking proof, and the lookup chunks, output, and flags are used in the instruction lookup proof.

For Jolt to be sound, it must be ensured that the same inputs are fed to all relevant proofs. We do this by reusing the commitments themselves. This can be seen in the `format_commitments()` function in the `r1cs/snark` module. Spartan is adapted to take pre-committed witness variables.
Exploiting uniformity
The uniformity of the constraint system allows us to heavily optimize both the prover and verifier. The main changes involved in making this happen are:
- Spartan is modified to only take in the constraint matrices for a single step, along with the total number of steps. Using this, the prover and verifier can efficiently calculate the multilinear extensions of the full R1CS matrices.
- The commitment format of the witness values is changed to reflect the uniformity. All versions of a variable corresponding to each time step are committed together. This affects nearly all variables committed to in Jolt.
- The inputs and witnesses are provided to the constraint system as segments.
- Additional constraints are used to enforce consistency of the state transferred between CPU steps.
These changes and their impact on the code are visible in `spartan.rs`.
Background
If you're interested in learning the nitty-gritty details of SNARKs, including questions like "why does this work?" and "is it sound?", you should read Proofs and Args ZK by Thaler. If you're simply interested in "how", this section is for you.
The Sumcheck Protocol
Adapted from the Thaler13 exposition. For a detailed introduction to sumcheck see Chapter 4.1 of the textbook.
Suppose we are given a $v$-variate polynomial $g$ defined over a finite field $F$. The purpose of the sumcheck protocol is to compute the sum:

$H := \sum_{b_1 \in \{0,1\}} \sum_{b_2 \in \{0,1\}} \cdots \sum_{b_v \in \{0,1\}} g(b_1, \ldots, b_v).$
In order to execute the protocol, the verifier needs to be able to evaluate $g(r_1, \ldots, r_v)$ for a randomly chosen vector $(r_1, \ldots, r_v) \in F^v$. Hence, from the verifier's perspective, the sumcheck protocol reduces the task of summing $g$'s evaluations over $2^v$ inputs (namely, all inputs in $\{0,1\}^v$) to the task of evaluating $g$ at a single input $(r_1, \ldots, r_v) \in F^v$.
The protocol proceeds in $v$ rounds as follows. In the first round, the prover sends a polynomial $g_{1}(X_{1})$, and claims that
$g_1(X_1) = \sum_{x_2, \ldots, x_v \in \{0,1\}^{v-1}} g(X_1, x_2, \ldots, x_v).$
Observe that if $g_1$ is as claimed, then $H = g_1(0) + g_1(1)$. Also observe that the polynomial $g_1(X_1)$ has degree $\deg_1(g)$, the degree of variable $x_1$ in $g$. Hence $g_1$ can be specified with $\deg_1(g) + 1$ field elements. In our implementation, $P$ will specify $g_1$ by sending the evaluation of $g_1$ at each point in the set $\{0, 1, \ldots, \deg_1(g)\}$. (Actually, the prover does not need to send $g_1(1)$, since the verifier can infer that $g_1(1) = H - g_1(0)$; if this were not the case, the verifier would reject.)
Then, in round $j>1$, $V$ chooses a value $r_{j−1}$ uniformly at random from $F$ and sends $r_{j−1}$ to $P$. We will often refer to this step by saying that variable $j−1$ gets bound to value $r_{j−1}$. In return, the prover sends a polynomial $g_{j}(X_{j})$, and claims that
$g_j(X_j) = \sum_{(x_{j+1}, \ldots, x_v) \in \{0,1\}^{v-j}} g(r_1, \ldots, r_{j-1}, X_j, x_{j+1}, \ldots, x_v). \quad (1)$
The verifier compares the two most recent polynomials by checking that $g_{j−1}(r_{j−1})=g_{j}(0)+g_{j}(1)$, and rejecting otherwise. The verifier also rejects if the degree of $g_{j}$ is too high: each $g_{j}$ should have degree $deg_{j}(g)$, the degree of variable $x_{j}$ in $g$.
In the final round, the prover has sent $g_{v}(X_{v})$ which is claimed to be $g(r_{1},…,r_{v−1},X_{v})$. $V$ now checks that $g_{v}(r_{v})=g(r_{1},…,r_{v})$ (recall that we assumed $V$ can evaluate $g$ at this point). If this test succeeds, and so do all previous tests, then the verifier accepts, and is convinced that $H=g_{1}(0)+g_{1}(1)$.
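The verifier's per-round consistency check $g_{j-1}(r_{j-1}) = g_j(0) + g_j(1)$ can be sketched as follows, with `u64` arithmetic modulo a toy prime standing in for a real field and each univariate $g_j$ represented by its coefficient vector (all names here are illustrative, not from the Jolt codebase):

```rust
// Illustrative only: `u64` arithmetic mod a toy prime stands in for a real field.
const P: u64 = 1_000_000_007;

/// Evaluate a univariate polynomial, given by its coefficients (constant
/// term first), at the point x via Horner's rule mod P.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0u64, |acc, &c| (acc * x + c) % P)
}

/// The verifier's check between consecutive sumcheck rounds:
/// g_{j-1}(r_{j-1}) == g_j(0) + g_j(1).
fn round_consistent(g_prev: &[u64], r_prev: u64, g_j: &[u64]) -> bool {
    eval(g_prev, r_prev) == (eval(g_j, 0) + eval(g_j, 1)) % P
}
```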
Multilinear Extensions
For any $v$-variate polynomial $g(x_1, \ldots, x_v)$, its multilinear extension $\tilde{f}(x_1, \ldots, x_v)$ is the unique multilinear polynomial which agrees with $g$ over all $2^v$ points $x \in \{0,1\}^v$: $g(x_1, \ldots, x_v) = \tilde{f}(x_1, \ldots, x_v) \; \forall x \in \{0,1\}^v$. By the Schwartz-Zippel lemma, if $g$ and $\tilde{f}$ disagree at even a single input, then $g$ and $\tilde{f}$ must disagree almost everywhere.
For more precise details please read Section 3.5 of Proofs and Args ZK.
Engineering
In practice, MLEs are stored as the vector of evaluations over the $v$-variate boolean hypercube $\{0,1\}^v$. There are two important algorithms over multilinear extensions: single variable binding, and evaluation.
Single Variable Binding
With a single streaming pass over all $n = 2^v$ evaluations, we can "bind" a single variable of the $v$-variate multilinear extension to a point $r$. This is a critical sub-algorithm in sumcheck. During the binding, the number of evaluation points used to represent the MLE gets reduced by a factor of 2:
- Before: $\tilde{f}(x_1, \ldots, x_v): \{0,1\}^v \to F$
- After: $\tilde{f}'(x_1, \ldots, x_{v-1}) = \tilde{f}(r, x_1, \ldots, x_{v-1}): \{0,1\}^{v-1} \to F$
Assuming your MLE is represented as a 1D vector $E$ of $2^v$ evaluations over the $v$-variate boolean hypercube $\{0,1\}^v$, indexed little-endian:

- $E[1] = \tilde{f}(0,0,0,1)$
- $E[5] = \tilde{f}(0,1,0,1)$
- $E[8] = \tilde{f}(1,0,0,0)$
Then we can transform the vector $E \to E'$, representing the transformation $\tilde{f}(x_1, \ldots, x_v) \to \tilde{f}'(x_1, \ldots, x_{v-1}) = \tilde{f}(r, x_1, \ldots, x_{v-1})$, by "binding" the evaluations vector to the point $r$:
```rust
let n = 1 << v; // 2^v evaluations
let half = n / 2;
for i in 0..half {
    let low = E[i];
    let high = E[half + i];
    E[i] = low + r * (high - low);
}
```
Multi-Variable Binding / Evaluation
Another common operation is to take the MLE $\tilde{f}(x_1, \dots, x_v)$ and compute its evaluation at a single $v$-variate point outside the boolean hypercube, $x \in \mathbb{F}^v$. This can be performed in $O(n)$ time, where $n = 2^v$, by performing the single-variable binding algorithm $\log_2(n) = v$ times. The time spent binding the $i$-th variable is $O(n / 2^i)$, so the total time across all $\log_2(n)$ bindings is proportional to $\sum_{i=1}^{\log_2(n)} n / 2^i = O(n)$.
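The evaluation-by-repeated-binding procedure can be sketched as follows. This is an illustrative sketch only (not Jolt's actual implementation): a small Mersenne prime stands in for the proof system's field, and the function name is hypothetical.

```rust
const P: u64 = (1 << 31) - 1; // small prime field, for illustration only

// Evaluate an MLE, given its 2^v evaluations over the boolean hypercube
// (indexed big-endian as above), at an arbitrary point r in F^v by binding
// one variable at a time. Each pass halves the evaluation vector.
fn evaluate_mle(mut evals: Vec<u64>, r: &[u64]) -> u64 {
    for &ri in r {
        let half = evals.len() / 2;
        for i in 0..half {
            let (low, high) = (evals[i], evals[half + i]);
            // low + ri * (high - low), computed mod P
            evals[i] = (low + ri * ((high + P - low) % P)) % P;
        }
        evals.truncate(half);
    }
    evals[0] // f~(r_1, ..., r_v)
}

fn main() {
    // f(x1, x2) = 1 + x2 + 2*x1 over {0,1}^2, so evals = [1, 2, 3, 4];
    // its MLE at (2, 3) is 1 + 3 + 2*2 = 8.
    assert_eq!(evaluate_mle(vec![1, 2, 3, 4], &[2, 3]), 8);
}
```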
Eq Extension
The $\widetilde{eq}$ multilinear extension (MLE) is useful throughout the protocol. It is the MLE of the function $eq: \{0,1\}^m \times \{0,1\}^m \to \mathbb{F}$ defined as follows:
$$eq(r, x) = \begin{cases} 1 & \text{if } r = x \\ 0 & \text{otherwise} \end{cases}$$
$\widetilde{eq}$ has the following explicit formula:
$$\widetilde{eq}(r, x) = \prod_{i=1}^{m} \big( r_i \cdot x_i + (1 - r_i) \cdot (1 - x_i) \big),$$
which agrees with $eq$ on $\{0,1\}^m \times \{0,1\}^m$ and extends it to all of $\mathbb{F}^m \times \mathbb{F}^m$.
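The explicit product formula translates directly into code. A minimal sketch (illustrative only: a small prime field stands in for Jolt's actual field, and the function name is hypothetical):

```rust
const P: u64 = (1 << 31) - 1; // small prime field, for illustration only

// Evaluate eq~(r, x) = prod_i (r_i * x_i + (1 - r_i) * (1 - x_i)) mod P.
fn eq_eval(r: &[u64], x: &[u64]) -> u64 {
    assert_eq!(r.len(), x.len());
    r.iter().zip(x).fold(1u64, |acc, (&ri, &xi)| {
        // (1 - ri) and (1 - xi) computed mod P to stay in the field
        let term = (ri * xi + ((P + 1 - ri) % P) * ((P + 1 - xi) % P)) % P;
        acc * term % P
    })
}

fn main() {
    // On boolean inputs, eq~ is exactly the indicator of r == x.
    assert_eq!(eq_eval(&[1, 0, 1], &[1, 0, 1]), 1);
    assert_eq!(eq_eval(&[1, 0, 1], &[1, 1, 1]), 0);
}
```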
Batched Openings
At the end of the sumcheck protocol, the verifier must evaluate many multilinear polynomials at a single point. For polynomials the verifier cannot evaluate on its own, it depends on the verification of a PCS opening proof provided by the prover. To save on verifier costs, all polynomials opened at the same point can be combined into a single opening proof.
The best reading on the subject can be found in Section 16.1 of the Textbook.
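To make the idea concrete, here is a hedged sketch of one common way to batch openings, a random linear combination of the polynomials (illustrative only: the field, function name, and folding strategy are assumptions, not Jolt's actual code):

```rust
const P: u64 = (1 << 31) - 1; // small prime field, for illustration only

// Fold several polynomials (as evaluation vectors over the hypercube) into
// one with verifier-sampled scalars rho_i. A single opening proof for the
// folded polynomial then covers all individual claims, since the folded
// evaluation must equal sum_i rho_i * y_i for the claimed values y_i.
fn fold_polys(polys: &[Vec<u64>], rhos: &[u64]) -> Vec<u64> {
    let mut acc = vec![0u64; polys[0].len()];
    for (poly, &rho) in polys.iter().zip(rhos) {
        for (a, &c) in acc.iter_mut().zip(poly) {
            *a = (*a + rho * c % P) % P;
        }
    }
    acc
}

fn main() {
    // Folding [1,2] and [3,4] with scalars 1 and 10 gives [31, 42].
    assert_eq!(fold_polys(&[vec![1, 2], vec![3, 4]], &[1, 10]), vec![31, 42]);
}
```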
Offline Memory Checking
Offline memory checking is a method that enables a prover to demonstrate to a verifier that a read/write memory was used correctly. In such a memory system, values $v$ can be written to addresses $a$ and subsequently retrieved. This technique allows the verifier to efficiently confirm that the prover adhered to the memory's rules (i.e., that the value returned by any read operation is indeed the most recent value that was written to that memory cell).
Jolt utilizes offline memory checking in the Bytecode prover, Lookup prover (for VM instruction execution), and RAM prover. The RAM prover must support a read-write memory. The Bytecode prover and Lookup prover need only support read-only memories. In the case of the Bytecode prover, the memory is initialized to contain the bytecode of the RISC-V program, and the memory is never modified (it's read-only). And the lookup tables used for VM instruction execution are determined entirely by the RISC-V instruction set.
(The term "offline memory checking" refers to techniques that check the correctness of all read operations "all at once", after the reads have all occurred, or in SNARK settings, after the purported values returned by the reads have been committed. Offline checking techniques do not determine, as a read happens, whether or not it was correct. They only ascertain, when all the reads are checked at once, whether or not all of the reads were correct.
This is in contrast to "online memory checking" techniques like Merkle hashing that immediately confirm that a memory read was done correctly by insisting that each read includes an authentication path. Merkle hashing is much more expensive on a per-read basis for SNARK provers, and offline memory checking suffices for SNARK design. This is why Lasso and Jolt use offline memory checking techniques rather than online ones.)
Initialization Algorithm
TODO:
- Initialize four timestamp counters
- Implicitly assume a read operation before each write
Multiset Check
Define $read$ and $write$ as multisets of tuples, and likewise $init$ and $final$:
$$read, write \subseteq \{(a_i, v_i, t_i) \mid i \in [0, m]\}$$
$$init, final \subseteq \{(a_j, v_j, t_j) \mid j \in [0, M]\}$$
Here, $a_i$, $v_i$, and $t_i$ represent the address, value, and timestamp respectively, with $m$ being the total number of memory operations and $M$ the size of the RAM.
The verifier checks that the combination of $read$ and $final$ matches the combination of $write$ and $init$, disregarding the order of elements (a permutation check):
$$read \cup final = write \cup init$$
Jolt conducts this check using a homomorphic hash function applied to each multiset:
$$H(read) = \prod_{(a_i, v_i, t_i) \in read} h(a_i, v_i, t_i)$$
$$H(write) = \prod_{(a_i, v_i, t_i) \in write} h(a_i, v_i, t_i)$$
$$H(init) = \prod_{(a_j, v_j, t_j) \in init} h(a_j, v_j, t_j)$$
$$H(final) = \prod_{(a_j, v_j, t_j) \in final} h(a_j, v_j, t_j)$$
The hash function $h$ is defined as:
$$h_{\gamma, \tau}(a, v, t) = a \cdot \gamma^2 + v \cdot \gamma + t - \tau$$
This multiset hashing process is represented by a binary tree of multiplication gates and is computed using an optimized GKR protocol.
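A minimal sketch of the fingerprinting step (illustrative only: a small prime field stands in for the proof system's field, and the function names are hypothetical):

```rust
const P: u64 = (1 << 31) - 1; // small prime field, for illustration only

// h_{gamma,tau}(a, v, t) = a*gamma^2 + v*gamma + t - tau  (mod P)
fn fingerprint(a: u64, v: u64, t: u64, gamma: u64, tau: u64) -> u64 {
    (a * (gamma * gamma % P) % P + v * gamma % P + t + (P - tau)) % P
}

// H(S) = product of fingerprints over the multiset S. The product is
// invariant under permutation, so equal multisets hash to equal values
// for every choice of gamma and tau.
fn multiset_hash(s: &[(u64, u64, u64)], gamma: u64, tau: u64) -> u64 {
    s.iter()
        .fold(1, |acc, &(a, v, t)| acc * fingerprint(a, v, t, gamma, tau) % P)
}

fn main() {
    let h1 = multiset_hash(&[(1, 2, 3), (4, 5, 6), (7, 8, 9)], 11, 13);
    let h2 = multiset_hash(&[(7, 8, 9), (1, 2, 3), (4, 5, 6)], 11, 13);
    assert_eq!(h1, h2); // order does not matter
}
```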
References
- Original BEGKN paper on offline memory checking, forming the technical underpinnings of Lasso
- Spice Protocol for read-write memories (see Fig. 2)
- Spartan Protocol
- Lasso Protocol for read-only memories, a.k.a. lookups
- Thaler13 Grand Product Protocol (see Prop. 2)
- Quarks Grand Product Protocol (see Section 6)
GKR
GKR is a SNARK protocol for small-depth circuits of multiplication and addition gates. The standard form allows combinations of both. The protocol involves three MLEs: $V_i$ is the MLE of a function capturing the values of the gates at layer $i$, and $add_i$ and $mult_i$ are MLEs of functions capturing the circuit's wiring.
Specifically, $V_{i}(j)$ evaluates to the value of the circuit at the $i$th layer in the $j$th gate. For example $V_{1}(0)$ corresponds to the output gate.
$add_{i}(j,a,b)$ evaluates to 1 if the $j$th gate of the $i$th layer is an addition gate whose inputs are the $a$'th and $b$'th gates at the preceding layer.
$mult_{i}(j,a,b)$ evaluates to 1 if the $j$th gate of the $i$th layer is a multiplication gate whose inputs are the $a$'th and $b$'th gates at the preceding layer.
The sumcheck protocol is applied to the following:
$$V_i(z) = \sum_{(p, \omega_1, \omega_2) \in \{0,1\}^{s_i + 2s_{i+1}}} f_{i,z}(p, \omega_1, \omega_2),$$
where
$$f_{i,z}(p, \omega_1, \omega_2) = \widetilde{eq}_{s_i}(z, p) \cdot \Big( add_i(p, \omega_1, \omega_2) \big( V_{i+1}(\omega_1) + V_{i+1}(\omega_2) \big) + mult_i(p, \omega_1, \omega_2) \, V_{i+1}(\omega_1) \cdot V_{i+1}(\omega_2) \Big)$$
$$\widetilde{eq}_{s_i}(z, p) = \prod_{j=1}^{s_i} \big( (1 - z_j)(1 - p_j) + z_j p_j \big).$$
(See the eq-polynomial page for details.)
Lasso and Jolt currently use the variant of GKR described in Thaler13, which is optimized for the far simpler case of a binary tree of multiplication gates. This simplifies each sumcheck to:
$$V_i(z) = \sum_{p \in \{0,1\}^{s_i}} g_z(p),$$
where
$$g_z(p) = \widetilde{eq}_{s_i}(z, p) \cdot V_{i+1}(p, 0) \cdot V_{i+1}(p, 1).$$
GKR is utilized in memory-checking for the multiset permutation check.
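The circuit this variant is applied to is just a binary product tree. A hedged sketch of computing the layer values such a tree produces (illustrative only: a small prime field and hypothetical function name; the real prover of course also runs sumcheck over these layers):

```rust
const P: u64 = (1 << 31) - 1; // small prime field, for illustration only

// Compute all layers of a binary multiplication tree, from the leaves up.
// Gate p at layer i has inputs (p, 0) and (p, 1) at layer i + 1, i.e. the
// adjacent pair of indices (2p, 2p + 1).
fn product_tree_layers(leaves: Vec<u64>) -> Vec<Vec<u64>> {
    let mut layers = vec![leaves];
    while layers.last().unwrap().len() > 1 {
        let next: Vec<u64> = layers
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| pair[0] * pair[1] % P)
            .collect();
        layers.push(next);
    }
    layers.reverse(); // layer 0 (the single output gate) first
    layers
}

fn main() {
    let layers = product_tree_layers(vec![2, 3, 4, 5]);
    assert_eq!(layers[0], vec![120]);   // output = 2*3*4*5
    assert_eq!(layers[1], vec![6, 20]); // intermediate layer
}
```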
Binius
Binius was written by Ben Diamond and Jim Posen of Irreducible (fka Ulvetanna). It's a new commitment scheme that allows Jolt to use smaller fields more efficiently.
The Binius paper also gives a number of sumcheck-based polynomial IOPs to be combined with the commitment scheme. For Jolt's purposes, these will yield extremely efficient protocols for proving hash evaluations with various standard hash functions like Keccak (important for recursing Jolt-with-Binius-commitments), and for the RISC-V addition and multiplication operations.
Here is a brief summary of changes that will occur in Jolt in order to incorporate the Binius commitment scheme:
- Addition and multiplication instructions will be handled via gadgets/constraint-systems instead of lookups. We may want to handle XOR and LT via gadgets too for performance reasons.
- Booleanity constraints in the R1CS can be omitted, as booleanity is guaranteed by the binding properties of the Binius commitment scheme.
- In offline memory checking (both Lasso for lookups, a.k.a. read-only memory, and Spice for read-write memory), access counts need to be incremented. The Binius paper (Section 4.4) proposes to do this by sticking counts in the exponent of a generator of an appropriate multiplicative subgroup of $GF(2^{128})$, but it seems more efficient to plan to do the count increments via the addition gadget.
  For Spice, the count written during a write operation is set to the maximum of a global counter and the count returned by the preceding read operation, plus 1. If the prover is honest, this written count will always equal the global counter plus 1. This max can be implemented by having the prover commit to the binary representation of $v = \text{global\_count} - \text{returned\_count}$ and using the addition gadget to confirm that $\text{returned\_count} + v = \text{global\_count}$.
- Constraints that involve addition by IMM, the immediate value, will need to implement addition via gadget.
- In the "collation" sumcheck of Lasso that collates results of subtable lookups into the result of the original big-table lookup, the algebraic implementation of "concatenate the results into one field element" changes. Over a large prime-order field, concatenating two $8$-bit values $a$ and $b$ corresponds to $2^8 \cdot a + b$. Over $GF(2^{128})$ constructed via towering, $2^8$ gets replaced with an appropriate element of the tower basis. The same change affects various R1CS constraints ensuring that certain committed values are a limb decomposition of another committed value.
- Many performance optimizations become available within the various invocations of the sumcheck protocol, leveraging that $GF(2^{128})$ has small characteristic and that most values being summed lie in $GF(2^k)$ for small $k$.
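As a small sanity check of the prime-order-field case mentioned above: for in-range 8-bit limbs, the linear map $2^8 \cdot a + b$ produces the same bit pattern as bitwise concatenation. A minimal sketch (the function name is illustrative; under Binius the constant $2^8$ would instead be a tower-basis element):

```rust
// Over a large prime-order field (no modular reduction for small inputs),
// concatenating two 8-bit limbs a and b into one element is 2^8 * a + b,
// which matches the bitwise concatenation (a << 8) | b.
fn concat_limbs(a: u64, b: u64) -> u64 {
    assert!(a < 256 && b < 256, "inputs must be 8-bit limbs");
    (1 << 8) * a + b
}

fn main() {
    assert_eq!(concat_limbs(0xAB, 0xCD), 0xABCD);
    assert_eq!(concat_limbs(0xAB, 0xCD), (0xAB << 8) | 0xCD);
}
```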
Multiplicative Generator
Finding a Multiplicative Generator for $GF(2^k)$
To find a generator of the multiplicative group $GF(2^k)^*$, which consists of all nonzero elements of the field $GF(2^k)$, a probabilistic method is used. This method iteratively tests random elements to determine if they are generators of the group. The process is summarized below.
Probabilistic Method Algo
```
while true:
    x = sample a random nonzero element of GF(2^k)
    if x^d != 1 for every maximal proper divisor d of 2^k - 1:
        return x
```
This algorithm ensures that the element $x$ does not reside in any proper subgroup of $GF(2^k)^*$. The condition checked within the loop, $x^d \neq 1$ for every maximal proper divisor $d$ of the group order, ensures that $x$ has the maximal possible order $2^k - 1$, which is exactly what it means for $x$ to be a generator of the group.
Background:
- Group Structure: $GF(2^k)^*$ is the multiplicative group of the field $GF(2^k)$, containing all field elements except 0. It is cyclic of order $2^k - 1$, which is odd.
- Maximal Proper Divisors: Testing $x^d$ for each maximal proper divisor $d$ of $2^k - 1$ ensures that $x$ is not a member of any proper cyclic subgroup, since the order of every proper subgroup divides some maximal proper divisor.
- Probabilistic Success Rate: The likelihood of a random element being a generator is significant: a cyclic group of order $N = 2^k - 1$ has exactly $\varphi(N)$ generators, so a random nonzero element is a generator with probability $\varphi(N)/N$. (This count can be seen via the Chinese Remainder Theorem decomposition of the group into prime-power components.)
Example Divisors:
- For $GF(2^{16})^*$: $|\mathbb{F}^*| = 2^{16} - 1 = 3 \times 5 \times 17 \times 257$
- For $GF(2^{32})^*$: $|\mathbb{F}^*| = 2^{32} - 1 = 3 \times 5 \times 17 \times 257 \times 65537$
- For $GF(2^{64})^*$: $|\mathbb{F}^*| = 2^{64} - 1 = 3 \times 5 \times 17 \times 257 \times 641 \times 65537 \times 6700417$
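To make the generator test concrete, here is a hedged sketch of the same test run over a prime-order multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ instead of $GF(2^k)^*$ (which would require carry-less field arithmetic); the structure of the test is identical, and the sketch scans deterministically rather than sampling randomly. Function names are illustrative:

```rust
// Modular exponentiation by repeated squaring.
fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1u64;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % m;
        }
        base = base * base % m;
        exp >>= 1;
    }
    acc
}

// x generates (Z/pZ)^* iff x^d != 1 for every maximal proper divisor
// d = (p - 1) / q of the group order, q ranging over the prime factors
// of p - 1.
fn find_generator(p: u64, prime_factors_of_order: &[u64]) -> u64 {
    let order = p - 1;
    (2..p)
        .find(|&x| {
            prime_factors_of_order
                .iter()
                .all(|&q| pow_mod(x, order / q, p) != 1)
        })
        .expect("a cyclic group always has a generator")
}

fn main() {
    // p = 65537 = 2^16 + 1 is prime, so the group order 65536 = 2^16 has
    // the single prime factor 2; the smallest generator is 3.
    assert_eq!(find_generator(65537, &[2]), 3);
}
```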
Maximal Proper Divisors
A maximal proper divisor of a number $n$ is a divisor $d$ of $n$, with $d \neq 1$ and $d \neq n$, such that no other proper divisor of $n$ is a multiple of $d$; that is, $d$ is maximal under divisibility among the proper divisors. Equivalently, the maximal proper divisors of $n$ are exactly the numbers $n/p$ for the distinct prime factors $p$ of $n$. (For example, the maximal proper divisors of $12$ are $6$ and $4$.)
Algorithm for Finding
The algorithm to find the maximal proper divisors of a given number $n$ first identifies all divisors of $n$ and then selects those that are maximal under divisibility. The steps are as follows:
- Find All Divisors: List all divisors of $n$ by checking, for every integer $i$ from $1$ to $\sqrt{n}$, whether $i$ divides $n$. If $i$ divides $n$, then both $i$ and $n/i$ are divisors.
- Filter Maximal Divisors: From this list, exclude $1$ and $n$. A remaining divisor $d$ is a maximal proper divisor if no other remaining divisor is a multiple of $d$.
```python
def find_maximal_proper_divisors(n):
    # collect all divisors in O(sqrt(n)) divisibility checks
    divisors = []
    for i in range(1, int(n**0.5) + 1):
        if n % i == 0:
            divisors.append(i)
            if i != n // i:
                divisors.append(n // i)
    divisors.remove(1)
    divisors.remove(n)
    maximal_proper_divisors = []
    for d in divisors:
        # d is maximal iff no *other* proper divisor is a multiple of d
        if all(e == d or e % d != 0 for e in divisors):
            maximal_proper_divisors.append(d)
    return maximal_proper_divisors
```
Dev
For more information on setting up your development environment, see Installation Guide.
For details on the tools used in the Jolt development process, refer to Development Tools.
Development Installation
Rust
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
- Rustup should automatically install the Rust toolchain and necessary targets on the first `cargo` invocation. If you need to add the RISC-V target for building guest programs manually, use `rustup target add riscv32i-unknown-none-elf`.
mdBook
For building this book:
```
cargo install mdbook mdbook-katex
```
For watching the changes in your local browser:
```
mdbook watch book --open
```
Development Tools
Tracing
Jolt is instrumented using tokio-rs/tracing. These traces can be displayed using the `--format chrome` flag, for example:
```
cargo run -p jolt-core --release trace --name sha2-chain --format chrome
```
After tracing, files can be found in the workspace root with the name `trace-<timestamp>.json`. Load these traces into perfetto.
Often it's easiest to debug performance for a particular segment by adding granular tracing, adjusting code, re-running the `sha2-chain` end-to-end benchmark, and looking through the Chrome traces.
Objdump
Debugging the emulator / tracer can be hard. Use `riscv64-unknown-elf-objdump` to compare the actual ELF to the `.bytecode` / `.jolttrace` files.
All of the open tasks are tracked via GitHub Issues.
Below are known optimizations that we will be implementing in the coming weeks, and the anticipated reduction in prover costs (all percentages are relative to the Jolt prover's speed on initial public release in April 2024).

The way we implemented Lasso leads to grand product arguments for which about 90% of the factors being multiplied together are equal to 1. The current implementation explicitly stores all these values (assigning 256 bits to each value). Instead, we can store a densified representation of them (i.e., list only the values that are not 1, and assume all the others are 1). This will speed up the prover, but more importantly it will reduce total prover space usage by a factor of about 8x.
Anticipated speedup: 8% of prover time (and a significant space reduction).

When Spartan is applied to the R1CS arising in Jolt, the prover's work in the first round involves computation over 64-bit integers, not arbitrary field elements. The implementation does not yet take advantage of this: each multiplication in the first round is currently implemented via a 256-bit Montgomery multiplication rather than a primitive 64-bit multiplication. This is about half of the prover's total work in this sumcheck invocation. (The same optimization applies to computing the three matrix-vector multiplications needed to compute $Az$, $Bz$, and $Cz$ before beginning this first sumcheck.)
Anticipated speedup: 4% of overall prover time.

The optimization described in Section 3.2 of this paper by Angus Gruen applies to a subset of the invocations of the sumcheck protocol in Jolt, including the first of two sumchecks in Spartan and all sumchecks in grand product arguments within memory-checking procedures (Lasso, Spice).
Anticipated speedup: 3% of total prover time.

Switching the commitment scheme from Hyrax to one with much smaller commitments (e.g., HyperKZG, Zeromorph) will not only shorten the proofs, but also save the prover the time of serializing and hashing the commitments for Fiat-Shamir. See this GitHub issue.
Anticipated speedup: 3% of total prover time.

Make it fast to commit to slightly negative values (one group op per value) just as it's fast for small positive values.
Anticipated speedup: 2% of total prover time.

In the first sumcheck in Spartan, the prover precomputes a table of evaluations of (the multilinear extension of) the equality function eq(a, b) with the first vector a fixed to a random value. Leaving a few variables off of b and handling them differently will reduce the cost of building this table to negligible.
Anticipated speedup: 1%2% of total prover time.

On reads to registers or RAM, the value written back to the memory cell by the memory-checking procedure is committed separately from the value returned by the read, and an R1CS constraint is included to force equality. Really, a single value can be committed and the constraint omitted.
Anticipated speedup: 1% of total prover time.

SP1 implements word-addressable memory (the CPU has to read an entire 64-bit word of memory at once). Jolt currently implements byte-addressable memory (the RISC-V CPU is allowed to read one byte at a time, as required by the RISC-V specification).
Most benchmarks get compiled into RISC-V programs that mostly read entire words at once. Switching to word-addressable memory will improve Jolt's speed on these benchmarks by 5%.
Total anticipated prover time reduction: 20%30%.
One way to achieve zero-knowledge is to simply compose Jolt with a zero-knowledge SNARK like Groth16. That is, use Groth16 (which is ZK) to prove that one knows a Jolt proof. Composing with Groth16 or Plonk is how most zkVMs get low on-chain verification costs anyway, and it also "adds" ZK. This approach is on Jolt's roadmap, although it will take some time to complete (as it requires representing the Jolt verifier in R1CS or Plonkish, which is a pain).
There are also ways to make Jolt zero-knowledge without invoking SNARK composition. For example, rendering sumcheck-based SNARKs zero-knowledge without using composition was exactly the motivation for Zeromorph, which introduces a very efficient zero-knowledge variant of KZG commitments for multilinear polynomials.
If we use the Zeromorph polynomial commitment scheme, the commitment and any evaluation proof are hiding (they reveal nothing about the committed polynomial, and still give the verifier a commitment to the requested evaluation of the committed polynomial). One still needs to ensure that the various applications of the sumcheck protocol in Jolt also do not leak any information about the witness. Here, techniques based on masking polynomials apply (see Section 13.3 of Proofs, Arguments, and Zero-Knowledge for an overview). However, the use of masking polynomials requires the prover to be able to commit to non-multilinear polynomials, which introduces significant (but surmountable) issues.
A similar approach to achieving ZK also applies when using a hashing-based polynomial commitment scheme like Brakedown. Roughly, to randomize all values sent by the prover during the Brakedown evaluation proof, it suffices to pad the committed polynomial with sufficiently many random values. One does need to lightly modify the Jolt polynomial IOP to "ignore" these extra, random committed values.
A final technique to render all of the sumchecks ZK without SNARK composition is given in Hyrax (based on older work of Cramer and Damgård). Roughly, rather than the prover sending field elements "in the clear", it instead sends (blinded, hence hiding) Pedersen commitments to these field elements. The verifier then exploits homomorphism properties to confirm that the committed field elements would have passed all of the sumcheck verifier's checks. See Section 13.2 of Proofs, Arguments, and Zero-Knowledge for additional discussion.