Jolt Alpha

Jolt is a zkVM framework built around the Lasso lookup argument.

Jolt powers succinct proofs of execution of programs written in any high-level language. Jolt's sumcheck-based SNARK makes extensive use of multivariate polynomials and commitment schemes. Jolt zkVMs have state-of-the-art prover performance and have substantial room for growth over the coming decades.

Jolt zkVMs have a simple programming model, requiring only 50-100 LOC to implement new VM instructions.

The Jolt codebase currently targets the RISC-V instruction set which is supported by most high-level language compilers, but the code is intended to be extensible and usable by any ISA.

The only property of the ISA that Jolt requires is that each primitive instruction is "decomposable". This means that evaluating the instruction on one or two 32-bit or 64-bit inputs can be done via a procedure of the following form:

  • decompose each input into, say, 8-bit chunks
  • apply one or more specified functions to these chunks (one from each input)
  • reconstruct the output of the original instruction from the outputs of the functions operating on the chunks

Background reading

Credits

Lasso was written by Srinath Setty, Justin Thaler and Riad Wahby. Jolt was written by Arasu Arun, Srinath Setty, and Justin Thaler.

Jolt was initially forked from Srinath Setty's work on microsoft/Spartan, specifically the arkworks-rs/Spartan fork in order to use the excellent Arkworks-rs prime field arithmetic library. Jolt's R1CS is also proven using a version of Spartan (forked from the microsoft/Spartan2 codebase) optimized to the case of uniform R1CS constraints.

Using Jolt

Using Jolt is simple thanks to our tooling and macros. To get started, head over to our quickstart guide. For more in depth information check out the guests and hosts section.

Quickstart

Installing

Start by installing the jolt command line tool.

cargo +nightly install --git https://github.com/a16z/jolt --force --bins jolt

Creating a Project

To create a project, use the jolt new command.

jolt new <PROJECT_NAME>

This will create a template Jolt project in the specified location. Make sure to cd into the project before running the next step.

cd <PROJECT_NAME>

Project Tour

The main folder contains the host, which is the code that can generate and verify proofs. Within this main folder, there is another package called guest which contains the Rust functions that we can prove from the host. For more information about these concepts refer to the guests and hosts section.

We'll start by taking a look at our guest. We can view the guest code in guest/src/lib.rs.


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
#[jolt::provable]
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }
    b
}
}

As we can see, this implements a simple Fibonacci function called fib. All we need to do to make our function provable is add the jolt::provable macro above it.

Next let's take a look at the host code in src/main.rs.

pub fn main() {
    let (prove_fib, verify_fib) = guest::build_fib();

    let (output, proof) = prove_fib(50);
    let is_valid = verify_fib(proof);

    println!("output: {}", output);
    println!("valid: {}", is_valid);
}

This section simply imports guest::build_fib which is automatically generated by the jolt::provable macro, and returns functions for proving and verifying our function. The prove function takes the same inputs as the original fib function, but modifies the outputs to additionally return a proof. The verify function can then be used to check this proof, and return a boolean indicating its validity.

Running

Let's now run the host with cargo.

cargo run --release

This will compile the guest, perform some required preprocessing, and execute the host code which proves and verifies the 50th Fibonacci number. This preprocessing is run within the build_fib function and adds significant time to running the host, but only needs to be performed once. This means that we could use the prove method many times without rerunning build_fib. In the future we will support caching this across runs of the host.

Installing

Start by installing the jolt command line tool.

cargo +nightly install --git https://github.com/a16z/jolt --force --bins jolt

Guests and Hosts

Guests and hosts are the main concepts that we need to understand in order to use Jolt. Guests contain the functions that Jolt will prove, while hosts are where we write more general code and invoke the prover and verifier for our guest functions.

Guests

Guests contain functions for Jolt to prove. Currently, these functions must be written using no_std Rust. If you have a function that does not require the standard library, making it provable is as easy as ensuring it is inside the guest package and adding the jolt::provable macro above it.

Let's take a look at a simple guest program to better understand it.


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
#[jolt::provable]
fn add(x: u32, y: u32) -> u32 {
    x + y
}
}

As we can see, the guest looks like a normal no_std Rust library. The only major change is the addition of the jolt::provable macro, which lets Jolt know of the function's existence. Other than no_std, the only requirement of these functions is that its inputs are serializable and outputs are deserializable with serde. Fortunately serde is prevalent throughout the Rust ecosystem, so most types will support it by default.

There is no requirement that just a single function lives within the guest, and we are free to add as many as we need. Additionally, we can import any no_std compatible library just as we normally would in Rust.


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
use sha2::{Sha256, Digest};
use sha3::{Keccak256, Digest};

#[jolt::provable]
fn sha2(input: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}

#[jolt::provable]
fn sha3(input: &[u8]) -> [u8; 32] {
    let mut hasher = Keccak256::new();
    hasher.update(input);
    let result = hasher.finalize();
    Into::<[u8; 32]>::into(result)
}
}

Hosts

Hosts are where we can invoke the Jolt prover to prove functions defined within the guest. Hosts do not have the no_std requirement, and are free to use the Rust standard library.

The host imports the guest package, and will have automatically generated functions to build each of the Jolt functions. For the sha2 and sha3 example guest we looked at in the guest section, these functions would be called build_sha2 and build_sha3 respectively. Each returns two results, a prover function and a verifier function. The prover function takes in the same input types as the original function and modifies the output to additionally include a proof. The verifier can then take this proof and verify it.

pub fn main() {
    let (prove_sha2, verify_sha2) = guest::build_sha2();
    let (prove_sha3, verify_sha3) = guest::build_sha3();

    let input = &[5u8; 32];

    let (output, proof) = prove_sha2(input);
    let is_valid = verify_sha2(proof);

    println!("sha2 output: {}", output);
    println!("sha2 valid: {}", is_valid);

    let (output, proof) = prove_sha3(input);
    let is_valid = verify_sha3(proof);

    println!("sha3 output: {}", output);
    println!("sha3 valid: {}", is_valid);
}

Allocators

Jolt provides an allocator which supports most containers such as Vec and Box. This is useful for Jolt users who would like to write no_std code rather than using Jolt's standard library support. To use these containers, they must be explicitly imported from alloc. The alloc crate is automatically provided by rust and does not need to added to the Cargo.toml file.

Example


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable]
fn alloc(n: u32) -> u32 {
    let mut v = Vec::<u32>::new();
    for i in 0..100 {
        v.push(i);
    }

    v[n as usize]
}
}

Standard Library

Jolt supports the full Rust standard library. Do enable support, simply add the guest-std feature to the Jolt import in the guest's Cargo.toml file and remove the #![cfg_attr(feature = "guest", no_std)] directive from the guest code.

Example


#![allow(unused)]
fn main() {
[package]
name = "guest"
version = "0.1.0"
edition = "2021"

[[bin]]
name = "guest"
path = "./src/lib.rs"

[features]
guest = []

[dependencies]
jolt = { package = "jolt-sdk", git = "https://github.com/a16z/jolt", features = ["guest-std"] }
}

#![allow(unused)]
#![no_main]

fn main() {
#[jolt::provable]
fn int_to_string(n: i32) -> String {
    n.to_string()
}
}

WASM Support

Jolt supports WebAssembly compatibility for the verification side of provable functions. This allows developers to run proof verification in WASM environments such as web browsers. Note that currently, only the verification process is supported in WASM; the proving process is not WASM-compatible at this time.

Creating a WASM-Compatible Project

To create a new project with WASM verification support, use the following command:

jolt new <PROJECT_NAME> --wasm

This creates a new project with the necessary structure and configuration for WASM-compatible verification.

Marking Functions for WASM Verification

For functions whose verification process should be WASM-compatible, extend the #[jolt::provable] macro with the wasm attribute:


#![allow(unused)]
fn main() {
#[jolt::provable(wasm)]
fn my_wasm_verifiable_function() {
    // function implementation
}
}

Building for WASM Verification

To compile your project for WASM verification, use the following command:

jolt build-wasm

This command performs several actions:

  1. It extracts all functions marked with #[jolt::provable(wasm) from your guest/src/lib.rs file.
  2. For each WASM-verifiable function, it preprocesses the function and saves the necessary verification data.
  3. It creates an index.html file as an example of how to use your WASM-compiled verification functions in a web environment.
  4. It uses wasm-pack to build your project, targeting web environments.

Important: The build process only compiles the verification functions for WASM. The proving process must still be performed outside of the WASM environment.

Adding Dependencies

When adding new dependencies for WASM-compatible projects, note that they must be added to both guest/Cargo.toml and the root Cargo.toml. The build-wasm process will automatically add necessary WASM-related dependencies to your project.

Switching Between WASM and Normal Compilation

The build process modifies the Cargo.toml files to enable WASM compilation. If you need to switch back to normal compilation, you may need to manually adjust these files. In the root Cargo.toml, you can remove the following lines:

[lib]
crate-type = ["cdylib"]
path = "guest/src/lib.rs"

Remember to restore these lines before attempting another WASM build.

Example Project Structure

After running jolt build-wasm, your project will include:

  • An index.html file in the root directory, providing a basic interface to verify proofs for each of your WASM-verifiable functions.
  • A pkg directory containing the WASM-compiled version of your project's verification functions.
  • Preprocessed data files for each WASM-verifiable function in the target/wasm32-unknown-unknown/release/ directory.

You can use this example as a starting point and customize it to fit your specific requirements.

Example: Modifying the Quickstart Project

You can use the example from the Quickstart chapter and modify the /src/main.rs file as follows:

pub fn main() {
    let (prove_fib, _verify_fib) = guest::build_fib();

    let (_output, proof) = prove_fib(50);

    proof
        .save_to_file("proof.bin")
        .expect("Failed to save proof to file");
}

Remember to modify the Cargo.toml as described before for normal compilation, and then you can generate and save the proof with the following command:

cargo run -r

This will create a proof.bin file in the root directory. You can then verify this proof using the example index.html. Before doing this, change the crate type back to cdylib in the Cargo.toml and ensure that your /guest/src/lib.rs looks like this:


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
#[jolt::provable(wasm)]
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0;
    let mut b: u128 = 1;
    let mut sum: u128;
    for _ in 1..n {
        sum = a + b;
        a = b;
        b = sum;
    }

    b
}
}

Then run jolt build-wasm. After successful compilation, you can verify the proof using the index.html file. Start a local server from the project directory, for example with:

npx http-server .

Note: Make sure you have npx installed to use the http-server command.

Troubleshooting

Insufficient Memory or Stack Size

Jolt provides reasonable defaults for the total allocated memory and stack size. It is however possible that the defaults are not sufficient, leading to unpredictable errors within our tracer. To fix this we can try to increase these sizes. We suggest starting with the stack size first as this is much more likely to run out.

Below is an example of manually specifying both the total memory and stack size.


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
extern crate alloc;
use alloc::vec::Vec;

#[jolt::provable(stack_size = 10000, memory_size = 10000000)]
fn waste_memory(size: u32, n: u32) {
    let mut v = Vec::new();
    for i in 0..size {
        vec.push(i);
    }
}
}

Maximum Input or Output Size Exceeded

Jolt restricts the size of the inputs and outputs to 4096 bytes by default. Using inputs and outputs that exceed this size will lead to errors. These values can be configured via the macro.


#![allow(unused)]
#![cfg_attr(feature = "guest", no_std)]
#![no_main]

fn main() {
#[jolt::provable(max_input_size = 10000, max_output_size = 10000)]
fn sum(input: &[u8]) -> u32 {
    let mut sum = 0;
    for value in input {
        sum += value;
    }

    sum
}
}

Guest Attempts to Compile Standard Library

Sometimes after installing the toolchain the guest still tries to compile with the standard library which will fail with a large number of errors that certain items such as Result are referenced and not available. This generally happens when one tries to run jolt before installing the toolchain. To address, try rerunning jolt install-toolchain, restarting your terminal, and delete both your rust target directory and any files under /tmp that begin with jolt.

Getting Help

If none of the above help, please serialize your program and send it along with a detailed bug report.

Serializing a call to the "fib" function in the Jolt guest:


#![allow(unused)]
fn main() {
// let (prove_fib, verify_fib) = guest::build_fib();
let program_summary = guest::analyze_fib(10);
program_summary.write_to_file("fib_10.txt".into()).expect("should write");
}

Contributors

How it works

Jolt

This section gives an overview of the core components of Jolt.

Jolt's four components

A VM does two things:

  • Repeatedly execute the fetch-decode-execute logic of its instruction set architecture.
  • Perform reads and writes to Random Access Memory (RAM).

The Jolt paper depicts these two tasks mapped to three components in the final Jolt proof:

Jolt Alpha

The Jolt codebase is similarly organized, but instead separates read-write memory (comprising registers and RAM) from program code (aka bytecode, which is read-only), for a total of four components:

fetch-decode-execute

Read-write memory

To handle reads/writes to RAM (and registers) Jolt uses a memory checking argument from Spice, which is closely related to Lasso itself. They are both based on "offline memory checking" techniques, the main difference being that Lasso supports read-only memory while Spice supports read-write memory, making it slightly more expensive. This is implemented in read_write_memory.rs.

For more details: Read-write memory

R1CS constraints

To handle the "fetch" part of the fetch-decode-execute loop, there is a minimal R1CS instance (about 60 constraints per cycle of the RISC-V VM). These constraints handle program counter (PC) updates and serves as the "glue" enforcing consistency between polynomials used in the components below. Jolt uses Spartan, optimized for the highly-structured nature of the constraint system (e.g., the R1CS constraint matrices are block-diagonal with blocks of size only about 60 x 80). This is implemented in jolt-core/src/r1cs.

For more details: R1CS constraints

Instruction lookups

To handle the "execute" part of the fetch-decode-execute loop, Jolt invokes the Lasso lookup argument. The lookup argument maps every instruction (including its operands) to its output. This is implemented in instruction_lookups.rs.

For more details: Instruction lookups

Bytecode

To handle the "decode" part of the fetch-decode-execute loop, Jolt uses another instance of (read-only) offline memory-checking, similar to Lasso. The bytecode of the guest program is "decoded" in preprocessing, and the prover subsequently invokes offline memory-checking on the sequence of reads from this decoded bytecode corresponding to the execution trace being proven. This is implemented in bytecode.rs.

For more details: Bytecode

Instruction lookups

In Jolt, the "execute" part of the "fetch-decode-execute" loop is handled using a lookup argument (Lasso). At a high level, the lookup queries are the instruction operands, and the lookup outputs are the instruction outputs.

Lasso

Lasso is a lookup argument (equivalent to a SNARK for reads into a read-only memory). Lookup arguments allow the prover to convince the verifier that for a committed vector of values , committed vector of indices , and lookup table , for all .

Lasso is a special lookup argument with highly desirable asymptotic costs largely correlated to the number of lookups (the length of the vectors and ), rather than the length of of the table .

A conversational background on lookups can be found here. In short: lookups are great for zkVMs as they allow constant cost / developer complexity for the prover algorithm per VM instruction.

A detailed engineering overview of Lasso (as a standalone lookup argument) can be found here.

Using Lasso to handle instruction execution

Lasso requires that each primitive instruction satisfies a decomposability property. The property needed is that the input(s) to the instruction can be broken into "chunks" (say, with each chunk consisting of 16 bits), such that one can obtain the answer to the original instruction by evaluating a simple function on each chunk and then "collating" the results together. For example, the bitwise-OR of two 32-bit inputs x and y can be computed by breaking each input up into 8-bit chunks, XORing each 8-bit chunk of x with the associated chunk of y, and concatenating the results together.

In Lasso, we call the task of evaluating a simple function on each chunk a "subtable lookup" (the relevant lookup table being the table storing all evaluations of the simple function). And the "collating" of the results of the subtable lookups into the result of the original lookup (instruction execution on the un-chunked inputs) is handled via an invocation of the sum-check protocol. We call this the "primary sumcheck" instance in Lasso.

The "primary sumcheck" collation looks as follows for a trace of length and a VM with unique instructions.

is an indicator function for the -th instruction: if instruction is used during the -th step of the trace, when .

is the collation function used by the -th instruction.

where is the number of independent memories used by an instruction. For simple instructions like the EQ instruction, , where is the number of chunks that the inputs are broken into, so . More complicated instructions such LT might have . The exact layout is dependent on the number of subtables required by the decomposition of the instruction. The mappings can be found in the JoltInstruction::subtable method implementations.

Mental Model

For a given (think integer index of the instruction within the trace), for all but one term of the outer sum. Similarly all for all but one term of the inner sum. Leaving just the collation function of a single instruction, evaluating to the collated lookup output of the single instruction. In reality is a random point selected by the verifier over the course of the protocol. The evaluation point provides a distance amplified encoding of the entire trace of instructions.

To illustrate more concretely imagine a two-instruction VM for LT and EQ instructions with .

Subtable Flags

TODO

  • We then use memory checking to determine that each of the memories is well formed
  • At a given step of the CPU only a single instruction will be used, that means that only that instruction's subtables will be used. For the rest of the memories we insert a no_op with (a, v) = 0. In order to make the GKR trees cheaper to compute and sumcheck we'll add a single additional layer to the GKR tree. During this layer we'll "toggle" each of the GKR leaves to "1" in the case that it is an unused step of the CPU. This will make the binary tree of multiplication gates cheaper. We'll toggle based on a new flags polynomial called which is the sum of all of the used in the instruction collation above.
  • The function to compute each of the leaves becomes

Read-write memory (registers and RAM)

Jolt proves the validity of registers and RAM using offline memory checking. In contrast to our usage of offline memory checking in other modules, registers and RAM are writable memory.

Memory layout

For the purpose of offline memory checking, Jolt treats registers, program inputs/outputs, and RAM as occupying one unified address space. This remapped address space is laid out as follows:

Memory layout

The zero-padding depicted above is sized so that RAM starts at a power-of-two offset (this is explained below). As noted in the diagram, the size of the witness scales with the highest memory addressed over the course of the program's execution. In addition to the zero-padding between the "Program I/O" and "RAM" sections, the end of the witness is zero-padded to a power of two.

Handling program I/O

Inputs

Program inputs and outputs (and the panic bit, which indicates whether the proram panicked) live in the same memory address space as RAM. Program inputs populate the designated input space upon initialization: init memory

The verifier can efficiently compute the MLE of this initial memory state on its own (i.e. in time proportional to the IO size, not the total memory size).

Ouputs and panic

On the other hand, the verifier cannot compute the MLE of the final memory state on its own –– though the program I/O is known to the verifier, the final memory state contains values written to registers/RAM over the course of program execution, which are not known to the verifier.

The verifier is, however, able to compute the MLE of the program I/O values (padded on both sides with zeros) –– this is denoted v_io below. If the prover is honest, then the final memory state (v_final below) should agree with v_io at the indices corresponding to program I/O.

final memory

To enforce this, we invoke the sumcheck protocol to perform a "zero-check" on the difference between v_final and v_io:

final memory

This also motivates the zero-padding between the "Program I/O" and "RAM" sections of the witness. The zero-padding ensures that both input_start and ram_witness_offset are powers of two, which makes it easier for the verifier to compute the MLEs of v_init and v_io.

Timestamp range check

Registers and RAM are writable memory, which introduces additional requirements compared to offline memory checking in a read-only context.

The multiset equality check for read-only memory, typically expressed as , is not adequate for ensuring the accuracy of read values. It is essential to also verify that each read operation retrieves a value that was written in a previous step (not a future step). (Here, denotes the tuples capturing initialization of memory and the tuples capturing all of the writes to memory following initialization. denotes the tuples capturing all read operations, and denotes tuples capturing a final read pass over all memory cells).

To formalize this, we assert that the timestamp of each read operation, denoted as , must not exceed the global timestamp at that particular step. The global timestamp starts at 0 and is incremented once per step.

The verification of is equivalent to confirming that falls within the range and that the difference is also within the same range.

The process of ensuring that both and lie within the specified range is known as range-checking. This is the procedure implemented in timestamp_range_check.rs, using a modified version of Lasso.

Intuitively, checking that each read timestamp does not exceed the global timestamp prevents an attacker from answering all read operations to a given cell with "the right set of values, but out of order". Such an attack requires the attacker to "jump forward and backward in time". That is, for this attack to succeed, at some timestamp t when the cell is read, the attacker would have to return a value that will be written to that cell in the future (and at some later timestamp t' when the same cell is read the attacker would have to return a value that was written to that cell much earlier). This attack is prevented by confirming that all values returned have a timestamp that does not exceed the current global timestamp.

Bytecode

Jolt proves the validity of registers and RAM using offline memory checking.

Decoding an ELF file

The tracer iterates over the .text sections of the program ELF file and decode RISC-V instructions. Each instruction gets mapped to a BytecodeRow struct:


#![allow(unused)]
fn main() {
pub struct BytecodeRow {
    /// Memory address as read from the ELF.
    address: usize,
    /// Packed instruction/circuit flags, used for r1cs
    bitflags: u64,
    /// Index of the destination register for this instruction (0 if register is unused).
    rd: u64,
    /// Index of the first source register for this instruction (0 if register is unused).
    rs1: u64,
    /// Index of the second source register for this instruction (0 if register is unused).
    rs2: u64,
    /// "Immediate" value for this instruction (0 if unused).
    imm: u64,
}
}

The registers (rd, rs1, rs2) and imm are described in the RISC-V spec. The bitflags field is described in greater detail below.

The preprocessed bytecode serves as the (read-only) "memory" over which we perform offline memory checking.

bytecode_trace

We currently assume the program bytecode is sufficiently short and known by the verifier, so the prover doesn't need to provide a commitment to it –– the verifier can compute the MLE of the bytecode on its own. This guarantees that the bytecode trace is consistent with the agreed-upon program.

Instead of tuples, each step in the bytecode trace is mapped to a tuple . Otherwise, the offline memory checking proceeds as described in Offline Memory Checking.

Bitflags

The bitflags of a given instruction is the concatenation of its circuit flags and instruction flags. This concatenation is enforced by R1CS constraints.

bitflags

R1CS constraints

Jolt usees R1CS constraints to enforce certain rules of the RISC-V fetch-decode-execute loop and to ensure consistency between the proofs for the different modules of Jolt (instruction lookups, read-write memory, and bytecode).

Uniformity

Jolt's R1CS is uniform, which means that the constraint matrices for an entire program are just repeated copies of the constraint matrices for a single CPU step. Each step is conceptually simple and involves around 60 constraints and 80 variables.

Input Variables and constraints

The inputs required for the constraint system for a single CPU step are:

Pertaining to bytecode

  • Program counters (PCs): this is the only state passed between CPU steps.
  • Bytecode read address: the address in the program code read at this step.
  • The preprocessed ("5-tuple") representation of the instruction: (bitflags, rs1, rs2, rd, imm).

Pertaining to read-write memory

  • The (starting) RAM address read by the instruction: if the instruction is not a load/store, this is 0.
  • The bytes written to or read from memory.

Pertaining to instruction lookups

  • The chunks of the instruction's operands x and y.
  • The chunks of the lookup query. These are typically some combination of the operand chunks (e.g. the i-th chunk of the lookup query is often the concatenation of x_i and y_i).
  • The lookup output.

Circuit and instruction flags:

  • There are nine circuit flags used to guide the constraints and are dependent only on the opcode of the instruction. These are thus stored as part of the preprocessed bytecode in Jolt.
    1. operand_x_flag: 0 if the first operand is the value in rs1 or the PC.
    2. operand_y_flag: 0 if the second operand is the value in rs2 or the imm.
    3. is_load_instr
    4. is_store_instr
    5. is_jump_instr
    6. is_branch_instr
    7. if_update_rd_with_lookup_output: 1 if the lookup output is to be stored in rd at the end of the step.
    8. sign_imm_flag: used in load/store and branch instructions where the instruction is added as constraints.
    9. is_concat: indicates whether the instruction performs a concat-type lookup.
  • Instruction flags: these are the unary bits used to indicate instruction is executed at a given step. There are as many per step as the number of unique instruction lookup tables in Jolt, which is 19.

Constraint system

The constraints for a CPU step are detailed in the get_jolt_matrices() function in constraints.rs.

Reusing commitments

As with most SNARK backends, Spartan requires computing a commitment to the inputs to the constraint system. A catch (and an optimization feature) in Jolt is that most of the inputs are also used as inputs to proofs in the other modules. For example, the address and values pertaining to the bytecode are used in the bytecode memory-checking proof, and the lookup chunks, output and flags are used in the instruction lookup proof. For Jolt to be sound, it must be ensured that the same inputs are fed to all relevant proofs. We do this by re-using the commitments themselves. This can be seen in the format_commitments() function in the r1cs/snark module. Spartan is adapted to take pre-committed witness variables.

Exploiting uniformity

The uniformity of the constraint system allows us to heavily optimize both the prover and verifier. The main changes involved in making this happen are:

  • Spartan is modified to only take in the constraint matrices a single step, and the total number of steps. Using this, the prover and verifier can efficiently calculate the multilinear extensions of the full R1CS matrices.
  • The commitment format of the witness values is changed to reflect uniformity. All versions of a variable corresponding to each time step is committed together. This affects nearly all variables committed to in Jolt.
  • The inputs and witnesses are provided to the constraint system as segments.
  • Additional constraints are used to enforce consistency of the state transferred between CPU steps.

These changes and their impact on the code are visible in spartan.rs.

Background

If you're interested in learning the nitty gritty details of SNARKs including questions like 'why', 'is it sound', you should read Proofs and Args ZK by Thaler. If you're simply interested in 'how', this section is for you.

The Sum-check Protocol

Adapted from the Thaler13 exposition. For a detailed introduction to sumcheck see Chapter 4.1 of the textbook.

Suppose we are given a -variate polynomial defined over a finite field . The purpose of the sumcheck protocol is to compute the sum:

In order to execute the protocol, the verifier needs to be able to evaluate for a randomly chosen vector . Hence, from the verifier's perspective, the sum-check protocol reduces the task of summing 's evaluations over inputs (namely, all inputs in ) to the task of evaluating at a single input in .

The protocol proceeds in rounds as follows. In the first round, the prover sends a polynomial , and claims that

Observe that if is as claimed, then . Also observe that the polynomial has degree , the degree of variable in . Hence can be specified with field elements. In our implementation, will specify by sending the evaluation of at each point in the set . (Actually, the prover does not need to send , since since the verifier can infer that , as if this were not the case, the verifier would reject).

Then, in round , chooses a value uniformly at random from and sends to . We will often refer to this step by saying that variable gets bound to value . In return, the prover sends a polynomial , and claims that

The verifier compares the two most recent polynomials by checking that , and rejecting otherwise. The verifier also rejects if the degree of is too high: each should have degree , the degree of variable in .

In the final round, the prover has sent which is claimed to be . now checks that (recall that we assumed can evaluate at this point). If this test succeeds, and so do all previous tests, then the verifier accepts, and is convinced that .

Multilinear Extensions

For any -variate polynomial polynomial, it's multilinear extension is the polynomial which agrees over all points : . By the Schwartz-Zippel lemma, if and disagree at even a single input, then and must disagree almost everywhere.

For more precise details please read Section 3.5 of Proofs and Args ZK.

Engineering

In practice, MLE's are stored as the vector of evaluations over the -variate boolean hypercube . There are two important algorithms over multilinear extensions: single variable binding, and evaluation.

Single Variable Binding

With a single streaming pass over all evaluations we can "bind" a single variable of the -variate multilinear extension to a point . This is a critical sub-algorithm in sumcheck. During the binding the number of evaluation points used to represent the MLE gets reduced by a factor of 2:

  • Before:
  • After:

Assuming your MLE is represented as a 1-D vector of evaluations over the -variate boolean hypercube , indexed little-endian

Then we can transform the vector representing the transformation from by "binding" the evaluations vector to a point .

let n = 2^v;
let half = n / 2;
for i in 0..half {
    let low = E[i];
    let high = E[half + i];
    E[i] = low + r * (high - low);
}

Multi Variable Binding

Another common algorithm is to take the MLE and compute its evaluation at a single -variate point outside the boolean hypercube . This algorithm can be performed in time by preforming the single variable binding algorithm times. The time spent on 'th variable binding is , so the total time across all bindings is proportional to .

Eq Extension

The multilinear extension (MLE) is useful throughout the protocol. It is the MLE of the function defined as follows:

has the following explicit formula:

Batched Openings

At the end of the sumcheck protocol, the verifier must evaluate many multilinear polynomials at a single point. For polynomials which the verifier cannot compute on their own, they depend on the verification of a PCS opening proof provided by the prover. To save on verifier costs, all polynomials opened at the same point can be combined to a single opening proof.

The best reading on the subject can be found in Section 16.1 of the Textbook.

Offline Memory Checking

Offline memory checking is a method that enables a prover to demonstrate to a verifier that a read/write memory was used correctly. In such a memory system, values can be written to addresses and subsequently retrieved. This technique allows the verifier to efficiently confirm that the prover adhered to the memory's rules (i.e., that the value returned by any read operation is indeed the most recent value that was written to that memory cell).

Jolt utilizes offline memory checking in the Bytecode prover, Lookup prover (for VM instruction execution), and RAM prover. The RAM prover must support a read-write memory. The Bytecode prover and Lookup prover need only support read-only memories. In the case of the Bytecode prover, the memory is initialized to contain the Bytecode of the RISC-V program, and the memory is never modified (it's read-only). And the lookup tables used for VM instruction execution are determined entirely by the RISC-V instruction set.

(The term "offline memory checking" refers to techniques that check the correctness of all read operations "all at once", after the reads have all occurred--or in SNARK settings, after the purported values returned by the reads have been committed. Off-line checking techniques do not determine as a read happens whether or not it was correct. They only ascertain, when all the reads are checked at once, whether or not all of the reads were correct.

This is in contrast to "online memory checking" techniques like Merkle hashing that immediately confirm that a memory read was done correctly by insisting that each read includes an authentication path. Merkle hashing is much more expensive on a per-read basis for SNARK provers, and offline memory checking suffices for SNARK design. This is why Lasso and Jolt use offline memory checking techniques rather than online).

Initialization Algorithm

TODO:

  • Initialize four timestamp counters
  • Implicitly assume a read operation before each write

Multiset Check

Define and as subsets, and and as subsets: Here, , , and represent the address, value, and timestamp respectively, with being the total number of memory operations and the size of the RAM.

The verifier checks that the combination of and matches and , disregarding the sequence of elements, known as a permutation check: Jolt conducts this check using a homomorphic hash function applied to each set: The hash function is defined as:

This multiset hashing process is represented by a binary tree of multiplication gates and is computed using an optimized GKR protocol.

References

GKR

GKR is a SNARK protocol for small-depth circuits of multiplication and addition gates. The standard form allows combinations of both. The protocol involves three MLEs: is the MLE of a function capturing the values of the gates at layer , and and are MLEs of functions capturing the circuit's wires.

Specifically, evaluates to the value of the circuit at the -th layer in the -th gate. For example corresponds to the output gate.

evaluates to 1 if the -th gate of the -th layer is an addition gate whose inputs are the 'th and 'th gates at the preceding layer.

evaluates to 1 if the -th gate of the -th layer is a multiplication gate whose inputs are the 'th and 'th gates at the preceding layer.

The sumcheck protocol is applied to the following:

where

(See the eq-polynomial page for details.)

Lasso and Jolt currently use the variant of GKR described in Thaler13, which is optimized for the far simpler case of a binary tree of multiplication gates. This simplifies each sumcheck to:

where

GKR is utilized in memory-checking for the multi-set permutation check.

Binius

Binius was written by Ben Diamond and Jim Posen of Irreducible (fka Ulvetanna). It's a new commitment scheme that allows Jolt to use smaller fields more efficiently.

The Binius paper also gives a number of sum-check-based polynomial IOPs for to be combined with the commitment scheme. For Jolt's purposes, these will yield extremely efficient protocols for proving hash evaluations with various standard hash functions like Keccak (important for recursing Jolt-with-Binius-commitment), and for the RISC-V addition and multiplication operations.

Here is a brief summary of changes that will occur in Jolt in order to incorporate the Binius commitment scheme:

*Addition and multiplication instructions will be handled via gadgets/constraint-systems instead of lookups. We may want to handle XOR and LT via gadgets too for performance reasons.

*Booleanity constraints in the R1CS can be omitted as that is guaranteed by the binding properties of the Binius commitment scheme.

*In offline memory checking (both Lasso for lookups a.k.a. read-only memory, and Spice for read-write memory), access counts need to be incremented. The Binius paper (Section 4.4) proposes to do this by sticking counts in the exponent of a generator of an appropriate multiplicative subgroup of , but it seems more efficient to plan to do the count increments via the addition gadget.

For Spice, the count written during a write operation is set to the maximum of a global counter and the count returned by the preceding read operation, plus 1. If the prover is honest, this written count will always equal the global counter plus 1. This max can be implemented by having the prover commit to the binary representation of and using the addition gadget to confirm that .

*Constraints that involve addition by IMM, the immediate value, will need to implement addition via gadget.

*In the "collation" sum-check of Lasso that collates results of sub-table lookups into the result of the original big-table lookup, the algebraic implementation of "concatenate the results into one field element" changes. Over a large prime-order field, concatenating two -bit values and corresponds to . Over constructed via towering, gets replaced with an appropriate element of the tower basis. The same change affects various R1CS constraints ensuring that certain committed values are a limb decomposition of another committed value.

*Many performance optimizations become available within the various invocations of the sum-check protocol, leveraging that has small characteristic and most values being summed are in for small .

Multiplicative Generator

Finding a Multiplicative Generator for

To find a generator of the multiplicative group , which includes all nonzero elements of the field , a probabilistic method is used. This method iteratively tests random elements to determine if they are generators of the group. The process is summarized below.

Probabilistic Method Algo

while true:
    # sample random nonzero element x
    for each maximal proper divisor d of |𝔽*|:
        if xᵈ == 1: continue
    return x

This algorithm ensures that the element does not reside in any proper subgroup of . The condition checked within the loop, , ensures that has the maximal possible order, which is necessary for it to be a generator of the group.

Background:

  • Group Structure: is the multiplicative group of the field , containing all field elements except 0. Its order is , which is prime to 2.
  • Maximal Proper Divisors: These divisors are crucial as testing powers for these values ensures that is not a member of any smaller cyclic subgroup. The divisors of are used to verify the generator property.
  • Probabilistic Success Rate: The likelihood of a random element being a generator is significant, given the structure of the group. This can be supported by the [[chinese-remainder-theorem]], which suggests that the intersection of several smaller groups (subgroups corresponding to divisors) is likely non-trivial only when all subgroup conditions are simultaneously satisfied.

Example Divisors:

  • For ,
  • For ,
  • For ,

Maximal Proper Divisors

A maximal proper divisor of a number is a divisor of which is neither nor itself, and there are no other divisors of that divide except and . Essentially, is a divisor that is not a multiple of any smaller divisor other than and itself, making it 'maximal' under the set of proper divisors.

Algorithm for Finding

The algorithm to find the maximal proper divisors of a given number involves identifying all divisors of and then selecting those which do not have other divisors besides and themselves. The steps are as follows:

  1. Find All Divisors: First, list all divisors of by checking for every integer from to if divides . If divides , then both and are divisors.
  2. Filter Maximal Divisors: From this list of divisors, exclude and . For each remaining divisor, check if it can be expressed as a multiple of any other divisor from the list (other than and itself). If it cannot, then it is a maximal proper divisor.
function findMaximalProperDivisors(n):
    divisors = []
    for i from 1 to sqrt(n):
        if n % i == 0:
            divisors.append(i)
            if i != n / i:
                divisors.append(n / i)
    divisors.remove(1)
    divisors.remove(n)

    maximal_proper_divisors = []
    for d in divisors:
        is_maximal = true
        for e in divisors:
            if d != e and e != 1 and d % e == 0:
                is_maximal = false
                break
        if is_maximal:
            maximal_proper_divisors.append(d)

    return maximal_proper_divisors

Dev

For more information on setting up your development environment, see Installation Guide.

For details on the tools used in the Jolt development process, refer to Development Tools.

Development Installation

Rust

  • curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • Rustup should automatically install Rust toolchain and necessary targets on the first cargo invocation. If you need to add the RISC-V target for building guest programs manually use rustup target add riscv32i-unknown-none-elf.

mdBook

For building this book:

cargo install mdbook mdbook-katex

For watching the changes in your local browser:

mdbook watch book --open

Development Tools

Tracing

Jolt is instrumented using tokio-rs/tracing. These traces can be displayed using the --format chrome flag, for example: cargo run -p jolt-core --release -- trace --name sha2-chain --format chrome

After tracing, files can be found in the workspace root with a name trace-<timestamp>.json. Load these traces into perfetto.

Tracing in Jolt

Often it's easiest to debug performance for a particular segment by adding granular tracing, adjusting code, rerunning the sha2-chain end-to-end benchmark and looking through the Chrome traces.

Objdump

Debugging the emulator / tracer can be hard. Use riscv64-unknown-elf-objdump to compare the actual ELF to the .bytecode / .jolttrace files.

All of the open tasks are tracked via Github Issues.

Jolt Skill Tree

Below are known optimizations that we will be implementing in the coming weeks, and the anticipated reduction in prover costs (all percentages are relative to the Jolt prover's speed on initial public release in April 2024).

  • The way we implemented Lasso leads to grand product arguments for which about 90% of the factors being multiplied together are equal to 1. The current implementation explicitly stores all these values (assigning 256 bits to each value). Instead, we can store a densified representation of them (i.e., list only the values that are not 1, and assume all the others are 1). This will speed up the prover, but more importantly it will reduce total prover space usage by a factor of about 8x.

    Anticipated speedup: 8% of prover time (and a significant space reduction).

  • When Spartan is applied to the R1CS arising in Jolt, the prover’s work in the first round involves computation over 64-bit integers, not arbitrary field elements. The implementation does not yet take advantage of this: each multiplication in the first round is currently implemented via a 256-bit Montgomery multiplication rather than a primitive 64-bit multiplication. This is about half of the prover’s total work in this sum-check invocation. (The same optimization applies for computing the three matrix-vector multiplications needed to compute Az, Bz, and Cz before beginning this first sum-check).

    Anticipated speedup: 4% of overall prover time.

  • The optimization described in Section 3.2 of this paper by Angus Gruen applies to a subset of the invocations of the sum-check protocol in Jolt, including the first of two sum-checks in Spartan and all sum-checks in Grand Product arguments within memory-checking procedures (Lasso, Spice).

    Anticipated speedup: 3% of total prover time.

  • Switching the commitment scheme from Hyrax to one with much smaller commitments (e.g., HyperKZG, Zeromorph) will not only shorten the proofs, but also save the prover the time of serializing and hashing the commitments for Fiat-Shamir. See this github issue.

    Anticipated speedup: 3% of total prover time.

  • Make it fast to commit to slightly negative values (one group op per value) just as it's fast for small positive values.

    Anticipated speedup: 2% of total prover time.

  • In the first sum-check in Spartan, the prover pre-computes a table of evaluations of (the multilinear extension of) the equality function eq(a, b) with the first vector a fixed to a random value. Leaving a few variables off of b and handling them differently will reduce the cost of building this table to negligible.

    Anticipated speedup: 1%-2% of total prover time.

  • On reads to registers or RAM, the value written back to the memory cell by the memory-checking procedure is committed separately from the value returned by the read, and an R1CS constraint is included to force equality. Really, a single value can be committed and the constraint omitted.

    Anticipated speedup: 1% of total prover time.

  • SP1 implements word-addressable memory (the CPU has to read an entire 64-bit word of memory at once). Jolt currently implements byte-addressable memory (the RISC-V CPU is allowed to read one byte at a time, as required by the RISC-V specification).

    Most benchmarks get compiled into RISC-V programs that mostly read entire words at once. Switching to word-addressable memory will improve Jolt’s speed on these benchmarks by 5%.

Total anticipated prover time reduction: 20%-30%.

One way to achieve zero-knowledge is to simply compose Jolt with a zero-knowledge SNARK like Groth16. That is, use Groth16 (which is ZK) to prove that one knows a Jolt proof. Composing with Groth16 or Plonk is how most zkVMs get low on-chain verification costs anyway, and it also "adds" ZK. This approach is on Jolt's roadmap, although it will take some time to complete (as it requires representing the Jolt verifier in R1CS or Plonkish, which is a pain). 

There are also ways to make Jolt zero-knowledge without invoking SNARK composition. For example, rendering sum-check-based SNARKs zero-knowledge without using composition was exactly the motivation for Zeromorph, which introduces a very efficienct zero-knowledge variant of KZG commitments for multilinear polynomials.

If we use the Zeromorph polynomial commitment scheme, the commitment and any evaluation proof are hiding (they reveal nothing about the committed polynomial, and still give the verifier a commitment to the requested evaluation of the committed polynomial). One still needs to ensure that the various applications of the sum-check protocol in Jolt also do not leak any information about the witness. Here, techniques based on masking polynomials apply (see Section 13.3 of Proofs, Arguments, and Zero-Knowledge for a sketchy overview). However, the use of masking polynomials requires the prover to be able to commit to non-multilinear polynomials and hence introduce significant (but surmountable) issues.

A similar approach to achieving ZK also applies when using a hashing-based polynomial commitment scheme like Brakedown. Roughly, to randomize all values sent by the prover during the Brakedown evaluation proof, it suffices to pad the committed polynomial with sufficiently many random values. One does need to lightly modify the Jolt polynomial IOP to "ignore" these extra, random committed values.

A final technique to render all of the sum-checks ZK without SNARK composition is given in Hyrax (based on old work of Cramar and Damgard). Roughly, rather than the prover sending field elements "in the clear", it instead sends (blinded, hence hiding) Pedersen commitments to these field elements. And the verifier exploits homomorphism properties to confirm that the committed field elements would have passed all of the sum-check verifier's checks. See Section 13.2 of Proofs, Arguments, and Zero-Knowledge for additional discussion.

Groth16 Recursion

Jolt's verifier today is expensive. We estimate 1-5 million gas to verify within the EVM. Better batching techniques on opening proofs can bring this down 5-10x, but it will remain expensive. Further, the short-term continuations plan causes a linear blowup depending on the count of monolithic trace chunks proven.

To solve these two issues we're aiming to add a configuration option to the Jolt prover with a post processing step, which creates a Groth16 proof of the Jolt verifier for constant proof size / cost (~280k gas on EVM) regardless of continuation chunk count or opening proof cost. This technique is industry standard.

Strategy

The easiest way to understand the workload of the verifier circuit is to jump through the codebase starting at vm/mod.rs Jolt::verify(...). Verification can be split into 4 logical modules: instruction lookups, read-write memory, bytecode, r1cs.

Each of the modules do some combination of the following:

After recursively verifying sumcheck, the verifier needs to compare the claimed evaluation of the sumcheck operand at a random point : to their own evaluation of the polynomial at . Jolt does this with a combination of opening proofs over the constituent polynomials of and direct evaluations of the multi-linear extensions of those polynomials if they have sufficient structure.

Specifics

Polynomial opening proof verification

HyperKZG is currently the optimal commitment scheme for recursion due to the requirement of only 2-pairing operations per opening proof. Unfortunately non-native field arithmetic will always be expensive within a circuit.

There are two options:

  • Sumcheck and MLE evaluations using native arithmetic, pairing operations using non-native arithmetic
  • Sumcheck and MLE evaluations using non-native arithmetic, pairing operations using native arithmetic

We believe the latter is more efficient albeit unergonomic. Some of the details are worked out in this paper here.

Polynomial opening proof batching

Jolt requires tens of opening proofs across all constituent polynomials in all sumchecks. If we did these independently the verifier would be prohibitively expensive. Instead we batch all opening proofs for polynomials which share an evaluation point .

verify_instruction_lookups

Instruction lookups does two sumchecks described in more detail here. The first contains some complexity. The evaluation of the MLE of each of the instructions at the point spit out by sumcheck is computed directly by the verifier. The verifier is able to do this thanks to the property from Lasso that each table is SOS (decomposable).

The LassoSubtable trait is implemented for all subtables. LassoSubtable::evaluate_mle(r) computes the MLE of each subtable. The JoltInstruction trait combines a series of underlying subtables. The MLEs of these subtables are combined to an instruction MLE via JoltInstruction::combine_lookups(vals: &[F]). Finally each of the instruction MLEs are combined into a VM-wide lookup MLE via InstructionLookupsProof::combine_lookups(...).

The Groth16 verifier circuit would have to mimic this pattern. Implementing the MLE evaluation logic for each of the subtables, combination logic for each of the instructions, and combination logic to aggregate all instructions. It's possible that subtables / instructions will be added / removed in the future.

verify_r1cs

R1CS is a modified Spartan instance which runs two sumchecks and a single opening proof.

There are two difficult MLEs to evaluate:

  • – evaluations of the R1CS coefficient
  • – evaluation of the witness vector

The sections below are under-described in the wiki. We'll flush these out shortly. Assume this step comes last.

For we must leverage the uniformity to efficiently evaluate. This is under-described in the wiki, but we'll get to it ASAP.

The witness vector is comprised of all of the inputs to the R1CS circuit concatenated together in trace_length-sized chunks. All of these are committed independently and are checked via a batched opening proof.

Engineering Suggestions

The Jolt codebase is rapidly undergoing improvements to reduce prover and verifier costs as well as simplify abstractions. As a result, it's recommended that each section above be built in modules that are convenient to rewire. Each part should be incrementally testable and adjustable.

A concrete example of changes to expect: we currently require 5-10 opening proofs per Jolt execution. Even for HyperKZG this requires 10-20 pairings which is prohibitively expensive. We are working on fixing this via better batching.

Suggested plan of attack:

  1. Circuit for HyperKZG verification
  2. Circuit for batched HyperKZG verification
  3. Circuit for sumcheck verification
  4. Circuit for instruction/subtable MLE evaluations
  5. Circuit for Spartan verification

Precompiles

Precompiles are highly optimized SNARK gadgets which can be invoked from the high-level programming language of the VM user. These gadgets can be much more efficient for the prover than compiling down to the underlying ISA by exploiting the structure of the workload. In practice zkVMs use these for heavy cryptographic operations such as hash functions, signatures and other elliptic curve arithmetic.

By popular demand, Jolt will support these gadgets as well. The short term plan is to optimize for minimizing Jolt-core development resources rather than optimal prover speed.

Precompile support plan:

  1. RV32 library wrapping syscalls of supported libraries
  2. Tracer picks up syscalls, sets relevant flag bits and loads memory accordingly
  3. Individual (uniform) Spartan instance for each precompile, repeated over trace_length steps
  4. Jolt config includes which precompiles are supported (there is some non-zero prover / verifier cost to including an unused precompile)
  5. Survey existing hash / elliptic curve arithmetic R1CS arithmetizations. Prioritize efficiency and audits.
  6. Use circom-scotia to convert matrices into static files in the Jolt codebase
  7. Write a converter to uniformly repeat the constraints trace_length steps

TODO(sragss): How do we deal with memory and loading more than 64-bits of inputs to precompiles.

Continuations via Chunking

Today Jolt is a monolithic SNARK. RISC-V traces cannot be broken up, they must be proven monolithically or not at all. As a result, Jolt has a fixed maximum trace length that can be proved which is a function of the available RAM on the prover machine. Long term we'd like to solve this by implementing a streaming version of Jolt's prover such that prover RAM usage is tunable with minimal performance loss. TODO(sragss): Streaming prover link

Short term we're going to solve this via monolithic chunking. The plan: Take a trace of length split it into chunks of size and prove each independently. is a function of the max RAM available to the prover.

For the direct on-chain verifier there will be a cost linear in to verify. If we wrap this in Groth16 our cost will become constant but the Groth16 prover time will be linear in . We believe this short-term trade-off is worthwhile for usability until we can implement the (more complicated) streaming algorithms.

Specifics

A generic config parameter will be added to the Jolt struct called ContinuationConfig. At the highest level, before calling Jolt::prove the trace will be split into M chunks. Jolt::prove will be called on each and return RAM_final which can be fed into RAM_init during the next iteration of Jolt::prove. The output zerocheck will only be run for the final chunk.