Batched opening proof

The final stage (stage 8) of Jolt is the batched opening proof. Over the course of the preceding stages, we obtain polynomial evaluation claims that must be proven using the polynomial commitment scheme's opening proof. Instead of proving these openings "just-in-time", we accumulate them and defer the opening proof to the last stage (see ProverOpeningAccumulator and VerifierOpeningAccumulator for how this accumulation is implemented). By waiting until the end, we can batch-prove all of the openings, instead of proving them individually. This is important because the Dory opening proof is relatively expensive for the prover, so we want to avoid doing it multiple times.

Claim reduction sumchecks

Throughout the earlier stages of Jolt, various components generate multiple polynomial evaluation claims that need to be consolidated before the final opening proof. Conceptually, claim reduction sumchecks are instantiation of the "Multiple polynomials, multiple points" subprotocol.

These claim reduction sumchecks serve two purposes:

  1. Reduce the number of claims that need to be virtualized by a subsequent sumcheck. E.g. if the same virtual polynomial is opened at two different points and , a claim reduction can be applied to avoid running two instances of the sumcheck that virtualized .
  2. Reduce the number of claims that need to be proven via PCS opening proof. While we can leverage the homomorphic properties of Dory in the Multiple polynomials, same point subprotocol, we must first reduce multiple opening points to a single, unified opening point.

The claim reduction sumchecks can be found in jolt-core/src/zkvm/claim_reductions/ and include:

  • Instruction lookups (instruction_lookups.rs): Aggregates instruction lookup claims (lookup outputs and operands) from Spartan.
  • Registers (registers.rs): Reduces register read/write claims (rd, rs1, rs2) from Spartan.
  • RAM RA (ram_ra.rs): Consolidates the four RAM read-address (RA) claims from various RAM-related sumchecks (raf evaluation, read-write checking, Val evaluation, Val-final evaluation) into a single claim for the RA virtualization sumcheck.
  • Increments (increments.rs): Reduces claims related to increment checks.
  • Hamming weight (hamming_weight.rs): Reduces hamming weight-related claims.
  • Advice (advice.rs): Reduces claims from advice polynomials.
  • Bytecode (bytecode.rs): Reduces committed bytecode openings into the shared Stage 8 final opening layout.
  • Program image (program_image.rs): Reduces the committed initial-memory image into the same final opening layout.

How claim reduction sumchecks work

A claim reduction sumcheck takes multiple polynomial evaluation claims, potentially at different evaluation points, and consolidates them into a single claim (or fewer claims) using the sumcheck protocol. The general pattern is:

  1. Input: Multiple polynomial evaluation claims of the form
  2. Batching: Random challenge is sampled from the transcript to batch the claims together
  3. Sumcheck: Prove a sumcheck identity of the form:
  4. Output: Polynomial evaluation claims of the form for a single, unified point derived from the sumcheck challenges.

Final reduction

After the claim reduction sumchecks have consolidated related claims, we perform a final reduction to prepare for the Dory opening.

We apply the Multiple polynomials, same point subprotocol to reduce the claims to a single claim, namely the evaluation of an RLCPolynomial representing a random linear combination of all the opened polynomials.

On the verifier side, this entails taking a linear combination of commitments. Since Dory is an additively homomorphic commitment scheme, the verifier is able to do so.

Precommitted polynomials and final opening layout

Most Stage 8 polynomials are trace-domain polynomials: their shape is determined by the padded execution trace and the address domain. Some committed polynomials, however, are fixed independently of the trace. Examples include bytecode chunks, the program image, and trusted or untrusted advice. In the implementation these independently committed objects are called precommitted polynomials.

The goal of Stage 8 is still the same: every committed polynomial should be opened at one batched Dory proof. The subtlety is that a precommitted polynomial keeps the Dory matrix shape it had when it was committed, while the trace-domain polynomials use Jolt's native trace layout.

We use the following terminology:

  • Native trace layout: the Dory layout induced by trace-domain polynomials, with variables.
  • Embedded precommitted polynomial: a precommitted polynomial whose committed Dory matrix fits inside the native trace layout. It is embedded by top-left zero padding.
  • Dominant precommitted polynomial: a precommitted polynomial that is larger than the native trace layout. It determines an enlarged final opening layout.
  • Final opening layout: the Dory matrix used by the single Stage 8 opening proof. This is the native trace layout when no dominant precommitted polynomial exists, and the enlarged layout otherwise.

In this section we write:

  • for the number of cycle variables, i.e. the log trace length
  • for the number of address variables, i.e. the log address-space size
  • for the number of extra variables contributed by the dominant precommitted polynomial beyond the native trace layout

With that notation, the final opening layout has

Equivalently, Stage 8 works in a Dory matrix of size where

Here is the number of row variables and is the number of column variables. This matches the implementation in DoryGlobals::balanced_sigma_nu() and the split used by PrecommittedClaimReduction::project_dory_round_permutation_for_poly().

The native trace layout is the special case . The design constraint is that dominant precommitted polynomials should not force the earlier trace-domain sumchecks to run over a padded trace. So Jolt schedules the final-opening reductions as follows:

  • the cycle stage (Stage 6b) has exactly rounds
  • the address stage (Stage 7) has exactly rounds
  • precommitted reductions are forward-loaded so they see the extra dominant coordinates
  • native trace-domain reductions are backward-loaded so their old round scheduling is preserved

This keeps trace-domain work native through the earlier stages and leaves Stage 8 with only one normalization problem: all produced opening points must be interpreted inside the final opening layout.

If some precommitted polynomial already has variables, we call it a dominant precommitted polynomial. Otherwise there is no dominant precommitted polynomial, , and the final opening layout is just the native trace layout.

How Native Polynomials Sit In The Final Layout

Native trace-domain polynomials are placed in the final opening layout according to the Dory layout. As a concrete example, take . Since Dory uses a balanced split, this means:

so the final Dory matrix has rows and columns, for a total of slots.

CycleMajor dense placement

Take a dense polynomial with variables and coefficients

In CycleMajor, the dense polynomial is written across the top of the final matrix, so only the lowest index bits vary:

Final 4 x 8 matrix

          col000   col001   col010   col011   col100   col101   col110   col111 
row00   | a_000  | a_001  | a_010  | a_011  | a_100  | a_101  | a_110  |  a_111 |
row01   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |
row10   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |
row11   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |    .   |

so the first bits are fixed and only the last bits vary.

AddressMajor dense placement

Now take the same final opening layout , but the dense polynomial should use the highest bits. Then its coefficients are written into slots whose last bits are zero:

In the same matrix this looks like:

Final 4 x 8 matrix

          col000   col001   col010   col011   col100   col101   col110   col111
row00   | a_000  |    .   |    .   |    .   | a_001  |    .   |    .   |    .   |
row01   | a_010  |    .   |    .   |    .   | a_011  |    .   |    .   |    .   |
row10   | a_100  |    .   |    .   |    .   | a_101  |    .   |    .   |    .   |
row11   | a_110  |    .   |    .   |    .   | a_111  |    .   |    .   |    .   |

The same idea applies to one-hot polynomials:

  • in CycleMajor, they use the lowest bits
  • in AddressMajor, they use the highest bits, so the trailing bits are zero

When Address-Major Dense Stride Exceeds The Row Width

In AddressMajor, dense polynomials are embedded with stride . Sometimes that stride is larger than the number of columns of the final Dory matrix. This is the special branch handled in dory/wrappers.rs.

Take a real example:

  • final opening layout , so the balanced Dory matrix is
  • dense polynomial has variables, so it has 4 coefficients
  • therefore , so the stride is

Since the row width is only 16, consecutive coefficients jump by two whole rows:

coeff a_00 -> slot  0 -> row 0, col 0
coeff a_01 -> slot 32 -> row 2, col 0
coeff a_10 -> slot 64 -> row 4, col 0
coeff a_11 -> slot 96 -> row 6, col 0

and the matrix picture is:

8 x 16 final matrix

row0  | a_00  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row1  |  .    .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row2  | a_01  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row3  |  .    .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row4  | a_10  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row5  |  .    .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row6  | a_11  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |
row7  |  .    .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |

So the logical embedding is unchanged, but it is no longer a convenient row-local chunking. Because polynomial lengths are powers of two, the placement still stays aligned: either the stride is a multiple of the row width, so the polynomial occupies the same column range in every row it touches, or the stride divides the row width, so it stays in a fixed column but appears only in every few rows, as in the example above.

Final Dory Opening Point

In summary

  • in CycleMajor, the native dense / one-hot layout consumes the low bits of the final Dory point, so any extra precommitted variables must sit on the high side
  • in AddressMajor, the native layout consumes the high bits, so any extra precommitted variables must sit on the low side
  • each block appears in reverse because we always bind polynomials during claim reduction sumchecks from low to high bits

Now we study two cases: If there is a dominant precommitted polynomial, let the raw cycle-stage challenges be

and the raw address-stage challenges be

The final big-endian Dory opening point is obtained by normalizing these challenges into Dory order.

For AddressMajor:

For CycleMajor:

Each block is reversed, and the extra variables move to different sides depending on the layout.

If there is no dominant precommitted polynomial, then the final point is anchored by the ordinary native openings:

  • in this case the final opening layout is just the native trace layout, so
  • let be the cycle-stage opening point from IncClaimReduction
  • let be the full opening point from HammingWeightClaimReduction, decomposed as where has length

These are already normalized opening points. In CycleMajor, compute_final_opening_point() checks that the IncClaimReduction opening point agrees with the native cycle suffix . When there are no extra dominant coordinates, this means .

Then:

For AddressMajor:

For CycleMajor:

The implementation is located in compute_final_opening_point() in jolt-core/src/zkvm/mod.rs.

Embedded Precommitted Polynomials

The verifier already has the commitment to an embedded precommitted polynomial. That commitment is computed under the convention that the polynomial occupies the top-left block of its own balanced Dory matrix, meaning the earliest rows and earliest columns. So when we embed that polynomial into the final opening layout, we must preserve that same top-left placement; otherwise the verifier would be checking the Dory proof against a different layout from the one encoded in the commitment.

Final Dory matrix: 2^nu_D rows x 2^sigma_D columns
Smaller precommitted matrix: 2^nu_C rows x 2^sigma_C columns

                    left 2^sigma_C cols         remaining cols
                 +---------------------------+------------------+
top 2^nu_C rows  | smaller precommitted poly | not used by this |
                 |        lives here         |      poly        |
                 +---------------------------+------------------+
remaining rows   | not used by this poly     | not used by this |
                 |                           |      poly        |
                 +---------------------------+------------------+

Suppose the smaller precommitted polynomial has

variables, while the final point has

Split the final point as

where:

  • has length
  • has length
  • has length
  • has length

Then the smaller polynomial is evaluated on

The reason is that top-left embedding forces the missing high row bits and high column bits to be zero:

final row variables : [row_hi | row_lo]
final col variables : [col_hi | col_lo]

top-left embedding forces:
  row_hi = 0
  col_hi = 0

So if is the smaller polynomial and is its embedding into the final matrix, then

The same selector appears when the final RLCPolynomial mixes a native trace-domain polynomial with an embedded precommitted polynomial:

Permuting Precommitted Polynomial Variables

The precommitted sumchecks still bind variables low-to-high. But the final Dory point order is determined by the final opening layout, not by the order in which those rounds happen.

So Jolt permutes the variables of each precommitted polynomial before running the sumcheck. This keeps the sumcheck code simple while ensuring the final claim corresponds to the original polynomial at the correct Stage 8 point. This permutation is cheap because it is only a variable-position movement, so on the coefficient table it is just a bit permutation of the Boolean-hypercube evaluations.

Here is a concrete 3-variable example. Suppose the original polynomial is encoded by

point       000  001  010  011  100  101  110  111
P(point)     v0   v1   v2   v3   v4   v5   v6   v7

Now suppose the final opening layout wants the variables in the order rather than . Define

Then the new coefficient table becomes

point        000  001  010  011  100  101  110  111
P'(point)     v0   v4   v2   v6   v1   v5   v3   v7

because

P'(000) = P(000)
P'(001) = P(100)
P'(010) = P(010)
P'(011) = P(110)
P'(100) = P(001)
P'(101) = P(101)
P'(110) = P(011)
P'(111) = P(111)

After the sumcheck finishes, normalize_opening_point() converts the collected challenges back into the true opening point of the original, non-permuted polynomial.

RLCPolynomial

Recall that all of the polynomials in Jolt fall into one of two categories: one-hot polynomials (the and arising in Twist/Shout), and dense polynomials (we use this to mean anything that's not one-hot).

We use the RLCPolynomial struct to represent a random linear combination (RLC) of multiple polynomials, which may include both dense and one-hot polynomials. We handle the two types separately:

  • We eagerly compute the RLC of the dense polynomials. So if there are dense polynomials, each of length in the RLC, we compute the linear combination of their coefficients and store the result in the RLCPolynomial struct as a single vector of length .
  • We lazily compute the RLC of the one-hot polynomials. So if there are one-hot polynomials, we store (coefficient, reference) pairs in RLCPolynomial to represent the RLC. Later in the Dory opening proof when we need to compute a vector-matrix product using RLCPolynomial, we do so by computing the vector-matrix product using the individual one-hot polynomials (as well as the dense RLC) and taking the linear combination of the resulting vectors.