Introduction

Miden VM is a zero-knowledge virtual machine written in Rust. For any program executed on Miden VM, a STARK-based proof of execution is automatically generated. This proof can then be used by anyone to verify that the program was executed correctly without the need for re-executing the program or even knowing the contents of the program.

Status and features

Miden VM is currently on release v0.8. In this release, most of the core features of the VM have been stabilized, and most of the STARK proof generation has been implemented. While we expect to keep making changes to the VM internals, the external interfaces should remain relatively stable, and we will do our best to minimize the amount of breaking changes going forward.

At this point, Miden VM is good enough for experimentation, and even for real-world applications, but it is not yet ready for production use. The codebase has not been audited and contains known and unknown bugs and security flaws.

Feature highlights

Miden VM is a fully-featured virtual machine. Despite being optimized for zero-knowledge proof generation, it provides all the features one would expect from a regular VM. To highlight a few:

  • Flow control. Miden VM is Turing-complete and supports familiar flow control structures such as conditional statements and counter/condition-controlled loops. There are no restrictions on the maximum number of loop iterations or the depth of control flow logic.
  • Procedures. Miden assembly programs can be broken into subroutines called procedures. This improves code modularity and helps reduce the size of Miden VM programs.
  • Execution contexts. Miden VM program execution can span multiple isolated contexts, each with its own dedicated memory space. The contexts are separated into the root context and user contexts. The root context can be accessed from user contexts via customizable kernel calls.
  • Memory. Miden VM supports read-write random-access memory. Procedures can reserve portions of global memory for easier management of local variables.
  • u32 operations. Miden VM supports native operations with 32-bit unsigned integers. This includes basic arithmetic, comparison, and bitwise operations.
  • Cryptographic operations. Miden assembly provides built-in instructions for computing hashes and verifying Merkle paths. These instructions use Rescue Prime Optimized hash function (which is the native hash function of the VM).
  • External libraries. Miden VM supports compiling programs against pre-defined libraries. The VM ships with one such library: Miden stdlib which adds support for such things as 64-bit unsigned integers. Developers can build other similar libraries to extend the VM's functionality in ways which fit their use cases.
  • Nondeterminism. Unlike traditional virtual machines, Miden VM supports nondeterministic programming. This means a prover may do additional work outside of the VM and then provide execution hints to the VM. These hints can be used to dramatically speed up certain types of computations, as well as to supply secret inputs to the VM.
  • Customizable hosts. Miden VM can be instantiated with user-defined hosts. These hosts are used to supply external data to the VM during execution/proof generation (via nondeterministic inputs) and can connect the VM to arbitrary data sources (e.g., a database or RPC calls).

Planned features

In the coming months we plan to finalize the design of the VM and implement support for the following features:

  • Recursive proofs. Miden VM will soon be able to verify a proof of its own execution. This will enable infinitely recursive proofs, an extremely useful tool for real-world applications.
  • Better debugging. Miden VM will provide a better debugging experience including the ability to place breakpoints, better source mapping, and more complete program analysis info.
  • Faulty execution. Miden VM will support generating proofs for programs with faulty execution (a notoriously complex task in ZK context). That is, it will be possible to prove that execution of some program resulted in an error.

Structure of this document

This document is meant to provide an in-depth description of Miden VM. It is organized as follows:

  • In the introduction, we provide a high-level overview of Miden VM and describe how to run simple programs.
  • In the user documentation section, we provide developer-focused documentation useful to those who want to develop on Miden VM or build compilers from higher-level languages to Miden assembly (the native language of Miden VM).
  • In the design section, we provide in-depth descriptions of the VM's internals, including all AIR constraints for the proving system. We also provide the rationale for settling on specific design choices.
  • Finally, in the background material section, we provide references to materials which could be useful for learning more about STARKs - the proving system behind Miden VM.

License

Licensed under the MIT license.

Miden VM overview

Miden VM is a stack machine. The base data type of the MV is a field element in a 64-bit prime field defined by modulus . This means that all values that the VM operates with are field elements in this field (i.e., values between and , both inclusive).

Miden VM consists of four high-level components as illustrated below.

These components are:

  • Stack which is a push-down stack where each item is a field element. Most assembly instructions operate with values located on the stack. The stack can grow up to items deep, however, only the top 16 items are directly accessible.
  • Memory which is a linear random-access read-write memory. The memory is word-addressable, meaning, four elements are located at each address, and we can read and write elements to/from memory in batches of four. Memory addresses can be in the range .
  • Chiplets which are specialized circuits for accelerating certain types of computations. These include Rescue Prime Optimized (RPO) hash function, 32-bit binary operations, and 16-bit range checks.
  • Host which is a way for the prover to communicate with the VM during runtime. This includes responding to the VM's requests for non-deterministic inputs and handling messages sent by the VM (e.g., for debugging purposes). The requests for non-deterministic inputs are handled by the host's advice provider.

Miden VM comes with a default implementation of the host interface (with an in-memory advice provider). However, the users are able to provide their own implementations which can connect the VM to arbitrary data sources (e.g., a database or RPC calls) and define custom logic for handling events emitted by the VM.

Writing programs

Our goal is to make Miden VM an easy compilation target for high-level languages such as Rust, Move, Sway, and others. We believe it is important to let people write programs in the languages of their choice. However, compilers to help with this have not been developed yet. Thus, for now, the primary way to write programs for Miden VM is to use Miden assembly.

While writing programs in assembly is far from ideal, Miden assembly does make this task a little bit easier by supporting high-level flow control structures and named procedures.

Inputs and outputs

External inputs can be provided to Miden VM in two ways:

  1. Public inputs can be supplied to the VM by initializing the stack with desired values before a program starts executing. Any number of stack items can be initialized in this way, but providing a large number of public inputs will increase the cost for the verifier.
  2. Secret (or nondeterministic) inputs can be supplied to the VM via the advice provider. There is no limit on how much data the advice provider can hold.

After a program finishes executing, the elements remaining on the stack become the outputs of the program. Since these outputs will be public inputs for the verifier, having a large stack at the end of execution will increase cost to the verifier. Therefore, it's best to drop unneeded output values. We've provided the truncate_stack utility function in the standard library for this purpose.

The number of public inputs and outputs of a program can be reduced by making use of the advice stack and Merkle trees. Just 4 elements are sufficient to represent a root of a Merkle tree, which can be expanded into an arbitrary number of values.

For example, if we wanted to provide a thousand public input values to the VM, we could put these values into a Merkle tree, initialize the stack with the root of this tree, initialize the advice provider with the tree itself, and then retrieve values from the tree during program execution using mtree_get instruction (described here).

Stack depth restrictions

For reasons explained here, the VM imposes the restriction that the stack depth cannot be smaller than . This has the following effects:

  • When initializing a program with fewer than inputs, the VM will pad the stack with zeros to ensure the depth is at the beginning of execution.
  • If an operation would result in the stack depth dropping below , the VM will insert a zero at the deep end of the stack to make sure the depth stays at .

Nondeterministic inputs

The advice provider component is responsible for supplying nondeterministic inputs to the VM. These inputs only need to be known to the prover (i.e., they do not need to be shared with the verifier).

The advice provider consists of three components:

  • Advice stack which is a one-dimensional array of field elements. Being a stack, the VM can either push new elements onto the advice stack, or pop the elements from its top.
  • Advice map which is a key-value map where keys are words and values are vectors of field elements. The VM can copy values from the advice map onto the advice stack as well as insert new values into the advice map (e.g., from a region of memory).
  • Merkle store which contain structured data reducible to Merkle paths. Some examples of such structures are: Merkle tree, Sparse Merkle Tree, and a collection of Merkle paths. The VM can request Merkle paths from the Merkle store, as well as mutate it by updating or merging nodes contained in the store.

The prover initializes the advice provider prior to executing a program, and from that point on the advice provider is manipulated solely by executing operations on the VM.

Usage

Before you can use Miden VM, you'll need to make sure you have Rust installed. Miden VM v0.8 requires Rust version 1.75 or later.

Miden VM consists of several crates, each of which exposes a small set of functionality. The most notable of these crates are:

  • miden-processor, which can be used to execute Miden VM programs.
  • miden-prover, which can be used to execute Miden VM programs and generate proofs of their execution.
  • miden-verifier, which can be used to verify proofs of program execution generated by Miden VM prover.

The above functionality is also exposed via the single miden-vm crate, which also provides a CLI interface for interacting with Miden VM.

CLI interface

Compiling Miden VM

To compile Miden VM into a binary, we have a Makefile with the following tasks:

make exec

This will place an optimized, multi-threaded miden executable into the ./target/optimized directory. It is equivalent to executing:

cargo build --profile optimized --features concurrent,executable

If you would like to enable single-threaded mode, you can compile Miden VM using the following command:

cargo build --profile optimized --features executable

For a faster build, you can compile with less optimizations, replacing --profile optimized by --release. Example:

cargo build --release --features concurrent,executable

In this case, the miden executable will be placed in the ./target/release directory.

Controlling parallelism

Internally, Miden VM uses rayon for parallel computations. To control the number of threads used to generate a STARK proof, you can use RAYON_NUM_THREADS environment variable.

GPU acceleration

Miden VM proof generation can be accelerated via GPUs. Currently, GPU acceleration is enabled only on Apple silicon hardware (via Metal). To compile Miden VM with Metal acceleration enabled, you can run the following command:

make exec-metal

Similar to make exec command, this will place the resulting miden executable into the ./target/optimized directory.

Currently, GPU acceleration is applicable only to recursive proofs which can be generated using the -r flag.

SIMD acceleration

Miden VM execution and proof generation can be accelerated via vectorized instructions. Currently, SIMD acceleration can be enabled on platforms supporting SVE and AVX2 instructions.

To compile Miden VM with AVX2 acceleration enabled, you can run the following command:

make exec-avx2

To compile Miden VM with SVE acceleration enabled, you can run the following command:

make exec-sve

This will place the resulting miden executable into the ./target/optimized directory.

Similar to Metal acceleration, SVE/AVX2 acceleration is currently applicable only to recursive proofs which can be generated using the -r flag.

Running Miden VM

Once the executable has been compiled, you can run Miden VM like so:

./target/optimized/miden [subcommand] [parameters]

Currently, Miden VM can be executed with the following subcommands:

  • run - this will execute a Miden assembly program and output the result, but will not generate a proof of execution.
  • prove - this will execute a Miden assembly program, and will also generate a STARK proof of execution.
  • verify - this will verify a previously generated proof of execution for a given program.
  • compile - this will compile a Miden assembly program (i.e., build a program MAST) and outputs stats about the compilation process.
  • debug - this will instantiate a Miden debugger against the specified Miden assembly program and inputs.
  • analyze - this will run a Miden assembly program against specific inputs and will output stats about its execution.
  • repl - this will initiate the Miden REPL tool.
  • example - this will execute a Miden assembly example program, generate a STARK proof of execution and verify it. Currently it is possible to run blake3 and fibonacci examples.

All of the above subcommands require various parameters to be provided. To get more detailed help on what is needed for a given subcommand, you can run the following:

./target/optimized/miden [subcommand] --help

For example:

./target/optimized/miden prove --help

To execute a program using the Miden VM there needs to be a .masm file containing the Miden Assembly code and a .inputs file containing the inputs.

Enabling logging

You can use MIDEN_LOG environment variable to control how much logging output the VM produces. For example:

MIDEN_LOG=trace ./target/optimized/miden [subcommand] [parameters]

If the level is not specified, warn level is set as default.

Inputs

As described here the Miden VM can consume public and secret inputs.

  • Public inputs:
    • operand_stack - can be supplied to the VM to initialize the stack with the desired values before a program starts executing. There is no limit on the number of stack inputs that can be initialized in this way, although increasing the number of public inputs increases the cost to the verifier.
  • Secret (or nondeterministic) inputs:
    • advice_stack - can be supplied to the VM. There is no limit on how much data the advice provider can hold. This is provided as a string array where each string entry represents a field element.
    • advice_map - is supplied as a map of 64-character hex keys, each mapped to an array of numbers. The hex keys are interpreted as 4 field elements and the arrays of numbers are interpreted as arrays of field elements.
    • merkle_store - the Merkle store is container that allows the user to define merkle_tree, sparse_merkle_tree and partial_merkle_tree data structures.
      • merkle_tree - is supplied as an array of 64-character hex values where each value represents a leaf (4 elements) in the tree.
      • sparse_merkle_tree - is supplied as an array of tuples of the form (number, 64-character hex string). The number represents the leaf index and the hex string represents the leaf value (4 elements).
      • partial_merkle_tree - is supplied as an array of tuples of the form ((number, number), 64-character hex string). The internal tuple represents the leaf depth and index at this depth, and the hex string represents the leaf value (4 elements).

Check out the comparison example to see how secret inputs work.

After a program finishes executing, the elements that remain on the stack become the outputs of the program, along with the overflow addresses (overflow_addrs) that are required to reconstruct the stack overflow table.

Fibonacci example

In the miden/examples/fib directory, we provide a very simple Fibonacci calculator example. This example computes the 1001st term of the Fibonacci sequence. You can execute this example on Miden VM like so:

./target/optimized/miden run -a miden/examples/fib/fib.masm -n 1

This will run the example code to completion and will output the top element remaining on the stack.

If you want the output of the program in a file, you can use the --output or -o flag and specify the path to the output file. For example:

./target/optimized/miden run -a miden/examples/fib/fib.masm -o fib.out

This will dump the output of the program into the fib.out file. The output file will contain the state of the stack at the end of the program execution.

Performance

The benchmarks below should be viewed only as a rough guide for expected future performance. The reasons for this are twofold:

  1. Not all constraints have been implemented yet, and we expect that there will be some slowdown once constraint evaluation is completed.
  2. Many optimizations have not been applied yet, and we expect that there will be some speedup once we dedicate some time to performance optimizations.

Overall, we don't expect the benchmarks to change significantly, but there will definitely be some deviation from the below numbers in the future.

A few general notes on performance:

  • Execution time is dominated by proof generation time. In fact, the time needed to run the program is usually under 1% of the time needed to generate the proof.
  • Proof verification time is really fast. In most cases it is under 1 ms, but sometimes gets as high as 2 ms or 3 ms.
  • Proof generation process is dynamically adjustable. In general, there is a trade-off between execution time, proof size, and security level (i.e. for a given security level, we can reduce proof size by increasing execution time, up to a point).
  • Both proof generation and proof verification times are greatly influenced by the hash function used in the STARK protocol. In the benchmarks below, we use BLAKE3, which is a really fast hash function.

Single-core prover performance

When executed on a single CPU core, the current version of Miden VM operates at around 20 - 25 KHz. In the benchmarks below, the VM executes a Fibonacci calculator program on Apple M1 Pro CPU in a single thread. The generated proofs have a target security level of 96 bits.

VM cyclesExecution timeProving timeRAM consumedProof size
2101 ms60 ms20 MB46 KB
2122 ms180 ms52 MB56 KB
2148 ms680 ms240 MB65 KB
21628 ms2.7 sec950 MB75 KB
21881 ms11.4 sec3.7 GB87 KB
220310 ms47.5 sec14 GB100 KB

As can be seen from the above, proving time roughly doubles with every doubling in the number of cycles, but proof size grows much slower.

We can also generate proofs at a higher security level. The cost of doing so is roughly doubling of proving time and roughly 40% increase in proof size. In the benchmarks below, the same Fibonacci calculator program was executed on Apple M1 Pro CPU at 128-bit target security level:

VM cyclesExecution timeProving timeRAM consumedProof size
2101 ms120 ms30 MB61 KB
2122 ms460 ms106 MB77 KB
2148 ms1.4 sec500 MB90 KB
21627 ms4.9 sec2.0 GB103 KB
21881 ms20.1 sec8.0 GB121 KB
220310 ms90.3 sec20.0 GB138 KB

Multi-core prover performance

STARK proof generation is massively parallelizable. Thus, by taking advantage of multiple CPU cores we can dramatically reduce proof generation time. For example, when executed on an 8-core CPU (Apple M1 Pro), the current version of Miden VM operates at around 100 KHz. And when executed on a 64-core CPU (Amazon Graviton 3), the VM operates at around 250 KHz.

In the benchmarks below, the VM executes the same Fibonacci calculator program for 220 cycles at 96-bit target security level:

MachineExecution timeProving timeExecution %Implied Frequency
Apple M1 Pro (16 threads)310 ms7.0 sec4.2%140 KHz
Apple M2 Max (16 threads)280 ms5.8 sec4.5%170 KHz
AMD Ryzen 9 5950X (16 threads)270 ms10.0 sec2.6%100 KHz
Amazon Graviton 3 (64 threads)330 ms3.6 sec8.5%265 KHz

Recursive proofs

Proofs in the above benchmarks are generated using BLAKE3 hash function. While this hash function is very fast, it is not very efficient to execute in Miden VM. Thus, proofs generated using BLAKE3 are not well-suited for recursive proof verification. To support efficient recursive proofs, we need to use an arithmetization-friendly hash function. Miden VM natively supports Rescue Prime Optimized (RPO), which is one such hash function. One of the downsides of arithmetization-friendly hash functions is that they are considerably slower than regular hash functions.

In the benchmarks below we execute the same Fibonacci calculator program for 220 cycles at 96-bit target security level using RPO hash function instead of BLAKE3:

MachineExecution timeProving timeProving time (HW)
Apple M1 Pro (16 threads)310 ms94.3 sec42.0 sec
Apple M2 Max (16 threads)280 ms75.1 sec20.9 sec
AMD Ryzen 9 5950X (16 threads)270 ms59.3 sec
Amazon Graviton 3 (64 threads)330 ms21.7 sec14.9 sec

In the above, proof generation on some platforms can be hardware-accelerated. Specifically:

  • On Apple M1/M2 platforms the built-in GPU is used for a part of proof generation process.
  • On the Graviton platform, SVE vector extension is used to accelerate RPO computations.

Development Tools and Resources

The following tools are available for interacting with Miden VM:

The following resources are available to help you get started programming with Miden VM more quickly:

Miden Debugger

The Miden debugger is a command-line interface (CLI) application, inspired by GNU gdb, which allows debugging of Miden assembly (MASM) programs. The debugger allows the user to step through the execution of the program, both forward and backward, either per clock cycle tick, or via breakpoints.

The Miden debugger supports the following commands:

CommandShortcutArgumentsDescription
nextncount?Steps count clock cycles. Will step 1 cycle of count is omitted.
continuec-Executes the program until completion, failure or a breakpoint.
backbcount?Backward step count clock cycles. Will back-step 1 cycle of count is omitted.
rewindr-Executes the program backwards until the beginning, failure or a breakpoint.
printp-Displays the complete state of the virtual machine.
print memp maddress?Displays the memory value at address. If address is omitted, didisplays all the memory values.
print stackp sindex?Displays the stack value at index. If index is omitted, displays all the stack values.
clockc-Displays the current clock cycle.
quitq-Quits the debugger.
helph-Displays the help message.

In order to start debugging, the user should provide a MASM program:

cargo run --features executable -- debug --assembly miden/examples/nprime/nprime.masm

The expected output is:

============================================================
Debug program
============================================================
Reading program file `miden/examples/nprime/nprime.masm`
Compiling program... done (16 ms)
Debugging program with hash 11dbbddff27e26e48be3198133df8cbed6c5875d0fb
606c9f037c7893fde4118...
Reading input file `miden/examples/nprime/nprime.inputs`
Welcome! Enter `h` for help.
>>

In order to add a breakpoint, the user should insert a breakpoint instruction into the MASM file. This will generate a Noop operation that will be decorated with the debug break configuration. This is a provisory solution until the source mapping is implemented.

The following example will halt on the third instruction of foo:

proc.foo
    dup
    dup.2
    breakpoint
    swap
    add.1
end

begin
    exec.foo
end

Miden REPL

The Miden Read–eval–print loop (REPL) is a Miden shell that allows for quick and easy debugging of Miden assembly. After the REPL gets initialized, you can execute any Miden instruction, undo executed instructions, check the state of the stack and memory at a given point, and do many other useful things! When the REPL is exited, a history.txt file is saved. One thing to note is that all the REPL native commands start with an ! to differentiate them from regular assembly instructions.

Miden REPL can be started via the CLI repl command like so:

./target/optimized/miden repl

It is also possible to initialize REPL with libraries. To create it with Miden standard library you need to specify -s or --stdlib subcommand, it is also possible to add a third-party library by specifying -l or --libraries subcommand with paths to .masl library files. For example:

./target/optimized/miden repl -s -l example/library.masl

Miden assembly instruction

All Miden instructions mentioned in the Miden Assembly sections are valid. One can either input instructions one by one or multiple instructions in one input.

For example, the below two commands will result in the same output.

>> push.1
>> push.2
>> push.3
push.1 push.2 push.3

To execute a control flow operation, one must write the entire statement in a single line with spaces between individual operations.

repeat.20
    pow2
end

The above example should be written as follows in the REPL tool:

repeat.20 pow2 end

!help

The !help command prints out all the available commands in the REPL tool.

!program

The !program command prints out the entire Miden program being executed. E.g., in the below scenario:

>> push.1.2.3.4
>> repeat.16 pow2 end
>> u32wrapping_add

>> !program
begin
    push.1.2.3.4
    repeat.16 pow2 end
    u32wrapping_add
end

!stack

The !stack command prints out the state of the stack at the last executed instruction. Since the stack always contains at least 16 elements, 16 or more elements will be printed out (even if all of them are zeros).

>> push.1 push.2 push.3 push.4 push.5
>> exp
>> u32wrapping_mul
>> swap
>> eq.2
>> assert

The !stack command will print out the following state of the stack:

>> !stack
3072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

!mem

The !mem command prints out the contents of all initialized memory locations. For each such location, the address, along with its memory values, is printed. Recall that four elements are stored at each memory address.

If the memory has at least one value that has been initialized:

>> !mem
7: [1, 2, 0, 3]
8: [5, 7, 3, 32]
9: [9, 10, 2, 0]

If the memory is not yet been initialized:

>> !mem
The memory has not been initialized yet

!mem[addr]

The !mem[addr] command prints out memory contents at the address specified by addr.

If the addr has been initialized:

>> !mem[9]
9: [9, 10, 2, 0]

If the addr has not been initialized:

>> !mem[87]
Memory at address 87 is empty

!use

The !use command prints out the list of all modules available for import.

If the stdlib was added to the available libraries list !use command will print all its modules:

>> !use
Modules available for importing:
std::collections::mmr
std::collections::smt
...
std::mem
std::sys
std::utils

Using the !use command with a module name will add the specified module to the program imports:

>> !use std::math::u64

>> !program
use.std::math::u64

begin

end

!undo

The !undo command reverts to the previous state of the stack and memory by dropping off the last executed assembly instruction from the program. One could use !undo as often as they want to restore the state of a stack and memory instructions ago (provided there are instructions in the program). The !undo command will result in an error if no remaining instructions are left in the Miden program.

>> push.1 push.2 push.3
>> push.4
>> !stack
4 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0

>> push.5
>> !stack
5 4 3 2 1 0 0 0 0 0 0 0 0 0 0 0

>> !undo
4 3 2 1 0 0 0 0 0 0 0 0 0 0 0 0

>> !undo
3 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0

User Documentation

In the following sections, we provide developer-focused documentation useful to those who want to develop on Miden VM or build compilers from higher-level languages to Miden VM.

This documentation consists of two high-level sections:

  • Miden assembly which provides a detailed description of Miden assembly language, which is the native language of Miden VM.
  • Miden Standard Library which provides descriptions of all procedures available in Miden Standard Library.

For info on how to run programs on Miden VM, please refer to the usage section in the introduction.

Miden Assembly

Miden assembly is a simple, low-level language for writing programs for Miden VM. It stands just above raw Miden VM instruction set, and in fact, many instructions of Miden assembly map directly to raw instructions of Miden VM.

Before Miden assembly can be executed on Miden VM, it needs to be compiled into a Program MAST (Merkelized Abstract Syntax Tree) which is a binary tree of code blocks each containing raw Miden VM instructions.

assembly_to_VM

As compared to raw Miden VM instructions, Miden assembly has several advantages:

  1. Miden assembly is intended to be a more stable external interface for the VM. That is, while we plan to make significant changes to the underlying VM to optimize it for stability, performance etc., we intend to make very few breaking changes to Miden assembly.
  2. Miden assembly natively supports control flow expressions which the assembler automatically transforms into a program MAST. This greatly simplifies writing programs with complex execution logic.
  3. Miden assembly supports macro instructions. These instructions expand into short sequences of raw Miden VM instructions making it easier to encode common operations.
  4. Miden assembly supports procedures. These are stand-alone blocks of code which the assembler inlines into program MAST at compile time. This improves program modularity and code organization.

The last two points also make Miden assembly much more concise as compared to the raw program MAST. This may be important in the blockchain context where pubic programs need to be stored on chain.

Terms and notations

In this document we use the following terms and notations:

  • is the modulus of the VM's base field which is equal to .
  • A binary value means a field element which is either or .
  • Inequality comparisons are assumed to be performed on integer representations of field elements in the range .

Throughout this document, we use lower-case letters to refer to individual field elements (e.g., ). Sometimes it is convenient to describe operations over groups of elements. For these purposes we define a word to be a group of four elements. We use upper-case letters to refer to words (e.g., ). To refer to individual elements within a word, we use numerical subscripts. For example, is the first element of word , is the last element of word , etc.

Design goals

The design of Miden assembly tries to achieve the following goals:

  1. Miden assembly should be an easy compilation target for high-level languages.
  2. Programs written in Miden assembly should be readable, even if the code is generated by a compiler from a high-level language.
  3. Control flow should be easy to understand to help in manual inspection, formal verification, and optimization.
  4. Compilation of Miden assembly into Miden program MAST should be as straight-forward as possible.
  5. Serialization of Miden assembly into a binary representation should be as compact and as straight-forward as possible.

In order to achieve the first goal, Miden assembly exposes a set of native operations over 32-bit integers and supports linear read-write memory. Thus, from the stand-point of a higher-level language compiler, Miden VM can be viewed as a regular 32-bit stack machine with linear read-write memory.

In order to achieve the second and third goals, Miden assembly facilitates flow control via high-level constructs like while loops, if-else statements, and function calls with statically defined targets. Thus, for example, there are no explicit jump instructions.

In order to achieve the fourth goal, Miden assembly retains direct access to the VM stack rather than abstracting it away with higher-level constructs and named variables.

Lastly, in order to achieve the fifth goal, each instruction of Miden assembly can be encoded using a single byte. The resulting byte-code is simply a one-to-one mapping of instructions to their binary values.

Code organization

A Miden assembly program is just a sequence of instructions each describing a specific directive or an operation. You can use any combination of whitespace characters to separate one instruction from another.

In turn, Miden assembly instructions are just keywords which can be parameterized by zero or more parameters. The notation for specifying parameters is keyword.param1.param2 - i.e., the parameters are separated by periods. For example, push.123 instruction denotes a push operation which is parameterized by value 123.

Miden assembly programs are organized into procedures. Procedures, in turn, can be grouped into modules.

Procedures

A procedure can be used to encapsulate a frequently-used sequence of instructions which can later be invoked via a label. A procedure must start with a proc.<label>.<number of locals> instruction and terminate with an end instruction. For example:

proc.foo.2
    <instructions>
end

A procedure label must start with a letter and can contain any combination of numbers, ASCII letters, and underscores (_). The number of characters in the procedure label cannot exceed 100.

The number of locals specifies the number of memory-based local words a procedure can access (via loc_load, loc_store, and other instructions). If a procedure doesn't need any memory-based locals, this parameter can be omitted or set to 0. A procedure can have at most locals, and the total number of locals available to all procedures at runtime is limited to .

To execute a procedure, the exec.<label>, call.<label>, and syscall.<label> instructions can be used. For example:

exec.foo

The difference between using each of these instructions is explained in the next section.

A procedure may execute any other previously defined procedure, but it cannot execute itself or any of the subsequent procedures. Thus, recursive procedure calls are not possible. For example, the following code block defines a program with two procedures:

proc.foo
    <instructions>
end

proc.bar
    <instructions>
    exec.foo
    <instructions>
end

begin
    <instructions>
    exec.bar
    <instructions>
    exec.foo
end

Dynamic procedure invocation

It is also possible to invoke procedures dynamically - i.e., without specifying target procedure labels at compile time. There are two instructions, dynexec and dyncall, which can be used to execute dynamically-specified code targets. Both instructions expect MAST root of the target to be provided via the stack. The difference between dynexec and dyncall is that dyncall will change context before executing the dynamic code target, while dynexec will cause the code target to be executed in the current context.

Dynamic code execution in the same context is achieved by setting the top elements of the stack to the hash of the dynamic code block and then executing the following instruction:

dynexec

This causes the VM to do the following:

  1. Read the top 4 elements of the stack to get the hash of the dynamic target (leaving the stack unchanged).
  2. Execute the code block which hashes to the specified target. The VM must know the specified code block and hash: they must be in the CodeBlockTable of the executing Program. Hashes can be put into the CodeBlockTable manually, or by executing call, syscall, or procref instructions.

Dynamic code execution in a new context can be achieved similarly by setting the top elements of the stack to the hash of the dynamic code block and then executing the following instruction:

dyncall

Note: In both cases, the stack is left unchanged. Therefore, if the dynamic code is intended to manipulate the stack, it should start by either dropping or moving the code block hash from the top of the stack.

Modules

A module consists of one or more procedures. There are two types of modules: library modules and executable modules (also called programs).

Library modules

Library modules contain zero or more internal procedures and one or more exported procedures. For example, the following module defines one internal procedure (defined with proc instruction) and one exported procedure (defined with export instruction):

proc.foo
    <instructions>
end

export.bar
    <instructions>
    exec.foo
    <instructions>
end

Programs

Executable modules are used to define programs. A program contains zero or more internal procedures (defined with proc instruction) and exactly one main procedure (defined with begin instruction). For example, the following module defines one internal procedure and a main procedure:

proc.foo
    <instructions>
end

begin
    <instructions>
    exec.foo
    <instructions>
end

A program cannot contain any exported procedures.

When a program is executed, the execution starts at the first instruction following the begin instruction. The main procedure is expected to be the last procedure in the program and can be followed only by comments.

Importing modules

To invoke a procedure from an external module, the module first needs to be imported using a use instruction. Once a module is imported, procedures from this module can be invoked via the regular exec or call instructions as exec|call.<module>::<label> where label is the name of the procedure. For example:

use.std::math::u64

begin
    push.1.0
    push.2.0
    exec.u64::wrapping_add
end

In the above example we import std::math::u64 module from the standard library. We then execute a program which pushes two 64-bit integers onto the stack, and then invokes a 64-bit addition procedure from the imported module.

We can also define aliases for imported modules. For example:

use.std::math::u64->bigint

begin
    push.1.0
    push.2.0
    exec.bigint::checked_add
end

The set of modules which can be imported by a program can be specified via a Module Provider when instantiating the Miden Assembler used to compile the program.

Re-exporting procedures

A procedure defined in one module can be re-exported from a different module under the same or a different name. For example:

use.std::math::u64

export.u64::add
export.u64::mul->mul64

export.foo
    <instructions>
end

In addition to the locally-defined procedure foo, the above module also exports procedures add and mul64 implementations of which will be identical to add and mul procedures from the std::math::u64 module respectively.

Constants

Miden assembly supports constant declarations. These constants are scoped to the module they are defined in and can be used as immediate parameters for Miden assembly instructions. Constants are supported as immediate values for the following instructions: push, assert, assertz, asert_eq, assert_eqw, locaddr, loc_load, loc_loadw, loc_store, loc_storew, mem_load, mem_loadw, mem_store, mem_storew.

Constants must be declared right after module imports and before any procedures or program bodies. A constant's name must start with an upper-case letter and can contain any combination of numbers, upper-case ASCII letters, and underscores (_). The number of characters in a constant name cannot exceed 100.

A constant's value must be in a decimal or hexidecimal form and be in the range between and (both inclusive). Value can be defined by an arithmetic expression using +, -, *, /, //, (, ) operators and references to the previously defined constants if it uses only decimal numbers. Here / is a field division and // is an integer division. Note that the arithmetic expression cannot contain spaces.

use.std::math::u64

const.CONSTANT_1=100
const.CONSTANT_2=200+(CONSTANT_1-50)
const.ADDR_1=3

begin
    push.CONSTANT_1.CONSTANT_2
    exec.u64::wrapping_add
    mem_store.ADDR_1
end

Comments

Miden assembly allows annotating code with simple comments. There are two types of comments: single-line comments which start with a # (pound) character, and documentation comments which start with #! characters. For example:

#! This is a documentation comment
export.foo
    # this is a comment
    push.1
end

Documentation comments must precede a procedure declaration. Using them inside a procedure body is an error.

Execution contexts

Miden assembly program execution can span multiple isolated contexts. An execution context defines its own memory space which is not accessible from other execution contexts.

All programs start executing in a root context. Thus, the main procedure of a program is always executed in the root context. To move execution into a different context, we can invoke a procedure using the call instruction. In fact, any time we invoke a procedure using the call instruction, the procedure is executed in a new context. We refer to all non-root contexts as user contexts.

While executing in a user context, we can request to execute some procedures in the root context. This can be done via the syscall instruction. The set of procedures which can be invoked via the syscall instruction is limited by the kernel against which a program is compiled. Once the procedure called via syscall returns, the execution moves back to the user context from which it was invoked. The diagram below illustrates this graphically:

context transitions

Procedure invocation semantics

As mentioned in the previous section, procedures in Miden assembly can be invoked via five different instructions: exec, call, syscall, dynexec, and dyncall. Invocation semantics of call, dyncall, and syscall instructions are basically the same, the only difference being that the syscall instruction can be used only to call kernel's procedures. The exec and dynexec instructions are different, and we explain these differences below.

Invoking via call, dyncall, and syscall instructions

When a procedure is invoked via a call, dyncall or a syscall instruction, the following happens:

  • Execution moves into a different context. In case of the call and dyncall instructions, a new user context is created. In case of a syscall instruction, the execution moves back into the root context.
  • All stack items beyond the 16th item get "hidden" from the invoked procedure. That is, from the standpoint of the invoked procedure, the initial stack depth is set to 16.

When the called procedure returns, the following happens:

  • Execution moves back to the context from which the procedure was invoked.
  • Stack depth is set to its original depth. Before the stack depth is reset, the VM checks if the current stack depth is exactly 16, and fails otherwise.

The manipulations of the stack depth described above have the following implications:

  • The top 16 elements of the stack can be used to pass parameters and return values between the caller and the callee.
  • Caller's stack beyond the top 16 elements is inaccessible to the callee, and thus, is guaranteed not to change as the result of the call.
  • At the end of its execution, the callee must ensure that stack depth is exactly 16. If this is difficult to ensure manually, the truncate_stack procedure can be used to drop all elements from the stack except for the top 16.

Invoking via exec instruction

Procedures invoked via the exec instruction, are inlined at their call sites during compilation. Thus, from the standpoint of the final program, executing procedures this way is indistinguishable from manually including procedure code in place of the exec instruction. This also means that procedures invoked via the exec instruction are executed in the same context as the caller.

Kernels

A kernel defines a set of procedures which can be invoked from user contexts to be executed in the root context. Miden assembly programs are always compiled against some kernel. The default kernel is empty - i.e., it does not contain any procedures. To compile a program against a non-empty kernel, the kernel needs to be specified when instantiating the Miden Assembler.

A kernel can be defined similarly to a regular library module - i.e., it can have internal and exported procedures. However, there are some small differences between what procedures can do in a kernel module vs. what they can do in a regular library module. Specifically:

  • Procedures in a kernel module cannot use call or syscall instructions. This means that creating a new context from within a syscall is not possible.
  • Unlike procedures in regular library modules, procedures in a kernel module can use the caller instruction. This instruction puts the hash of the procedure which initiated the parent context onto the stack.

Memory layout

As mentioned earlier, procedures executed within a given context can access memory only of that context. This is true for both memory reads and memory writes.

Address space of every context is the same: the smallest accessible address is and the largest accessible address is . Any code executed in a given context has access to its entire address space. However, by convention, we assign different meanings to different regions of the address space.

For user contexts we have the following:

  • The first words (each word is 4 field elements) are assumed to be global memory.
  • The next words are reserved for memory locals of procedures executed in the same context (i.e., via the exec instruction).
  • The remaining address space has no special meaning.

user memory layout

For the root context we have the following:

  • The first words are assumed to be global memory.
  • The next words are reserved for memory locals of procedures executed in the root context.
  • The next words are reserved for memory locals of procedures executed from within a syscall.
  • The remaining address space has no special meaning.

root memory layout

For both types of contexts, writing directly into regions of memory reserved for procedure locals is not advisable. Instead, loc_load, loc_store and other similar dedicated instructions should be used to access procedure locals.

Example

To better illustrate what happens as we execute procedures in different contexts, let's go over the following example.

kernel
--------------------
export.baz.2
    <instructions>
    caller
    <instructions>
end

program
--------------------
proc.bar.1
    <instructions>
    syscall.baz
    <instructions>
end

proc.foo.3
    <instructions>
    call.bar
    <instructions>
    exec.bar
    <instructions>
end

begin
    <instructions>
    call.foo
    <instructions>
end

Execution of the above program proceeds as follows:

  1. The VM starts executing instructions immediately following the begin statement. These instructions are executed in the root context (let's call this context ctx0).
  2. When call.foo is executed, a new context is created (ctx1). Memory in this context is isolated from ctx0. Additionally, any elements on the stack beyond the top 16 are hidden from foo.
  3. Instructions executed inside foo can access memory of ctx1 only. The address of the first procedure local in foo (e.g., accessed via loc_load.0) is .
  4. When call.bar is executed, a new context is created (ctx2). The stack depth is set to 16 again, and any instruction executed in this context can access memory of ctx2 only. The first procedure local of bar is also located at address .
  5. When syscall.baz is executed, the execution moves back into the root context. That is, instructions executed inside baz have access to the memory of ctx0. The first procedure local of baz is located at address . When baz starts executing, the stack depth is again set to 16.
  6. When caller is executed inside baz, the first 4 elements of the stack are populated with the hash of bar since baz was invoked from bar's context.
  7. Once baz returns, execution moves back to ctx2, and then, when bar returns, execution moves back to ctx1. We assume that instructions executed right before each procedure returns ensure that the stack depth is exactly 16 right before procedure's end.
  8. Next, when exec.bar is executed, bar is executed again, but this time it is executed in the same context as foo. Thus, it can access memory of ctx1. Moreover, the stack depth is not changed, and thus, bar can access the entire stack of foo. Lastly, this first procedure local of bar now will be at address (since the first 3 locals in this context are reserved for foo).
  9. When syscall.baz is executed the second time, execution moves into the root context again. However, now, when caller is executed inside baz, the first 4 elements of the stack are populated with the hash of foo (not bar). This happens because this time around bar does not have its own context and baz is invoked from foo's context.
  10. Finally, when baz returns, execution moves back to ctx1, and then as bar and foo return, back to ctx0, and the program terminates.

Flow control

As mentioned above, Miden assembly provides high-level constructs to facilitate flow control. These constructs are:

  • if-else expressions for conditional execution.
  • repeat expressions for bounded counter-controlled loops.
  • while expressions for unbounded condition-controlled loops.

Conditional execution

Conditional execution in Miden VM can be accomplished with if-else statements. These statements look like so:

if.true
    <instructions>
else
    <instructions>
end

where instructions can be a sequence of any instructions, including nested control structures; the else clause is optional. The above does the following:

  1. Pops the top item from the stack.
  2. If the value of the item is , instructions in the if.true branch are executed.
  3. If the value of the item is , instructions in the else branch are executed.
  4. If the value is not binary, the execution fails.

A note on performance: using if-else statements incurs a small, but non-negligible overhead. Thus, for simple conditional statements, it may be more efficient to compute the result of both branches, and then select the result using conditional drop instructions.

Counter-controlled loops

Executing a sequence of instructions a predefined number of times can be accomplished with repeat statements. These statements look like so:

repeat.<count>
    <instructions>
end

where:

  • instructions can be a sequence of any instructions, including nested control structures.
  • count is the number of times the instructions sequence should be repeated (e.g. repeat.10). count must be an integer or a constant greater than .

Note: During compilation the repeat.<count> blocks are unrolled and expanded into <count> copies of its inner block, there is no additional cost for counting variables in this case.

Condition-controlled loops

Executing a sequence of instructions zero or more times based on some condition can be accomplished with while loop expressions. These expressions look like so:

while.true
    <instructions>
end

where instructions can be a sequence of any instructions, including nested control structures. The above does the following:

  1. Pops the top item from the stack.
  2. If the value of the item is , instructions in the loop body are executed. a. After the body is executed, the stack is popped again, and if the popped value is , the body is executed again. b. If the popped value is , the loop is exited. c. If the popped value is not binary, the execution fails.
  3. If the value of the item is , execution of loop body is skipped.
  4. If the value is not binary, the execution fails.

Example:

# push the boolean true to the stack
push.1

# pop the top element of the stack and loop while it is true
while.true
    # push the boolean false to the stack, finishing the loop for the next iteration
    push.0
end

Field operations

Miden assembly provides a set of instructions which can perform operations with raw field elements. These instructions are described in the tables below.

While most operations place no restrictions on inputs, some operations expect inputs to be binary values, and fail if executed with non-binary inputs.

For instructions where one or more operands can be provided as immediate parameters (e.g., add and add.b), we provide stack transition diagrams only for the non-immediate version. For the immediate version, it can be assumed that the operand with the specified name is not present on the stack.

Assertions and tests

InstructionStack_inputStack_outputNotes
assert
- (1 cycle)
[a, ...][...]If , removes it from the stack.
Fails if
assertz
- (2 cycles)
[a, ...][...]If , removes it from the stack,
Fails if
assert_eq
- (2 cycles)
[b, a, ...][...]If , removes them from the stack.
Fails if
assert_eqw
- (11 cycles)
[B, A, ...][...]If , removes them from the stack.
Fails if

The above instructions can also be parametrized with an error code which can be any 32-bit value specified either directly or via a named constant. For example:

assert.err=123
assert.err=MY_CONSTANT

If the error code is omitted, the default value of is assumed.

Arithmetic and Boolean operations

The arithmetic operations below are performed in a 64-bit prime filed defined by modulus . This means that overflow happens after a value exceeds . Also, the result of divisions may appear counter-intuitive because divisions are defined via inversions.

InstructionStack_inputStack_outputNotes
add
- (1 cycle)
add.b
- (1-2 cycle)
[b, a, ...][c, ...]
sub
- (2 cycles)
sub.b
- (2 cycles)
[b, a, ...][c, ...]
mul
- (1 cycle)
mul.b
- (2 cycles)
[b, a, ...][c, ...]
div
- (2 cycles)
div.b
- (2 cycles)
[b, a, ...][c, ...]
Fails if
neg
- (1 cycle)
[a, ...][b, ...]
inv
- (1 cycle)
[a, ...][b, ...]
Fails if
pow2
- (16 cycles)
[a, ...][b, ...]
Fails if
exp.uxx
- (9 + xx cycles)
exp.b
- (9 + log2(b) cycles)
[b, a, ...][c, ...]
Fails if xx is outside [0, 63)
exp is equivalent to exp.u64 and needs 73 cycles
ilog2
- (44 cycles)
[a, ...][b, ...]
Fails if
not
- (1 cycle)
[a, ...][b, ...]
Fails if
and
- (1 cycle)
[b, a, ...][c, ...]
Fails if
or
- (1 cycle)
[b, a, ...][c, ...]
Fails if
xor
- (7 cycles)
[b, a, ...][c, ...]
Fails if

Comparison operations

InstructionStack_inputStack_outputNotes
eq
- (1 cycle)
eq.b
- (1-2 cycles)
[b, a, ...][c, ...]
neq
- (2 cycle)
neq.b
- (2-3 cycles)
[b, a, ...][c, ...]
lt
- (14 cycles)
[b, a, ...][c, ...]
lte
- (15 cycles)
[b, a, ...][c, ...]
gt
- (15 cycles)
[b, a, ...][c, ...]
gte
- (16 cycles)
[b, a, ...][c, ...]
is_odd
- (5 cycles)
[a, ...][b, ...]
eqw
- (15 cycles)
[A, B, ...][c, A, B, ...]

Extension Field Operations

InstructionStack InputStack OutputNotes
ext2add
- (5 cycles)
[b1, b0, a1, a0, ...][c1, c0, ...] and
ext2sub
- (7 cycles)
[b1, b0, a1, a0, ...][c1, c0, ...] and
ext2mul
- (3 cycles)
[b1, b0, a1, a0, ...][c1, c0, ...] and
ext2neg
- (4 cycles)
[a1, a0, ...][a1', a0', ...] and
ext2inv
- (8 cycles)
[a1, a0, ...][a1', a0', ...]
Fails if
ext2div
- (11 cycles)
[b1, b0, a1, a0, ...][c1, c0,] fails if , where multiplication and inversion are as defined by the operations above

u32 operations

Miden assembly provides a set of instructions which can perform operations on regular two-complement 32-bit integers. These instructions are described in the tables below.

For instructions where one or more operands can be provided as immediate parameters (e.g., u32wrapping_add and u32wrapping_add.b), we provide stack transition diagrams only for the non-immediate version. For the immediate version, it can be assumed that the operand with the specified name is not present on the stack.

In all the table below, the number of cycles it takes for the VM to execute each instruction is listed beneath the instruction.

Conversions and tests

InstructionStack inputStack outputNotes
u32test
- (5 cycles)
[a, ...][b, a, ...]
u32testw
- (23 cycles)
[A, ...][b, A, ...]
u32assert
- (3 cycles)
[a, ...][a, ...]Fails if
u32assert2
- (1 cycle)
[b, a,...][b, a,...]Fails if or
u32assertw
- (6 cycles)
[A, ...][A, ...]Fails if
u32cast
- (2 cycles)
[a, ...][b, ...]
u32split
- (1 cycle)
[a, ...][c, b, ...],

The instructions u32assert, u32assert2 and u32assertw can also be parametrized with an error code which can be any 32-bit value specified either directly or via a named constant. For example:

u32assert.err=123
u32assert.err=MY_CONSTANT

If the error code is omitted, the default value of is assumed.

Arithmetic operations

InstructionStack inputStack outputNotes
u32overflowing_add
- (1 cycle)
u32overflowing_add.b
- (2-3 cycles)
[b, a, ...][d, c, ...]

Undefined if
u32wrapping_add
- (2 cycles)
u32wrapping_add.b
- (3-4 cycles)
[b, a, ...][c, ...]
Undefined if
u32overflowing_add3
- (1 cycle)
[c, b, a, ...][e, d, ...],

Undefined if
u32wrapping_add3
- (2 cycles)
[c, b, a, ...][d, ...],
Undefined if
u32overflowing_sub
- (1 cycle)
u32overflowing_sub.b
- (2-3 cycles)
[b, a, ...][d, c, ...]

Undefined if
u32wrapping_sub
- (2 cycles)
u32wrapping_sub.b
- (3-4 cycles)
[b, a, ...][c, ...]
Undefined if
u32overflowing_mul
- (1 cycle)
u32overflowing_mul.b
- (2-3 cycles)
[b, a, ...][d, c, ...]

Undefined if
u32wrapping_mul
- (2 cycles)
u32wrapping_mul.b
- (3-4 cycles)
[b, a, ...][c, ...]
Undefined if
u32overflowing_madd
- (1 cycle)
[b, a, c, ...][e, d, ...]

Undefined if
u32wrapping_madd
- (2 cycles)
[b, a, c, ...][d, ...]
Undefined if
u32div
- (2 cycles)
u32div.b
- (3-4 cycles)
[b, a, ...][c, ...]
Fails if
Undefined if
u32mod
- (3 cycles)
u32mod.b
- (4-5 cycles)
[b, a, ...][c, ...]
Fails if
Undefined if
u32divmod
- (1 cycle)
u32divmod.b
- (2-3 cycles)
[b, a, ...][d, c, ...]

Fails if
Undefined if

Bitwise operations

InstructionStack inputStack outputNotes
u32and
- (1 cycle)
[b, a, ...][c, ...]Computes as a bitwise AND of binary representations of and .
Fails if
u32or
- (6 cycle)s
[b, a, ...][c, ...]Computes as a bitwise OR of binary representations of and .
Fails if
u32xor
- (1 cycle)
[b, a, ...][c, ...]Computes as a bitwise XOR of binary representations of and .
Fails if
u32not
- (5 cycles)
[a, ...][b, ...]Computes as a bitwise NOT of binary representation of .
Fails if
u32shl
- (18 cycles)
u32shl.b
- (3 cycles)
[b, a, ...][c, ...]
Undefined if or
u32shr
- (18 cycles)
u32shr.b
- (3 cycles)
[b, a, ...][c, ...]
Undefined if or
u32rotl
- (18 cycles)
u32rotl.b
- (3 cycles)
[b, a, ...][c, ...]Computes by rotating a 32-bit representation of to the left by bits.
Undefined if or
u32rotr
- (22 cycles)
u32rotr.b
- (3 cycles)
[b, a, ...][c, ...]Computes by rotating a 32-bit representation of to the right by bits.
Undefined if or
u32popcnt
- (33 cycles)
[a, ...][b, ...]Computes by counting the number of set bits in (hamming weight of ).
Undefined if
u32clz
- (37 cycles)
[a, ...][b, ...]Computes as a number of leading zeros of .
Undefined if
u32ctz
- (34 cycles)
[a, ...][b, ...]Computes as a number of trailing zeros of .
Undefined if
u32clo
- (36 cycles)
[a, ...][b, ...]Computes as a number of leading ones of .
Undefined if
u32cto
- (33 cycles)
[a, ...][b, ...]Computes as a number of trailing ones of .
Undefined if

Comparison operations

InstructionStack inputStack outputNotes
u32lt
- (3 cycles)
[b, a, ...][c, ...]
Undefined if
u32lte
- (5 cycles)
[b, a, ...][c, ...]
Undefined if
u32gt
- (4 cycles)
[b, a, ...][c, ...]
Undefined if
u32gte
- (4 cycles)
[b, a, ...][c, ...]
Undefined if
u32min
- (8 cycles)
[b, a, ...][c, ...]
Undefined if
u32max
- (9 cycles)
[b, a, ...][c, ...]
Undefined if

Stack manipulation

Miden VM stack is a push-down stack of field elements. The stack has a maximum depth of , but only the top elements are directly accessible via the instructions listed below.

In addition to the typical stack manipulation instructions such as drop, dup, swap etc., Miden assembly provides several conditional instructions which can be used to manipulate the stack based on some condition - e.g., conditional swap cswap or conditional drop cdrop.

InstructionStack_inputStack_outputNotes
drop
- (1 cycle)
[a, ... ][ ... ]Deletes the top stack item.
dropw
- (4 cycles)
[A, ... ][ ... ]Deletes a word (4 elements) from the top of the stack.
padw
- (4 cycles)
[ ... ][0, 0, 0, 0, ... ]Pushes four values onto the stack.
Note: simple pad is not provided because push.0 does the same thing.
dup.n
- (1-3 cycles)
[ ..., a, ... ][a, ..., a, ... ]Pushes a copy of the th stack item onto the stack. dup and dup.0 are the same instruction. Valid for
dupw.n
- (4 cycles)
[ ..., A, ... ][A, ..., A, ... ]Pushes a copy of the th stack word onto the stack. dupw and dupw.0 are the same instruction. Valid for
swap.n
- (1-6 cycles)
[a, ..., b, ... ][b, ..., a, ... ]Swaps the top stack item with the th stack item. swap and swap.1 are the same instruction. Valid for
swapw.n
- (1 cycle)
[A, ..., B, ... ][B, ..., A, ... ]Swaps the top stack word with the th stack word. swapw and swapw.1 are the same instruction. Valid for
swapdw
- (1 cycle)
[D, C, B, A, ... ][B, A, D, C ... ]Swaps words on the top of the stack. The 1st with the 3rd, and the 2nd with the 4th.
movup.n
- (1-4 cycles)
[ ..., a, ... ][a, ... ]Moves the th stack item to the top of the stack. Valid for
movupw.n
- (2-3 cycles)
[ ..., A, ... ][A, ... ]Moves the th stack word to the top of the stack. Valid for
movdn.n
- (1-4 cycles)
[a, ... ][ ..., a, ... ]Moves the top stack item to the th position of the stack. Valid for
movdnw.n
- (2-3 cycles)
[A, ... ][ ..., A, ... ]Moves the top stack word to the th word position of the stack. Valid for

Conditional manipulation

InstructionStack_inputStack_outputNotes
cswap
- (1 cycle)
[c, b, a, ... ][e, d, ... ]

Fails if
cswapw
- (1 cycle)
[c, B, A, ... ][E, D, ... ]

Fails if
cdrop
- (2 cycles)
[c, b, a, ... ][d, ... ]
Fails if
cdropw
- (5 cycles)
[c, B, A, ... ][D, ... ]
Fails if

Input / output operations

Miden assembly provides a set of instructions for moving data between the operand stack and several other sources. These sources include:

  • Program code: values to be moved onto the operand stack can be hard-coded in a program's source code.
  • Environment: values can be moved onto the operand stack from environment variables. These include current clock cycle, current stack depth, and a few others.
  • Advice provider: values can be moved onto the operand stack from the advice provider by popping them from the advice stack (see more about the advice provider here). The VM can also inject new data into the advice provider via advice injector instructions.
  • Memory: values can be moved between the stack and random-access memory. The memory is word-addressable, meaning, four elements are located at each address, and we can read and write elements to/from memory in batches of four. Memory can be accessed via absolute memory references (i.e., via memory addresses) as well as via local procedure references (i.e., local index). The latter approach ensures that a procedure does not access locals of another procedure.

Constant inputs

InstructionStack_inputStack_outputNotes
push.a
- (1-2 cycles)
push.a.b
push.a.b.c...
[ ... ][a, ... ]
[b, a, ... ]
[c, b, a, ... ]
Pushes values , , etc. onto the stack. Up to values can be specified. All values must be valid field elements in decimal (e.g., ) or hexadecimal (e.g., ) representation.

The value can be specified in hexadecimal form without periods between individual values as long as it describes a full word ( field elements or bytes). Note that hexadecimal values separated by periods (short hexadecimal strings) are assumed to be in big-endian order, while the strings specifying whole words (long hexadecimal strings) are assumed to be in little-endian order. That is, the following are semantically equivalent:

push.0x00001234.0x00005678.0x00009012.0x0000abcd
push.0x341200000000000078560000000000001290000000000000cdab000000000000
push.4660.22136.36882.43981

In both case the values must still encode valid field elements.

Environment inputs

InstructionStack_inputStack_outputNotes
clk
- (1 cycle)
[ ... ][t, ... ]
Pushes the current value of the clock cycle counter onto the stack.
sdepth
- (1 cycle)
[ ... ][d, ... ]
Pushes the current depth of the stack onto the stack.
caller
- (1 cycle)
[A, b, ... ][H, b, ... ]
Overwrites the top four stack items with the hash of a function which initiated the current SYSCALL.
Executing this instruction outside of SYSCALL context will fail.
locaddr.i
- (2 cycles)
[ ... ][a, ... ]
Pushes the absolute memory address of local memory at index onto the stack.
procref.name
- (4 cycles)
[ ... ][A, ... ]
Pushes MAST root of the procedure with name onto the stack.

Nondeterministic inputs

As mentioned above, nondeterministic inputs are provided to the VM via the advice provider. Instructs which access the advice provider fall into two categories. The first category consists of instructions which move data from the advice stack onto the operand stack and/or memory.

InstructionStack_inputStack_outputNotes
adv_push.n
- (n cycles)
[ ... ][a, ... ]
Pops values from the advice stack and pushes them onto the operand stack. Valid for .
Fails if the advice stack has fewer than values.
adv_loadw
- (1 cycle)
[0, 0, 0, 0, ... ][A, ... ]
Pop the next word (4 elements) from the advice stack and overwrites the first word of the operand stack (4 elements) with them.
Fails if the advice stack has fewer than values.
adv_pipe
- (1 cycle)
[C, B, A, a, ... ][E, D, A, a', ... ]

Pops the next two words from the advice stack, overwrites the top of the operand stack with them and also writes these words into memory at address and .
Fails if the advice stack has fewer than values.

Note: The opcodes above always push data onto the operand stack so that the first element is placed deepest in the stack. For example, if the data on the stack is a,b,c,d and you use the opcode adv_push.4, the data will be d,c,b,a on your stack. This is also the behavior of the other opcodes.

The second category injects new data into the advice provider. These operations are called advice injectors and they affect only the advice provider state. That is, the state of all other VM components (e.g., stack, memory) are unaffected. Executing advice injectors does not consume any VM cycles (i.e., these instructions are executed in cycles).

Advice injectors fall into two categories: (1) injectors which push new data onto the advice stack, and (2) injectors which insert new data into the advice map.

InstructionStack_inputStack_outputNotes
adv.push_mapval
adv.push_mapval.s
[K, ... ][K, ... ]Pushes a list of field elements onto the advice stack. The list is looked up in the advice map using word as the key. If offset is provided, the key is taken starting from item on the stack.
adv.push_mapvaln
adv.push_mapvaln.s
[K, ... ][K, ... ]Pushes a list of field elements together with the number of elements onto the advice stack. The list is looked up in the advice map using word as the key. If offset is provided, the key is taken starting from item on the stack.
adv.push_mtnode[d, i, R, ... ][d, i, R, ... ]Pushes a node of a Merkle tree with root at depth and index from Merkle store onto the advice stack.
adv.push_u64div[b1, b0, a1, a0, ...][b1, b0, a1, a0, ...]Pushes the result of u64 division onto the advice stack. Both and are represented using 32-bit limbs. The result consists of both the quotient and the remainder.
adv.push_ext2intt[osize, isize, iptr, ... ][osize, isize, iptr, ... ]Given evaluations of a polynomial over some specified domain, interpolates the evaluations into a polynomial in coefficient form and pushes the result into the advice stack.
adv.push_sig.kind[K, M, ...][K, M, ...]Pushes values onto the advice stack which are required for verification of a DSA with scheme specified by kind against the public key commitment and message .
adv.push_smtpeek[K, R, ... ][K, R, ... ]Pushes value onto the advice stack which is associated with key in a Sparse Merkle Tree with root .
adv.insert_mem[K, a, b, ... ][K, a, b, ... ]Reads words from memory, and save the data into .
adv.insert_hdword
adv.insert_hdword.d
[B, A, ... ][B, A, ... ]Reads top two words from the stack, computes a key as , and saves the data into . is an optional domain value which can be between and , default value .
adv.insert_hperm[B, A, C, ...][B, A, C, ...]Reads top three words from the stack, computes a key as , and saves data into .

Random access memory

As mentioned above, there are two ways to access memory in Miden VM. The first way is via memory addresses using the instructions listed below. The addresses are absolute - i.e., they don't depend on the procedure context. Memory addresses can be in the range .

Memory is guaranteed to be initialized to zeros. Thus, when reading from memory address which hasn't been written to previously, zero elements will be returned.

InstructionStack_inputStack_outputNotes
mem_load
- (1 cycle)
mem_load.a
- (2 cycles)
[a, ... ][v, ... ]
Reads a word (4 elements) from memory at address a, and pushes the first element of the word onto the stack. If is provided via the stack, it is removed from the stack first.
Fails if
mem_loadw
- (1 cycle)
mem_loadw.a
- (2 cycles)
[a, 0, 0, 0, 0, ... ][A, ... ]
Reads a word from memory at address and overwrites top four stack elements with it. If is provided via the stack, it is removed from the stack first.
Fails if
mem_store
- (2 cycles)
mem_store.a
- (3-4 cycles)
[a, v, ... ][ ... ]
Pops the top element off the stack and stores it as the first element of the word in memory at address . All other elements of the word are not affected. If is provided via the stack, it is removed from the stack first.
Fails if
mem_storew
- (1 cycle)
mem_storew.a
- (2-3 cycles)
[a, A, ... ][A, ... ]
Stores the top four elements of the stack in memory at address . If is provided via the stack, it is removed from the stack first.
Fails if
mem_stream
- (1 cycle)
[C, B, A, a, ... ][E, D, A, a', ... ]

Read two sequential words from memory starting at address and overwrites the first two words in the operand stack.

The second way to access memory is via procedure locals using the instructions listed below. These instructions are available only in procedure context. The number of locals available to a given procedure must be specified at procedure declaration time, and trying to access more locals than was declared will result in a compile-time error. The number of locals per procedure is not limited, but the total number of locals available to all procedures at runtime must be smaller than .

InstructionStack_inputStack_outputNotes
loc_load.i
- (3-4 cycles)
[ ... ][v, ... ]
Reads a word (4 elements) from local memory at index i, and pushes the first element of the word onto the stack.
loc_loadw.i
- (3-4 cycles)
[0, 0, 0, 0, ... ][A, ... ]
Reads a word from local memory at index and overwrites top four stack elements with it.
loc_store.i
- (4-5 cycles)
[v, ... ][ ... ]
Pops the top element off the stack and stores it as the first element of the word in local memory at index . All other elements of the word are not affected.
loc_storew.i
- (3-4 cycles)
[A, ... ][A, ... ]
Stores the top four elements of the stack in local memory at index .

Unlike regular memory, procedure locals are not guaranteed to be initialized to zeros. Thus, when working with locals, one must assume that before a local memory address has been written to, it contains "garbage".

Internally in the VM, procedure locals are stored at memory offset stating at . Thus, every procedure local has an absolute address in regular memory. The locaddr.i instruction is provided specifically to map an index of a procedure's local to an absolute address so that it can be passed to downstream procedures, when needed.

Cryptographic operations

Miden assembly provides a set of instructions for performing common cryptographic operations. These instructions are listed in the table below.

Hashing and Merkle trees

Rescue Prime Optimized is the native hash function of Miden VM. The parameters of the hash function were chosen to provide 128-bit security level against preimage and collision attacks. The function operates over a state of 12 field elements, and requires 7 rounds for a single permutation. However, due to its special status within the VM, computing Rescue Prime Optimized hashes can be done very efficiently. For example, applying a permutation of the hash function can be done in a single VM cycle.

InstructionStack_inputStack_outputNotes
hash
- (20 cycles)
[A, ...][B, ...]
where, computes a 1-to-1 Rescue Prime Optimized hash.
hperm
- (1 cycle)
[C, B, A, ...][F, E, D, ...]
Performs a Rescue Prime Optimized permutation on the top 3 words of the operand stack, where the top 2 words elements are the rate (words C and B), the deepest word is the capacity (word A), the digest output is the word E.
hmerge
- (16 cycles)
[B, A, ...][C, ...]
where, computes a 2-to-1 Rescue Prime Optimized hash.
mtree_get
- (9 cycles)
[d, i, R, ...][V, R, ...]Fetches the node value from the advice provider and runs a verification equivalent to mtree_verify, returning the value if succeeded.
mtree_set
- (29 cycles)
[d, i, R, V', ...][V, R', ...]Updates a node in the Merkle tree with root at depth and index to value . is the Merkle root of the resulting tree and is old value of the node. Merkle tree with root must be present in the advice provider, otherwise execution fails. At the end of the operation the advice provider will contain both Merkle trees.
mtree_merge
- (16 cycles)
[R, L, ...][M, ...]Merges two Merkle trees with the provided roots R (right), L (left) into a new Merkle tree with root M (merged). The input trees are retained in the advice provider.
mtree_verify
- (1 cycle)
[V, d, i, R, ...][V, d, i, R, ...]Verifies that a Merkle tree with root opens to node at depth and index . Merkle tree with root must be present in the advice provider, otherwise execution fails.

Events

Miden assembly supports the concept of events. Events are a simple data structure with a single event_id field. When an event is emitted by a program, it is communicated to the host. Events can be emitted at specific points of program execution with the intent of triggering some action on the host. This is useful as the program has contextual information that would be challenging for the host to infer. The emission of events allows the program to communicate this contextual information to the host. The host contains an event handler that is responsible for handling events and taking appropriate actions. The emission of events does not change the state of the VM but it can change the state of the host.

An event can be emitted via the emit.<event_id> assembly instruction where <event_id> can be any 32-bit value specified either directly or via a named constant. For example:

emit.EVENT_ID_1
emit.2

Tracing

Miden assembly also supports code tracing, which works similar to the event emitting.

A trace can be emitted via the trace.<trace_id> assembly instruction where <trace_id> can be any 32-bit value specified either directly or via a named constant. For example:

trace.EVENT_ID_1
trace.2

To make use of the trace instruction, programs should be ran with tracing flag (-t or --tracing), otherwise these instructions will be ignored.

Debugging

To support basic debugging capabilities, Miden assembly provides a debug instruction. This instruction prints out the state of the VM at the time when the debug instruction is executed. The instruction can be parameterized as follows:

  • debug.stack prints out the entire contents of the stack.
  • debug.stack.<n> prints out the top items of the stack. must be an integer greater than and smaller than .
  • debug.mem prints out the entire contents of RAM.
  • debug.mem.<n> prints out contents of memory at address .
  • debug.mem.<n>.<m> prints out the contents of memory starting at address and ending at address (both inclusive). must be greater or equal to .
  • debug.local prints out the whole local memory of the currently executing procedure.
  • debug.local.<n> prints out contents of the local memory at index for the currently executing procedure. must be greater or equal to and smaller than .
  • debug.local.<n>.<m> prints out contents of the local memory starting at index and ending at index (both inclusive). must be greater or equal to . and must be greater or equal to and smaller than .

Debug instructions do not affect the VM state and do not change the program hash.

To make use of the debug instruction, programs must be compiled with an assembler instantiated in the debug mode. Otherwise, the assembler will simply ignore the debug instructions.

Miden Standard Library

Miden standard library provides a set of procedures which can be used by any Miden program. These procedures build on the core instruction set of Miden assembly expanding the functionality immediately available to the user.

The goals of Miden standard library are:

  • Provide highly-optimized and battle-tested implementations of commonly-used primitives.
  • Reduce the amount of code that needs to be shared between parties for proving and verifying program execution.

The second goal can be achieved because calls to procedures in the standard library can always be serialized as 32 bytes, regardless of how large the procedure is.

Terms and notations

In this document we use the following terms and notations:

  • A field element is an element in a prime field of size .
  • A binary value means a field element which is either or .
  • Inequality comparisons are assumed to be performed on integer representations of field elements in the range .

Throughout this document, we use lower-case letters to refer to individual field elements (e.g., ). Sometimes it is convenient to describe operations over groups of elements. For these purposes we define a word to be a group of four elements. We use upper-case letters to refer to words (e.g., ). To refer to individual elements within a word, we use numerical subscripts. For example, is the first element of word , is the last element of word , etc.

Organization and usage

Procedures in the Miden Standard Library are organized into modules, each targeting a narrow set of functionality. Modules are grouped into higher-level namespaces. However, higher-level namespaces do not expose any procedures themselves. For example, std::math::u64 is a module containing procedures for working with 64-bit unsigned integers. This module is a part of the std::math namespace. However, the std::math namespace does not expose any procedures.

For an example of how to invoke procedures from imported modules see this section.

Available modules

Currently, Miden standard library contains just a few modules, which are listed below. Over time, we plan to add many more modules which will include various cryptographic primitives, additional numeric data types and operations, and many others.

ModuleDescription
std::collections::mmrContains procedures for manipulating Merkle Mountain Ranges.
std::crypto::fri::frie2f4Contains procedures for verifying FRI proofs (field extension = 2, folding factor = 4).
std::crypto::hashes::blake3Contains procedures for computing hashes using BLAKE3 hash function.
std::crypto::hashes::sha256Contains procedures for computing hashes using SHA256 hash function.
std::math::u64Contains procedures for working with 64-bit unsigned integers.
std::memContains procedures for working with random access memory.
std::sysContains system-level utility procedures.

Collections

Namespace std::collections contains modules for commonly-used authenticated data structures. This includes:

  • A Merkle Mountain range.
  • A Sparse Merkle Tree with 64-bit keys.
  • A Sparse Merkle Tree with 256-bit keys.

Merkle Mountain Range

Module std::collections::mmr contains procedures for manipulating Merkle Mountain Range data structure which can be used as an append-only log.

The following procedures are available to read data from and make updates to a Merkle Mountain Range.

ProcedureDescription
getLoads the leaf at the absolute position pos in the MMR onto the stack.

Valid range for pos is between and (both inclusive).

Inputs: [pos, mmr_ptr, ...]
Output: [N, ...]

Where N is the leaf loaded from the MMR whose memory location starts at mmr_ptr.
addAdds a new leaf to the MMR.

This will update the MMR peaks in the VM's memory and the advice provider with any merged nodes.

Inputs: [N, mmr_ptr, ...]
Outputs: [...]

Where N is the leaf added to the MMR whose memory locations starts at mmr_ptr.
packComputes a commitment to the given MMR and copies the MMR to the Advice Map using the commitment as a key.

Inputs: [mmr_ptr, ...]
Outputs: [HASH, ...]

unpackLoad the MMR peak data based on its hash.

Inputs: [HASH, mmr_ptr, ...]
Outputs: [...]

Where:
- HASH: is the MMR peak hash, the hash is expected to be padded to an even length and to have a minimum size of 16 elements.
- The advice map must contain a key with HASH, and its value is num_leaves || hash_data, and hash_data is the data used to computed HASH
- mmt_ptr: the memory location where the MMR data will be written, starting with the MMR forest (the total count of its leaves) followed by its peaks.

Sparse Merkle Tree

Module std::collections::smt contains procedures for manipulating key-value maps with 4-element keys and 4-element values. The underlying implementation is a Sparse Merkle Tree where leaves can exist only at depth 64. Initially, when a tree is empty, it is equivalent to an empty Sparse Merkle Tree of depth 64 (i.e., leaves at depth 64 are set and hash to [ZERO; 4]). When inserting non-empty values into the tree, the most significant element of the key is used to identify the corresponding leaf. All key-value pairs that map to a given leaf are inserted (ordered) in the leaf.

The following procedures are available to read data from and make updates to a Sparse Merkle Tree.

ProcedureDescription
getReturns the value located under the specified key in the Sparse Merkle Tree defined by the specified root.

If no values had been previously inserted under the specified key, an empty word is returned.

Inputs: [KEY, ROOT, ...]
Outputs: [VALUE, ROOT, ...]

Fails if the tree with the specified root does not exist in the VM's advice provider.
setInserts the specified value under the specified key in a Sparse Merkle Tree defined by the specified root. If the insert is successful, the old value located under the specified key is returned via the stack.

If VALUE is an empty word, the new state of the tree is guaranteed to be equivalent to the state as if the updated value was never inserted.

Inputs: [VALUE, KEY, ROOT, ...]
Outputs: [OLD_VALUE, NEW_ROOT, ...]

Fails if the tree with the specified root does not exits in the VM's advice provider.

Digital signatures

Namespace std::crypto::dsa contains a set of digital signature schemes supported by default in the Miden VM. Currently, these schemes are:

  • RPO Falcon512: a variant of the Falcon signature scheme.

RPO Falcon512

Module std::crypto::dsa::rpo_falcon512 contains procedures for verifying RPO Falcon512 signatures. These signatures differ from the standard Falcon signatures in that instead of using SHAKE256 hash function in the hash-to-point algorithm we use RPO256. This makes the signature more efficient to verify in the Miden VM.

The module exposes the following procedures:

ProcedureDescription
verifyVerifies a signature against a public key and a message. The procedure gets as inputs the hash of the public key and the hash of the message via the operand stack. The signature is expected to be provided via the advice provider.

The signature is valid if and only if the procedure returns.

Inputs: [PK, MSG, ...]
Outputs: [...]

Where PK is the hash of the public key and MSG is the hash of the message. Both hashes are expected to be computed using RPO hash function.

The procedure relies on the adv.push_sig decorator to retrieve the signature from the host. The default host implementation assumes that the private-public key pair is loaded into the advice provider, and uses it to generate the signature. However, for production grade implementations, this functionality should be overridden to ensure more secure handling of private keys.

FRI verification procedures

Namespace std::crypto::fri contains modules for verifying FRI proofs.

FRI Extension 2, Fold 4

Module std::crypto::fri::frie2f4 contains procedures for verifying FRI proofs generated over the quadratic extension of the Miden VM's base field. Moreover, the procedures assume that layer folding during the commit phase of FRI protocol was performed using folding factor 4.

ProcedureDescription
verifyVerifies a FRI proof where the proof was generated over the quadratic extension of the base field and layer folding was performed using folding factor 4.

Input: [query_start_ptr, query_end_ptr, layer_ptr, rem_ptr, g, ...]>
Output: [...]

- query_start_ptr is a pointer to a list of tuples of the form (e0, e1, p, 0) where p is a query index at the first layer and (e0, e1) is an extension field element corresponding to the value of the first layer at index p.
- query_end_ptr is a pointer to the first empty memory address after the last (e0, e1, p, 0) tuple.
- layer_ptr is a pointer to the first layer commitment denoted throughout the code by C. layer_ptr + 1 points to the first (alpha0, alpha1, t_depth, d_size) where d_size is the size of initial domain divided by 4, t_depth is the depth of the Merkle tree commitment to the first layer and (alpha0, alpha1) is the first challenge used in folding the first layer. Both t_depth and d_size are expected to be smaller than 2^32. Otherwise, the result of this procedure is undefined.
- rem_ptr is a pointer to the first tuple of two consecutive degree 2 extension field elements making up the remainder codeword. This codeword can be of length either 32 or 64.

The memory referenced above is used contiguously, as follows:
[layer_ptr ... rem_ptr ... query_start_ptr ... query_end_ptr]

This means for example that:
1. rem_ptr - 1 points to the last (alpha0, alpha1, t_depth, d_size) tuple.
2. The length of the remainder codeword is 2 * (rem_ptr - query_start_ptr).

Cycles: for domains of size 2^n where:
- n is even: 12 + 6 + num_queries * (40 + num_layers * 76 + 69) + 2626
- n is odd: 12 + 6 + num_queries * (40 + num_layers * 76 + 69) + 1356

Cryptographic hashes

Namespace std::crypto contains modules for commonly used cryptographic hash functions.

BLAKE3

Module std::crypto::hashes::blake3 contains procedures for computing hashes using BLAKE3 hash function. The input and output elements are assumed to contain one 32-bit value per element.

ProcedureDescription
hash_1to1Computes BLAKE3 1-to-1 hash.

Input: 32-bytes stored in the first 8 elements of the stack (32 bits per element).

Output: A 32-byte digest stored in the first 8 elements of stack (32 bits per element).
hash_2to1Computes BLAKE3 2-to-1 hash.

Input: 64-bytes stored in the first 16 elements of the stack (32 bits per element).

Output: A 32-byte digest stored in the first 8 elements of stack (32 bits per element)

SHA256

Module std::crypto::hashes::sha256 contains procedures for computing hashes using SHA256 hash function. The input and output elements are assumed to contain one 32-bit value per element.

ProcedureDescription
hash_1to1Computes SHA256 1-to-1 hash.

Input: 32-bytes stored in the first 8 elements of the stack (32 bits per element).

Output: A 32-byte digest stored in the first 8 elements of stack (32 bits per element).
hash_2to1Computes SHA256 2-to-1 hash.

Input: 64-bytes stored in the first 16 elements of the stack (32 bits per element).

Output: A 32-byte digest stored in the first 8 elements of stack (32 bits per element).

Unsigned 64-bit integer operations

Module std::math::u64 contains a set of procedures which can be used to perform unsigned 64-bit integer operations. These operations fall into the following categories:

  • Arithmetic operations - addition, multiplication, division etc.
  • Comparison operations - equality, less than, greater than etc.
  • Bitwise operations - binary AND, OR, XOR, bit shifts etc.

All procedures assume that an unsigned 64-bit integer (u64) is encoded using two elements, each containing an unsigned 32-bit integer (u32). When placed on the stack, the least-significant limb is assumed to be deeper in the stack. For example, a u64 value a consisting of limbs a_hi and a_lo would be position on the stack like so:

[a_hi, a_lo, ... ]

Many of the procedures listed below (e.g., overflowing_add, wrapping_add, lt) do not check whether the inputs are encoded using valid u32 values. These procedures do not fail when the inputs are encoded incorrectly, but rather produce undefined results. Thus, it is important to be certain that limbs of input values are valid u32 values prior to calling such procedures.

Arithmetic operations

ProcedureDescription
overflowing_addPerforms addition of two unsigned 64-bit integers preserving the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [overflow_flag, c_hi, c_lo, ...], where c = (a + b) % 2^64
This takes 6 cycles.
wrapping_addPerforms addition of two unsigned 64-bit integers discarding the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = (a + b) % 2^64
This takes 7 cycles.
overflowing_subPerforms subtraction of two unsigned 64-bit integers preserving the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [underflow_flag, c_hi, c_lo, ...], where c = (a - b) % 2^64
This takes 11 cycles.
wrapping_subPerforms subtraction of two unsigned 64-bit integers discarding the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = (a - b) % 2^64
This takes 10 cycles.
overflowing_mulPerforms multiplication of two unsigned 64-bit integers preserving the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi_hi, c_hi_lo, c_lo_hi, c_lo_lo, ...], where c = (a * b) % 2^64
This takes 18 cycles.
wrapping_mulPerforms multiplication of two unsigned 64-bit integers discarding the overflow.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = (a * b) % 2^64
This takes 11 cycles.
divPerforms division of two unsigned 64-bit integers discarding the remainder.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a // b
This takes 54 cycles.
modPerforms modulo operation of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a % b
This takes 54 cycles.
divmodPerforms divmod operation of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [r_hi, r_lo, q_hi, q_lo ...], where r = a % b, q = a // b
This takes 54 cycles.

Comparison operations

ProcedureDescription
ltPerforms less-than comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a < b, and 0 otherwise.
This takes 11 cycles.
gtPerforms greater-than comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a > b, and 0 otherwise.
This takes 11 cycles.
ltePerforms less-than-or-equal comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a <= b, and 0 otherwise.
This takes 12 cycles.
gtePerforms greater-than-or-equal comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a >= b, and 0 otherwise.
This takes 12 cycles.
eqPerforms equality comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a == b, and 0 otherwise.
This takes 6 cycles.
neqPerforms inequality comparison of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c, ...], where c = 1 when a != b, and 0 otherwise.
This takes 6 cycles.
eqzPerforms comparison to zero of an unsigned 64-bit integer.
The input value is assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[a_hi, a_lo, ...] -> [c, ...], where c = 1 when a == 0, and 0 otherwise.
This takes 4 cycles.
minCompares two unsigned 64-bit integers and drop the larger one from the stack.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a when a < b, and b otherwise.
This takes 23 cycles.
maxCompares two unsigned 64-bit integers and drop the smaller one from the stack.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a when a > b, and b otherwise.
This takes 23 cycles.

Bitwise operations

ProcedureDescription
andPerforms bitwise AND of two unsigned 64-bit integers.
The input values are assumed to be represented using 32-bit limbs, but this is not checked.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a AND b.
This takes 6 cycles.
orPerforms bitwise OR of two unsigned 64-bit integers.
The input values are expected to be represented using 32-bit limbs, and the procedure will fail if they are not.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a OR b.
This takes 16 cycles.
xorPerforms bitwise XOR of two unsigned 64-bit integers.
The input values are expected to be represented using 32-bit limbs, and the procedure will fail if they are not.
The stack transition looks as follows:
[b_hi, b_lo, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a XOR b.
This takes 6 cycles.
shlPerforms left shift of one unsigned 64-bit integer using the pow2 operation.
The input value to be shifted is assumed to be represented using 32-bit limbs.
The shift value should be in the range [0, 64), otherwise it will result in an error.
The stack transition looks as follows:
[b, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a << b mod 2^64.
This takes 28 cycles.
shrPerforms right shift of one unsigned 64-bit integer using the pow2 operation.
The input value to be shifted is assumed to be represented using 32-bit limbs.
The shift value should be in the range [0, 64), otherwise it will result in an error.
The stack transition looks as follows:
[b, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a >> b.
This takes 44 cycles.
rotlPerforms left rotation of one unsigned 64-bit integer using the pow2 operation.
The input value to be shifted is assumed to be represented using 32-bit limbs.
The shift value should be in the range [0, 64), otherwise it will result in an error.
The stack transition looks as follows:
[b, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a << b mod 2^64.
This takes 35 cycles.
rotrPerforms right rotation of one unsigned 64-bit integer using the pow2 operation.
The input value to be shifted is assumed to be represented using 32-bit limbs.
The shift value should be in the range [0, 64), otherwise it will result in an error.
The stack transition looks as follows:
[b, a_hi, a_lo, ...] -> [c_hi, c_lo, ...], where c = a << b mod 2^64.
This takes 40 cycles.
clzCounts the number of leading zeros of one unsigned 64-bit integer.
The input value is assumed to be represented using 32 bit limbs, but this is not checked.
The stack transition looks as follows: [n_hi, n_lo, ...] -> [clz, ...], where clz is a number of leading zeros of value n.
This takes 43 cycles.
ctzCounts the number of trailing zeros of one unsigned 64-bit integer.
The input value is assumed to be represented using 32 bit limbs, but this is not checked.
The stack transition looks as follows: [n_hi, n_lo, ...] -> [ctz, ...], where ctz is a number of trailing zeros of value n.
This takes 41 cycles.
cloCounts the number of leading ones of one unsigned 64-bit integer.
The input value is assumed to be represented using 32 bit limbs, but this is not checked.
The stack transition looks as follows: [n_hi, n_lo, ...] -> [clo, ...], where clo is a number of leading ones of value n.
This takes 42 cycles.
ctoCounts the number of trailing ones of one unsigned 64-bit integer.
The input value is assumed to be represented using 32 bit limbs, but this is not checked.
The stack transition looks as follows: [n_hi, n_lo, ...] -> [cto, ...], where cto is a number of trailing ones of value n.
This takes 40 cycles.

Memory procedures

Module std::mem contains a set of utility procedures for working with random access memory.

ProcedureDescription
memcopyCopies n words from read_ptr to write_ptr.

Stack transition looks as follows:

[n, read_ptr, write_ptr, ...] -> [...]

Cycles: 15 + 16n
pipe_double_words_to_memoryMoves an even number of words from the advice stack to memory.

Input: [C, B, A, write_ptr, end_ptr, ...]
Output: [C, B, A, write_ptr, ...]

Where:
- The words C, B, and A are the RPO hasher state
- A is the capacity
- C, B are the rate portion of the state
- The value num_words = end_ptr - write_ptr must be positive and even

Cycles: 10 + 9 * num_words / 2
pipe_words_to_memoryMoves an arbitrary number of words from the advice stack to memory.

Input: [num_words, write_ptr, ...]
Output: [HASH, write_ptr', ...]

Where HASH is the sequential RPO hash of all copied words.

Cycles:
- Even num_words: 48 + 9 * num_words / 2
- Odd num_words: 65 + 9 * round_down(num_words / 2)
pipe_preimage_to_memoryMoves an arbitrary number of words from the advice stack to memory and asserts it matches the commitment.

Input: [num_words, write_ptr, COM, ...]
Output: [write_ptr', ...]

Cycles:
- Even num_words: 58 + 9 * num_words / 2
- Odd num_words: 75 + 9 * round_down(num_words / 2)

System procedures

Module std::sys contains a set of system-level utility procedures.

ProcedureDescription
truncate_stackRemoves elements deep in the stack until the depth of the stack is exactly 16. The elements are removed in such a way that the top 16 elements of the stack remain unchanged. If the stack would otherwise contain more than 16 elements at the end of execution, then adding a call to this function at the end will reduce the size of the public inputs that are shared with the verifier.
Input: Stack with 16 or more elements.
Output: Stack with only the original top 16 elements.

Design

In the following sections, we provide in-depth descriptions of Miden VM internals, including all AIR constraints for the proving system. We also provide rationale for making specific design choices.

Throughout these sections we adopt the following notations and assumptions:

  • All arithmetic operations, unless noted otherwise, are assumed to be in a prime field with modulus .
  • A binary value means a field element which is either or .
  • We use lowercase letters to refer to individual field elements (e.g., ), and uppercase letters to refer to groups of elements, also referred to as words (e.g., ). To refer to individual elements within a word, we use numerical subscripts. For example, is the first element of word , is the last element of word , etc.
  • When describing AIR constraints:
    • For a column , we denote the value in the current row simply as , and the value in the next row of the column as . Thus, all transition constraints for Miden VM work with two consecutive rows of the execution trace.
    • For multiset equality constraints, we denote random values sent by the verifier after the prover commits to the main execution trace as etc.
    • To differentiate constraints from other formulas, we frequently use the following format for constraint equations.

In the above, the constraint equation is followed by the implied algebraic degree of the constraint. This degree is determined by the number of multiplications between trace columns. If a constraint does not involve any multiplications between columns, its degree is . If a constraint involves multiplication between two columns, its degree is . If we need to multiply three columns together, the degree is ect.

The maximum allowed constraint degree in Miden VM is . If a constraint degree grows beyond that, we frequently need to introduce additional columns to reduce the degree.

VM components

Miden VM consists of several interconnected components, each providing a specific set of functionality. These components are:

  • System, which is responsible for managing system data, including the current VM cycle (clk), the free memory pointer (fmp) used for specifying the region of memory available to procedure locals, and the current and parent execution contexts.
  • Program decoder, which is responsible for computing a commitment to the executing program and converting the program into a sequence of operations executed by the VM.
  • Operand stack, which is a push-down stack which provides operands for all operations executed by the VM.
  • Range checker, which is responsible for providing 16-bit range checks needed by other components.
  • Chiplets, which is a set of specialized circuits used to accelerate commonly-used complex computations. Currently, the VM relies on 4 chiplets:
    • Hash chiplet, used to compute Rescue Prime Optimized hashes both for sequential hashing and for Merkle tree hashing.
    • Bitwise chiplet, used to compute bitwise operations (e.g., AND, XOR) over 32-bit integers.
    • Memory chiplet, used to support random-access memory in the VM.
    • Kernel ROM chiplet, used to enable calling predefined kernel procedures which are provided before execution begins.

The above components are connected via buses, which are implemented using lookup arguments. We also use multiset check lookups internally within components to describe virtual tables.

VM execution trace

The execution trace of Miden VM consists of main trace columns, buses, and virtual tables, as shown in the diagram below.

vm_trace.png

As can be seen from the above, the system, decoder, stack, and range checker components use dedicated sets of columns, while all chiplets share the same columns. To differentiate between chiplets, we use a set of binary selector columns, a combination of which uniquely identifies each chiplet.

The system component does not yet have a dedicated documentation section, since the design is likely to change. However, the following two columns are not expected to change:

  • clk which is used to keep track of the current VM cycle. Values in this column start out at and are incremented by with each cycle.
  • fmp which contains the value of the free memory pointer used for specifying the region of memory available to procedure locals.

AIR constraints for the fmp column are described in system operations section. For the clk column, the constraints are straightforward:

Programs in Miden VM

Miden VM consumes programs in a form of a Merkelized Abstract Syntax Tree (MAST). This tree is a binary tree where each node is a code block. The VM starts execution at the root of the tree, and attempts to recursively execute each required block according to its semantics. If the execution of a code block fails, the VM halts at that point and no further blocks are executed. A set of currently available blocks and their execution semantics are described below.

Code blocks

Join block

A join block is used to describe sequential execution. When the VM encounters a join block, it executes its left child first, and then executes its right child.

join_block

A join block must always have two children, and thus, cannot be a leaf node in the tree.

Split block

A split block is used to describe conditional execution. When the VM encounters a split block, it checks the top of the stack. If the top of the stack is , it executes the left child, if the top of the stack is , it executes the right child. If the top of the stack is neither nor , the execution fails.

split_block

A split block must always have two children, and thus, cannot be a leaf node in the tree.

Loop block

A loop block is used to describe condition-based iterative execution. When the VM encounters a loop block, it checks the top of the stack. If the top of the stack is , it executes the loop body, if the top of the stack is , the block is not executed. If the top of the stack is neither nor , the execution fails.

After the body of the loop is executed, the VM checks the top of the stack again. If the top of the stack is , the body is executed again, if the top of the stack is , the loop is exited. If the top of the stack is neither nor , the execution fails.

loop_block

A loop block must always have one child, and thus, cannot be a leaf node in the tree.

Dyn block

A dyn block is used to describe a node whose target is specified dynamically via the stack. When the VM encounters a dyn block, it executes a program which hashes to the target specified by the top of the stack. Thus, it has a dynamic target rather than a hardcoded target. In order to execute a dyn block, the VM must be aware of a program with the hash value that is specified by the top of the stack. Otherwise, the execution fails.

dyn_block

A dyn block must always have one (dynamically-specified) child. Thus, it cannot be a leaf node in the tree.

Call block

A call block is used to describe a function call which is executed in a user context. When the VM encounters a call block, it creates a new user context, then executes a program which hashes to the target specified by the call block in the new context. Thus, in order to execute a call block, the VM must be aware of a program with the specified hash. Otherwise, the execution fails. At the end of the call block, execution returns to the previous context.

When executing a call block, the VM does the following:

  1. Checks if a syscall is already being executed and fails if so.
  2. Sets the depth of the stack to 16.
  3. Upon return, checks that the depth of the stack is 16. If so, the original stack depth is restored. Otherwise, an error occurs.

call_block

A call block does not have any children. Thus, it must be leaf node in the tree.

Syscall block

A syscall block is used to describe a function call which is executed in the root context. When the VM encounters a syscall block, it returns to the root context, then executes a program which hashes to the target specified by the syscall block. Thus, in order to execute a syscall block, the VM must be aware of a program with the specified hash, and that program must belong to the kernel against which the code is compiled. Otherwise, the execution fails. At the end of the syscall block, execution returns to the previous context.

When executing a syscall block, the VM does the following:

  1. Checks if a syscall is already being executed and fails if so.
  2. Sets the depth of the stack to 16.
  3. Upon return, checks that the depth of the stack is 16. If so, the original stack depth is restored. Otherwise, an error occurs.

syscall_block

A syscall block does not have any children. Thus, it must be leaf node in the tree.

Span block

A span block is used to describe a linear sequence of operations. When the VM encounters a span block, it breaks the sequence of operations into batches and groups according to the following rules:

  • A group is represented by a single field element. Thus, assuming a single operation can be encoded using 7 bits, and assuming we are using a 64-bit field, a single group may encode up to 9 operations or a single immediate value.
  • A batch is a set of groups which can be absorbed by a hash function used by the VM in a single permutation. For example, assuming the hash function can absorb up to 8 field elements in a single permutation, a single batch may contain up to 8 groups.
  • There is no limit on the number of batches contained within a single span.

Thus, for example, executing 8 pushes in a row will result in two operation batches as illustrated in the picture below:

span_block_creation

  • The first batch will contain 8 groups, with the first group containing 7 PUSH opcodes and 1 NOOP, and the remaining 7 groups containing immediate values for each of the push operations. The reason for the NOOP is explained later in this section.
  • The second batch will contain 2 groups, with the first group containing 1 PUSH opcode and 1 NOOP, and the second group containing the immediate value for the last push operation.

If a sequence of operations does not have any operations which carry immediate values, up to 72 operations can fit into a single batch.

From the user's perspective, all operations are executed in order, however, the VM may insert occasional NOOPs to ensure proper alignment of all operations in the sequence. Currently, the alignment requirements are as follows:

  • An operation carrying an immediate value cannot be the last operation in a group. Thus, for example, if a PUSH operation is the last operation in a group, the VM will insert a NOOP after it.

A span block does not have any children, and thus, must be leaf node in the tree.

Program example

Consider the following program, where , etc. represent individual operations:

a_0, ..., a_i
if.true
    b_0, ..., b_j
else
    c_0, ..., c_k
    while.true
        d_0, ..., d_n
    end
    e_0, ..., e_m
end
f_0, ..., f_l

A MAST for this program would look as follows:

mast_of_program

Execution of this program would proceed as follows:

  1. The VM will start execution at the root of the program which is block .
  2. Since, is a join block, the VM will attempt to execute block first, and only after that execute block .
  3. Block is also a join block, and thus, the VM will execute block by executing operations in sequence, and then execute block .
  4. Block is a split block, and thus, the VM will pop the value off the top of the stack. If the popped value is , operations from block will be executed in sequence. If the popped value is , then the VM will attempt to execute block .
  5. is a join block, thus, the VM will try to execute block first, and then execute operations from block .
  6. Block is also a join_block, and thus, the VM will first execute all operations in block , and then will attempt to execute block .
  7. Block is a loop block, thus, the VM will pop the value off the top of the stack. If the pooped value is , the VM will execute the body of the loop defined by block . If the popped value is , the VM will not execute block and instead will move up the tree executing first block , then .
  8. If the VM does enter the loop, then after operation is executed, the VM will pop the value off the top of the stack again. If the popped value is , the VM will execute block again, and again until the top of the stack becomes . Once the top of the stack becomes , the VM will exit the loop and will move up the tree executing first block , then .

Program hash computation

Every Miden VM program can be reduced to a unique hash value. Specifically, it is infeasible to find two Miden VM programs with distinct semantics which hash to the same value. Padding a program with NOOPs does not change a program's execution semantics, and thus, programs which differ only in the number and/or placement of NOOPs may hash to the same value, although in most cases padding with NOOP should not affect program hash.

To prevent program hash collisions we implement domain separation across the variants of control blocks. We define the domain value to be the opcode of the operation that initializes the control block.

Below we denote to be an arithmetization-friendly hash function with -element output and capable of absorbing elements in a single permutation. The hash domain is specified as the subscript of the hash function and its value is used to populate the second capacity register upon initialization of control block hashing - .

  • The hash of a join block is computed as , where and are hashes of the code block being joined.
  • The hash of a split block is computed as , where is a hash of a code block corresponding to the true branch of execution, and is a hash of a code block corresponding to the false branch of execution.
  • The hash of a loop block is computed as , where is a hash of a code block corresponding to the loop body.
  • The hash of a dyn block is set to a constant, so it is the same for all dyn blocks. It does not depend on the hash of the dynamic child. This constant is computed as the RPO hash of two empty words ([ZERO, ZERO, ZERO, ZERO]) using a domain value of DYN_DOMAIN, where DYN_DOMAIN is the op code of the Dyn operation.
  • The hash of a call block is computed as , where is a hash of a program of which the VM is aware.
  • The hash of a syscall block is computed as , where is a hash of a program belonging to the kernel against which the code was compiled.
  • The hash of a span block is computed as , where is the th batch of operations in the span block. Each batch of operations is defined as containing field elements, and thus, hashing a -batch span block requires absorption steps.
    • In cases when the number of operations is insufficient to fill the last batch entirely, NOOPs are appended to the end of the last batch to ensure that the number of operations in the batch is always equal to .

Miden VM Program decoder

Miden VM program decoder is responsible for ensuring that a program with a given MAST root is executed by the VM. As the VM executes a program, the decoder does the following:

  1. Decodes a sequence of field elements supplied by the prover into individual operation codes (or opcodes for short).
  2. Organizes the sequence of field elements into code blocks, and computes the hash of the program according to the methodology described here.

At the end of program execution, the decoder outputs the computed program hash. This hash binds the sequence of opcodes executed by the VM to a program the prover claims to have executed. The verifier uses this hash during the STARK proof verification process to verify that the proof attests to a correct execution of a specific program (i.e., the prover didn't claim to execute program while in fact executing a different program ).

The sections below describe how Miden VM decoder works. Throughout these sections we make the following assumptions:

  1. An opcode requires bits to represent.
  2. An immediate value requires one full field element to represent.
  3. A NOOP operation has a numeric value of , and thus, can be encoded as seven zeros. Executing a NOOP operation does not change the state of the VM, but it does advance operation counter, and may affect program hash.

Program execution

Miden VM programs consist of a set of code blocks organized into a binary tree. The leaves of the tree contain linear sequences of instructions, and control flow is defined by the internal nodes of the tree.

Managing control flow in the VM is accomplished by executing control flow operations listed in the table below. Each of these operations require exactly one VM cycle to execute.

OperationDescription
JOINInitiates processing of a new Join block.
SPLITInitiates processing of a new Split block.
LOOPInitiates processing of a new Loop block.
REPEATInitiates a new iteration of an executing loop.
SPANInitiates processing of a new Span block.
RESPANInitiates processing of a new operation batch within a span block.
DYNInitiates processing of a new Dyn block.
CALLInitiates processing of a new Call block.
SYSCALLInitiates processing ofa new Syscall block.
ENDMarks the end of a program block.
HALTMarks the end of the entire program.

Let's consider a simple program below:

begin
    <operations1>
    if.true
        <operations2>
    else
        <operations3>
    end
end

Block structure of this program is shown below.

JOIN
    SPAN
        <operations1>
    END
    SPLIT
        SPAN
            <operations2>
        END
        SPAN
            <operations3>
        END
    END
END

Executing this program on the VM can result in one of two possible instruction sequences. First, if after operations in <operations1> are executed the top of the stack is , the VM will execute the following:

JOIN
SPAN
<operations1>
END
SPLIT
SPAN
<operations2>
END
END
END
HALT

However, if after <operations1> are executed, the top of the stack is , the VM will execute the following:

JOIN
SPAN
<operations1>
END
SPLIT
SPAN
<operations3>
END
END
END
HALT

The main task of the decoder is to output exactly the same program hash, regardless of which one of the two possible execution paths was taken. However, before we can describe how this is achieved, we need to give an overview of the overall decoder structure.

Decoder structure

The decoder is one of the more complex parts of the VM. It consists of the following components:

  • Main execution trace consisting of trace columns which contain the state of the decoder at a given cycle of a computation.
  • Connection to the hash chiplet, which is used to offload hash computations from the decoder.
  • virtual tables (implemented via multi-set checks), which keep track of code blocks and operations executing on the VM.

Decoder trace

Decoder trace columns can be grouped into several logical sets of registers as illustrated below.

decoder_trace.png

These registers have the following meanings:

  1. Block address register . This register contains address of the hasher for the current block (row index from the auxiliary hashing table). It also serves the role of unique block identifiers. This is convenient, because hasher addresses are guaranteed to be unique.
  2. Registers , which encode opcodes for operation to be executed by the VM. Each of these registers can contain a single binary value (either or ). And together these values describe a single opcode.
  3. Hasher registers . When control flow operations are executed, these registers are used to provide inputs for the current block's hash computation (e.g., for JOIN, SPLIT, LOOP, SPAN, CALL, SYSCALL operations) or to record the result of the hash computation (i.e., for END operation). However, when regular operations are executed, of these registers are used to help with op group decoding, and the remaining can be used to hold operation-specific helper variables.
  4. Register which contains a binary flag indicating whether the VM is currently executing instructions inside a span block. The flag is set to when the VM executes non-control flow instructions, and is set to otherwise.
  5. Register which keep track of the number of unprocessed operation groups in a given span block.
  6. Register which keeps track of a currently executing operation's index within its operation group.
  7. Operation batch flags which indicate how many operation groups a given operation batch contains. These flags are set only for SPAN and RESPAN operations, and are set to 's otherwise.
  8. Two additional registers (not shown) used primarily for constraint degree reduction.

Program block hashing

To compute hashes of program blocks, the decoder relies on the hash chiplet. Specifically, the decoder needs to perform two types of hashing operations:

  1. A simple 2-to-1 hash, where we provide a sequence of field elements, and get back field elements representing the result. Computing such a hash requires rows in the hash chiplet.
  2. A sequential hash of elements. Computing such a hash requires multiple absorption steps, and at each step field elements are absorbed into the hasher. Thus, computing a sequential hash of elements requires rows in the hash chiplet. At the end, we also get field elements representing the result.

To make hashing requests to the hash chiplet and to read the results from it, we will need to divide out relevant values from the chiplets bus column as described below.

Simple 2-to-1 hash

To initiate a 2-to-1 hash of elements () we need to divide by the following value:

where:

  • is a label indicating beginning of a new permutation. Value of this label is computed based on hash chiplet selector flags according to the methodology described here.
  • is the address of the row at which the hashing begins.
  • Some values are skipped in the above (e.g., ) because of the specifics of how auxiliary hasher table rows are reduced to field elements (described here). For example, is used as a coefficient for node index values during Merkle path computations in the hasher, and thus, is not relevant in this case. The term is omitted when the number of items being hashed is a multiple of the rate width () because it is multiplied by 0 - the value of the first capacity register as determined by the hasher chiplet logic.

To read the -element result (), we need to divide by the following value:

where:

  • is a label indicating return of the hash value. Value of this label is computed based on hash chiplet selector flags according to the methodology described here.
  • is the address of the row at which the hashing began.

Sequential hash

To initiate a sequential hash of elements (), we need to divide by the following value:

This also absorbs the first elements of the sequence into the hasher state. Then, to absorb the next sequence of elements (e.g., ), we need to divide by the following value:

Where is a label indicating absorption of more elements into the hasher state. Value of this label is computed based on hash chiplet selector flags according to the methodology described here.

We can keep absorbing elements into the hasher in the similar manner until all elements have been absorbed. Then, to read the result (e.g., ), we need to divide by the following value:

Thus, for example, if , the result will of the hash will be available at hasher row .

Control flow tables

In addition to the hash chiplet, control flow operations rely on virtual tables: block stack table, block hash table, and op group table. These tables are virtual in that they don't require separate trace columns. Their state is described solely by running product columns: , , and . The tables are described in the following sections.

Block stack table

When the VM starts executing a new program block, it adds its block ID together with the ID of its parent block (and some additional info) to the block stack table. When a program block is fully executed, it is removed from the table. In this way, the table represents a stack of blocks which are currently executing on the VM. By the time program execution completes, block stack table must be empty.

The table can be thought of as consisting of columns as shown below:

decoder_block_stack_table

where:

  • The first column () contains the ID of the block.
  • The second column () contains the ID of the parent block. If the block has no parent (i.e., it is a root block of the program), parent ID is 0.
  • The third column () contains a binary value which is set to is the block is a loop block, and to otherwise.

Running product column is used to keep track of the state of the table. At any step of the computation, the current value of defines which rows are present in the table.

To reduce a row in the block stack table to a single value, we compute the following.

Where are the random values provided by the verifier.

Block hash table

When the VM starts executing a new program block, it adds hashes of the block's children to the block hash table. And when the VM finishes executing a block, it removes its hash from the block hash table. Thus, by the time program execution completes, block hash table must be empty.

The table can be thought of as consisting of columns as shown below:

block_hash_table

where:

  • The first column () contains the ID of the block's parent. For program root, parent ID is .
  • The next columns () contain the hash of the block.
  • The next column () contains a binary value which is set to if the block is the first child of a join block, and to otherwise.
  • The last column () contains a binary value which is set to if the block is a body of a loop, and to otherwise.

Running product column is used to keep track of the state of the table. At any step of the computation, the current value of defines which rows are present in the table.

To reduce a row in the block hash table to a single value, we compute the following.

Where are the random values provided by the verifier.

Unlike other virtual tables, block hash table does not start out in an empty state. Specifically, it is initialized with a single row containing the hash of the program's root block. This needs to be done because the root block does not have a parent and, thus, otherwise it would never be added to the block hash table.

Initialization of the block hash table is done by setting the initial value of to the value of the row containing the hash of a program's root block.

Op group table

Op group table is used in decoding of span blocks, which are leaves in a program's MAST. As described here, a span block can contain one or more operation batches, each batch containing up to operation groups.

When the VM starts executing a new batch of operations, it adds all operation groups within a batch, except for the first one, to the op group table. Then, as the VM starts executing an operation group, it removes the group from the table. Thus, by the time all operation groups in a batch have been executed, the op group table must be empty.

The table can be thought of as consisting of columns as shown below:

decoder_op_group_table

The meaning of the columns is as follows:

  • The first column () contains operation batch ID. During the execution of the program, each operation batch is assigned a unique ID.
  • The second column () contains the position of the group in the span block (not just in the current batch). The position is -based and is counted from the end. Thus, for example, if a span block consists of a single batch with groups, the position of the first group would be , the position of the second group would be etc. (the reason for this is explained in this section). Note that the group with position is not added to the table, because it is the first group in the batch, so the first row of the table will be for the group with position .
  • The third column () contains the actual values of operation groups (this could include up to opcodes or a single immediate value).

Permutation column is used to keep track of the state of the table. At any step of the computation, the current value of defines which rows are present in the table.

To reduce a row in the op group table to a single value, we compute the following.

Where are the random values provided by the verifier.

Control flow operation semantics

In this section we describe high-level semantics of executing all control flow operations. The descriptions are not meant to be complete and omit some low-level details. However, they provide good intuition on how these operations work.

JOIN operation

Before a JOIN operation is executed by the VM, the prover populates registers with hashes of left and right children of the join program block as shown in the diagram below.

decoder_join_operation

In the above diagram, blk is the ID of the join block which is about to be executed. blk is also the address of the hasher row in the auxiliary hasher table. prnt is the ID of the block's parent.

When the VM executes a JOIN operation, it does the following:

  1. Adds a tuple (blk, prnt, 0) to the block stack table.
  2. Adds tuples (blk, left_child_hash, 1, 0) and (blk, right_child_hash, 0, 0) to the block hash table.
  3. Initiates a 2-to-1 hash computation in the hash chiplet (as described here) using blk as row address in the auxiliary hashing table and as input values.

SPLIT operation

Before a SPLIT operation is executed by the VM, the prover populates registers with hashes of true and false branches of the split program block as shown in the diagram below.

decoder_split_operation

In the above diagram, blk is the ID of the split block which is about to be executed. blk is also the address of the hasher row in the auxiliary hasher table. prnt is the ID of the block's parent.

When the VM executes a SPLIT operation, it does the following:

  1. Adds a tuple (blk, prnt, 0) to the block stack table.
  2. Pops the stack and:
    a. If the popped value is , adds a tuple (blk, true_branch_hash, 0, 0) to the block hash table.
    b. If the popped value is , adds a tuple (blk, false_branch_hash, 0, 0) to the block hash table.
    c. If the popped value is neither nor , the execution fails.
  3. Initiates a 2-to-1 hash computation in the hash chiplet (as described here) using blk as row address in the auxiliary hashing table and as input values.

LOOP operation

Before a LOOP operation is executed by the VM, the prover populates registers with hash of the loop's body as shown in the diagram below.

decoder_loop_operation

In the above diagram, blk is the ID of the loop block which is about to be executed. blk is also the address of the hasher row in the auxiliary hasher table. prnt is the ID of the block's parent.

When the VM executes a LOOP operation, it does the following:

  1. Pops the stack and:
    a. If the popped value is adds a tuple (blk, prnt, 1) to the block stack table (the 1 indicates that the loop's body is expected to be executed). Then, adds a tuple (blk, loop_body_hash, 0, 1) to the block hash table.
    b. If the popped value is , adds (blk, prnt, 0) to the block stack table. In this case, nothing is added to the block hash table.
    c. If the popped value is neither nor , the execution fails.
  2. Initiates a 2-to-1 hash computation in the hash chiplet (as described here) using blk as row address in the auxiliary hashing table and as input values.

SPAN operation

Before a SPAN operation is executed by the VM, the prover populates registers with contents of the first operation batch of the span block as shown in the diagram below. The prover also sets the group count register to the total number of operation groups in the span block.

decoder_span_block

In the above diagram, blk is the ID of the span block which is about to be executed. blk is also the address of the hasher row in the auxiliary hasher table. prnt is the ID of the block's parent. g0_op0 is the first operation of the batch, and g_0' is the first operation group of the batch with the first operation removed.

When the VM executes a SPAN operation, it does the following:

  1. Adds a tuple (blk, prnt, 0) to the block stack table.
  2. Adds groups of the operation batch, as specified by op batch flags (see here) to the op group table.
  3. Initiates a sequential hash computation in the hash chiplet (as described here) using blk as row address in the auxiliary hashing table and as input values.
  4. Sets the in_span register to .
  5. Decrements group_count register by .
  6. Sets the op_index register to .

DYN operation

Before a DYN operation is executed by the VM, the prover populates registers with as shown in the diagram below.

decoder_dyn_operation

In the above diagram, blk is the ID of the dyn block which is about to be executed. blk is also the address of the hasher row in the auxiliary hasher table. prnt is the ID of the block's parent.

When the VM executes a DYN operation, it does the following:

  1. Adds a tuple (blk, prnt, 0) to the block stack table.
  2. Gets the hash of the dynamic code block dynamic_block_hash from the top four elements of the stack.
  3. Adds the tuple (blk, dynamic_block_hash, 0, 0) to the block hash table.
  4. Initiates a 2-to-1 hash computation in the hash chiplet (as described here) using blk as row address in the auxiliary hashing table and as input values.

END operation

Before an END operation is executed by the VM, the prover populates registers with the hash of the block which is about to end. The prover also sets values in and registers as follows:

  • is set to if the block is a body of a loop block. We denote this value as f0.
  • is set to if the block is a loop block. We denote this value as f1.

decoder_end_operation

In the above diagram, blk is the ID of the block which is about to finish executing. prnt is the ID of the block's parent.

When the VM executes an END operation, it does the following:

  1. Removes a tuple (blk, prnt, f1) from the block stack table.
  2. Removes a tuple (prnt, current_block_hash, nxt, f0) from the block hash table, where if the next operation is either END or REPEAT, and otherwise.
  3. Reads the hash result from the hash chiplet (as described here) using blk + 7 as row address in the auxiliary hashing table.
  4. If (i.e., we are exiting a loop block), pops the value off the top of the stack and verifies that the value is .
  5. Verifies that group_count register is set to .

HALT operation

Before a HALT operation is executed by the VM, the VM copies values in registers to the next row as illustrated in the diagram below:

decoder_halt_operation

In the above diagram, blk is the ID of the block which is about to finish executing.

When the VM executes a HALT operation, it does the following:

  1. Verifies that block address register is set to .
  2. If we are not at the last row of the trace, verifies that the next operation is HALT.
  3. Copies values of registers to the next row.
  4. Populates all other decoder registers with 's in the next row.

REPEAT operation

Before a REPEAT operation is executed by the VM, the VM copies values in registers to the next row as shown in the diagram below.

decoder_repeat_operation

In the above diagram, blk is the ID of the loop's body and prnt is the ID of the loop.

When the VM executes a REPEAT operation, it does the following:

  1. Checks whether register is set to . If it isn't (i.e., we are not in a loop), the execution fails.
  2. Pops the stack and if the popped value is , adds a tuple (prnt, loop_body_loop 0, 1) to the block hash table. If the popped value is not , the execution fails.

The effect of the above is that the VM needs to execute the loop's body again to clear the block hash table.

RESPAN operation

Before a RESPAN operation is executed by the VM, the VM copies the ID of the current block blk and the number of remaining operation groups in the span to the next row, and sets the value of in_span column to . The prover also sets the value of register for the next row to the ID of the current block's parent prnt as shown in the diagram below:

decoder_respan_operation

In the above diagram, g0_op0 is the first operation of the new operation batch, and g0' is the first operation group of the batch with g0_op0 operation removed.

When the VM executes a RESPAN operation, it does the following:

  1. Increments block address by .
  2. Removes the tuple (blk, prnt, 0) from the block stack table.
  3. Adds the tuple (blk+8, prnt, 0) to the block stack table.
  4. Absorbs values in registers into the hasher state of the hash chiplet (as described here).
  5. Sets the in_span register to .
  6. Adds groups of the operation batch, as specified by op batch flags (see here) to the op group table using blk+8 as batch ID.

The net result of the above is that we incremented the ID of the current block by and added the next set of operation groups to the op group table.

Program decoding

When decoding a program, we start at the root block of the program. We can compute the hash of the root block directly from hashes of its children. The prover provides hashes of the child blocks non-deterministically, and we use them to compute the program's hash (here we rely on the hash chiplet). We then verify the program hash via boundary constraints. Thus, if the prover provided valid hashes for the child blocks, we will get the expected program hash.

Now, we need to verify that the VM executed the child blocks correctly. We do this recursively similar to what is described above: for each of the blocks, the prover provides hashes of its children non-deterministically and we verify that the hash has been computed correctly. We do this until we get to the leaf nodes (i.e., span blocks). Hashes of span blocks are computed sequentially from the instructions executed by the VM.

The sections below illustrate how different types of code blocks are decoded by the VM.

JOIN block decoding

When decoding a join bock, the VM first executes a JOIN operation, then executes the first child block, followed by the second child block. Once the children of the join block are executed, the VM executes an END operation. This is illustrated in the diagram below.

decoder_join_block_decoding

As described previously, when the VM executes a JOIN operation, hashes of both children are added to the block hash table. These hashes are removed only when the END operations for the child blocks are executed. Thus, until both child blocks are executed, the block hash table is not cleared.

SPLIT block decoding

When decoding a split block, the decoder pops an element off the top of the stack, and if the popped element is , executes the block corresponding to the true branch. If the popped element is , the decoder executes the block corresponding to the false branch. This is illustrated on the diagram below.

decoder_split_block_decoding

As described previously, when the VM executes a SPLIT operation, only the hash of the branch to be executed is added to the block hash table. Thus, until the child block corresponding to the required branch is executed, the block hash table is not cleared.

LOOP block decoding

When decoding a loop bock, we need to consider two possible scenarios:

  • When the top of the stack is , we need to enter the loop and execute loop body at least once.
  • When the top of the stack is, we need to skip the loop.

In both cases, we need to pop an element off the top of the stack.

Executing the loop

If the top of the stack is , the VM executes a LOOP operation. This removes the top element from the stack and adds the hash of the loop's body to the block hash table. It also adds a row to the block stack table setting the is_loop value to .

To clear the block hash table, the VM needs to execute the loop body (executing the END operation for the loop body block will remove the corresponding row from the block hash table). After loop body is executed, if the top of the stack is , the VM executes a REPEAT operation (executing REPEAT operation when the top of the stack is will result in an error). This operation again adds the hash of the loop's body to the block hash table. Thus, the VM needs to execute the loop body again to clear the block hash table.

This process is illustrated on the diagram below.

decoder_loop_execution

The above steps are repeated until the top of the stack becomes , at which point the VM executes the END operation. Since in the beginning we set is_loop column in the block stack table to , column will be set to when the END operation is executed. Thus, executing the END operation will also remove the top value from the stack. If the removed value is not , the operation will fail. Thus, the VM can exit the loop block only when the top of the stack is .

Skipping the loop

If the top of the stack is , the VM still executes the LOOP operation. But unlike in the case when we need to enter the loop, the VM sets is_loop flag to in the block stack table, and does not add any rows to the block hash table. The last point means that the only possible operation to be executed after the LOOP operation is the END operation. This is illustrated in the diagram below.

decoder_loop_skipping

Moreover, since we've set the is_loop flag to , executing the END operation does not remove any items from the stack.

DYN block decoding

When decoding a dyn bock, the VM first executes a DYN operation, then executes the child block dynamically specified by the top of the stack. Once the child of the dyn block has been executed, the VM executes an END operation. This is illustrated in the diagram below.

decoder_dyn_block_decoding

As described previously, when the VM executes a DYN operation, the hash of the child is added to the block hash table. This hash is removed only when the END operation for the child block is executed. Thus, until the child block corresponding to the dynamically specified target is executed, the block hash table is not cleared.

SPAN block decoding

As described here, a span block can contain one or more operation batches, each batch containing up to operation groups. At the high level, decoding of a span block is done as follows:

  1. At the beginning of the block, we make a request to the hash chiplet which initiates the hasher, absorbs the first operation batch ( field elements) into the hasher, and returns the row address of the hasher, which we use as the unique ID for the span block (see here).
  2. We then add groups of the operation batch, as specified by op batch flags (but always skipping the first one) to the op group table.
  3. We then remove operation groups from the op group table in the FIFO order one by one, and decode them in the manner similar to the one described here.
  4. Once all operation groups in a batch have been decoded, we absorb the next batch into the hasher and repeat the process described above.
  5. Once all batches have been decoded, we return the hash of the span block from the hasher.

Overall, three control flow operations are used when decoding a span block:

  1. SPAN operation is used to initialize a hasher and absorbs the first operation batch into it.
  2. RESPAN operation is used to absorb any additional batches in the span block.
  3. END operation is used to end the decoding of a span block and retrieve its hash from the hash chiplet.

Operation group decoding

As described here, an operation group is a sequence of operations which can be encoded into a single field element. For a field element of bits, we can fit up to operations into a group. We do this by concatenating binary representations of opcodes together with the first operation located in the least significant position.

We can read opcodes from the group by simply subtracting them from the op group value and then dividing the result by . Once the value of the op group reaches , we know that all opcodes have been read. Graphically, this can be illustrated like so:

decoder_operation_group_decoding

Notice that despite their appearance, op bits is actually separate registers, while op group is just a single register.

We also need to make sure that at most operations are executed as a part of a single group. For this purpose we use the op_index column. Values in this column start out at for each operation group, and are incremented by for each executed operation. To make sure that at most operations can be executed in a group, the value of the op_index column is not allowed to exceed .

Operation batch flags

Operation batch flags are used to specify how many operation groups comprise a given operation batch. For most batches, the number of groups will be equal to . However, for the last batch in a block (or for the first batch, if the block consists of only a single batch), the number of groups may be less than . Since processing of new batches starts only on SPAN and RESPAN operations, only for these operations the flags can be set to non-zero values.

To simplify the constraint system, the number of groups in a batch can be only one of the following values: , , , and . If the number of groups in a batch does not match one of these values, the batch is simply padded with NOOP's (one NOOP per added group). Consider the diagram below.

decoder_OPERATION_batch_flags

In the above, the batch contains operation groups. To bring the count up to , we consider the -th group (i.e., ) to be a part of the batch. Since a numeric value for NOOP operation is , op group value of can be interpreted as a single NOOP.

Operation batch flags (denoted as ), encode the number of groups and define how many groups are added to the op group table as follows:

  • (1, 0, 0) - groups. Groups in are added to the op group table.
  • (0, 1, 0) - groups. Groups in are added to the op group table
  • (0, 0, 1) - groups. Groups in is added to the op group table.
  • (0, 1, 1) - group. Nothing is added to the op group table
  • (0, 0, 0) - not a SPAN or RESPAN operation.

Single-batch span

The simplest example of a span block is a block with a single batch. This batch may contain up to operation groups (e.g., ). Decoding of such a block is illustrated in the diagram below.

decoder_single_batch_span

Before the VM starts processing this span block, the prover populates registers with operation groups . The prover also puts the total number of groups into the group_count register . In this case, the total number of groups is .

When the VM executes a SPAN operation, it does the following:

  1. Initiates hashing of elements using hash chiplet. The hasher address is used as the block ID blk, and it is inserted into addr register in the next row.
  2. Adds a tuple (blk, prnt, 0) to the block stack table.
  3. Sets the is_span register to in the next row.
  4. Sets the op_index register to in the next row.
  5. Decrements group_count register by .
  6. Sets op bits registers at the next step to the first operation of , and also copies with the first operation removed (denoted as ) to the next row.
  7. Adds groups to the op group table. Thus, after the SPAN operation is executed, op group table looks as shown below.

decoder_op_group_table_after_span_op

Then, with every step the next operation is removed from , and by step , the value of is . Once this happens, the VM does the following:

  1. Decrements group_count register by .
  2. Sets op bits registers at the next step to the first operation of .
  3. Sets hasher register to the value of with the first operation removed (denoted as ).
  4. Removes row (blk, 7, g1) from the op group table. This row can be obtained by taking values from registers: addr, group_count, and for , where and refer to values in the next row for the first hasher column and op_bits columns respectively.

Note that we rely on the group_count column to construct the row to be removed from the op group table. Since group count is decremented from the total number of groups to , to remove groups from the op group table in correct order, we need to assign group position to groups in the op group table in the reverse order. For example, the first group to be removed should have position , the second group to be removed should have position etc.

Decoding of is performed in the same manner as decoding of : with every subsequent step the next operation is removed from until its value reaches , at which point, decoding of group begins.

The above steps are executed until value of group_count reaches . Once group_count reaches and the last operation group is executed, the VM executes the END operation. Semantics of the END operation are described here.

Notice that by the time we get to the END operation, all rows are removed from the op group table.

Multi-batch span

A span block may contain an unlimited number of operation batches. As mentioned previously, to absorb a new batch into the hasher, the VM executes a RESPAN operation. The diagram below illustrates decoding of a span block consisting of two operation batches.

decoder_multi_batch_span

Decoding of such a block will look very similar to decoding of the single-span block described previously, but there also will be some differences.

First, after the SPAN operation is executed, the op group table will look as follows:

decoder_op_group_table_multi_span

Notice that while the same groups () are added to the table, their positions now reflect the total number of groups in the span block.

Second, executing a RESPAN operation increments hasher address by . This is done because absorbing additional elements into the hasher state requires more rows in the auxiliary hasher table.

Incrementing value of addr register actually changes the ID of the span block (though, for a span block, it may be more appropriate to view values in this column as IDs of individual operation batches). This means that we also need to update the block stack table. Specifically, we need to remove row (blk, prnt, 0) from it, and replace it with row (blk + 8, prnt, 0). To perform this operation, the prover sets the value of in the next row to prnt.

Executing a RESPAN operation also adds groups to the op group table, which now would look as follows:

decoder_op_group_table_post_respan

Then, the execution of the second batch proceeds in a manner similar to the first batch: we remove operations from the current op group, execute them, and when the value of the op group reaches , we start executing the next group in the batch. Thus, by the time we get to the END operation, the op group table should be empty.

When executing the END operation, the hash of the span block will be read from hasher row at address addr + 7, which, in our example, will be equal to blk + 15.

Handling immediate values

Miden VM operations can carry immediate values. Currently, the only such operation is a PUSH operation. Since immediate values can be thought of as constants embedded into program code, we need to make sure that changing immediate values affects program hash.

To achieve this, we treat immediate values in a manner similar to how we treat operation groups. Specifically, when computing hash of a span block, immediate values are absorbed into the hasher state in the same way as operation groups are. As mentioned previously, an immediate value is represented by a single field element, and thus, an immediate value takes place of a single operation group.

The diagram below illustrates decoding of a span block with operations one of which is a PUSH operation.

decoder_decoding_span_block_with_push

In the above, when the SPAN operation is executed, immediate value imm0 will be added to the op group table, which will look as follows:

decoder_imm_vale_op_group_table

Then, when the PUSH operation is executed, the VM will do the following:

  1. Decrement group_count by .
  2. Remove a row from the op group table equal to (addr, group_count, s0'), where is the value of the top of the stack at the next row (i.e., it is the value that is pushed onto the stack).

Thus, after the PUSH operation is executed, the op group table is cleared, and group count decreases to (which means that there are no more op groups to execute). Decoding of the rest of the op group proceeds as described in the previous sections.

Program decoding example

Let's run through an example of decoding a simple program shown previously:

begin
    <operations1>
    if.true
        <operations2>
    else
        <operations3>
    end
end

Translating this into code blocks with IDs assigned, we get the following:

b0: JOIN
    b1: SPAN
        <operations1>
    b1: END
    b2: SPLIT
        b3: SPAN
            <operations2>
        b3: END
        b4: SPAN
            <operations3>
        b4: END
    b2: END
b0: END

The root of the program is a join block . This block contains two children: a span bock and a split block . In turn, the split block contains two children: a span block and a span block .

When this program is executed on the VM, the following happens:

  1. Before the program starts executing, block hash table is initialized with a single row containing the hash of .
  2. Then, JOIN operation for is executed. It adds hashes of and to the block hash table. It also adds an entry for to the block stack table. States of both tables after this step are illustrated below.
  3. Then, span is executed and a sequential hash of its operations is computed. Also, when SPAN operation for is executed, an entry for is added to the block stack table. At the end of (when END is executed), entries for are removed from both the block hash and block stack tables.
  4. Then, SPLIT operation for is executed. It adds an entry for to the block stack table. Also, depending on whether the top of the stack is or , either hash of or hash of is added to the block hash table. Let's say the top of the stack is . Then, at this point, the block hash and block stack tables will look like in the second picture below.
  5. Then, span is executed and a sequential hash of its instructions is computed. Also, when SPAN operation for is executed, an entry for is added to the block stack table. At the end of (when END is executed), entries for are removed from both the block hash and block stack tables.
  6. Then, END operation for is executed. It removes the hash of from the block hash table, and also removes the entry for from the block stack table. The third picture below illustrates the states of block stack and block hash tables after this step.
  7. Then, END for is executed, which removes entries for from the block stack and block hash tables. At this point both tables are empty.
  8. Finally, a sequence of HALT operations is executed until the length of the trace reaches a power of two.

States of block hash and block stack tables after step 2: decoder_state_block_hash_2

States of block hash and block stack tables after step 4: decoder_state_block_hash_4

States of block hash and block stack tables after step 6: decoder_state_block_hash_6

Miden VM decoder AIR constraints

In this section we describe AIR constraint for Miden VM program decoder. These constraints enforce that the execution trace generated by the prover when executing a particular program complies with the rules described in the previous section.

To refer to decoder execution trace columns, we use the names shown on the diagram below (these are the same names as in the previous section). Additionally, we denote the register containing the value at the top of the stack as .

air_decoder_columns

We assume that the VM exposes a flag per operation which is set to when the operation is executed, and to otherwise. The notation for such flags is . For example, when the VM executes a PUSH operation, flag . All flags are mutually exclusive - i.e., when one flag is set to all other flags are set to . The flags are computed based on values in op_bits columns.

AIR constraints for the decoder involve operations listed in the table below. For each operation we also provide the degree of the corresponding flag and the effect that the operation has on the operand stack (however, in this section we do not cover the constraints needed to enforce the correct transition of the operand stack).

OperationFlagDegreeEffect on stack
JOIN5Stack remains unchanged.
SPLIT5Top stack element is dropped.
LOOP5Top stack element is dropped.
REPEAT4Top stack element is dropped.
SPAN5Stack remains unchanged.
RESPAN4Stack remains unchanged.
DYN5Stack remains unchanged.
CALL4Stack remains unchanged.
SYSCALL4Stack remains unchanged.
END4When exiting a loop block, top stack element is dropped; otherwise, the stack remains unchanged.
HALT4Stack remains unchanged.
PUSH4An immediate value is pushed onto the stack.

We also use the control flow flag exposed by the VM, which is set when any one of the above control flow operations is being executed. It has degree .

As described previously, the general idea of the decoder is that the prover provides the program to the VM by populating some of cells in the trace non-deterministically. Values in these are then used to update virtual tables (represented via multiset checks) such as block hash table, block stack table etc. Transition constraints are used to enforce that the tables are updates correctly, and we also apply boundary constraints to enforce the correct initial and final states of these tables. One of these boundary constraints binds the execution trace to the hash of the program being executed. Thus, if the virtual tables were updated correctly and boundary constraints hold, we can be convinced that the prover executed the claimed program on the VM.

In the sections below, we describe constraints according to their logical grouping. However, we start out with a set of general constraints which are applicable to multiple parts of the decoder.

General constraints

When SPLIT or LOOP operation is executed, the top of the operand stack must contain a binary value:

When a DYN operation is executed, the hasher registers must all be set to :

When REPEAT operation is executed, the value at the top of the operand stack must be :

Also, when REPEAT operation is executed, the value in column (the is_loop_body flag), must be set to . This ensures that REPEAT operation can be executed only inside a loop:

When RESPAN operation is executed, we need to make sure that the block ID is incremented by :

When END operation is executed and we are exiting a loop block (i.e., is_loop, value which is stored in , is ), the value at the top of the operand stack must be :

Also, when END operation is executed and the next operation is REPEAT, values in (the hash of the current block and the is_loop_body flag) must be copied to the next row:

A HALT instruction can be followed only by another HALT instruction:

When a HALT operation is executed, block address column must be :

Values in op_bits columns must be binary (i.e., either or ):

When the value in in_span column is set to , control flow operations cannot be executed on the VM, but when in_span flag is , only control flow operations can be executed on the VM:

Block hash computation constraints

As described previously, when the VM starts executing a new block, it also initiates computation of the block's hash. There are two separate methodologies for computing block hashes.

For join, split, and loop blocks, the hash is computed directly from the hashes of the block's children. The prover provides these child hashes non-deterministically by populating registers . For dyn, the hasher registers are populated with zeros, so the resulting hash is a constant value. The hasher is initialized using the hash chiplet, and we use the address of the hasher as the block's ID. The result of the hash is available rows down in the hasher table (i.e., at row with index equal to block ID plus ). We read the result from the hasher table at the time the END operation is executed for a given block.

For span blocks, the hash is computed by absorbing a linear sequence of instructions (organized into operation groups and batches) into the hasher and then returning the result. The prover provides operation batches non-deterministically by populating registers . Similarly to other blocks, the hasher is initialized using the hash chiplet at the start of the block, and we use the address of the hasher as the ID of the first operation batch in the block. As we absorb additional operation batches into the hasher (by executing RESPAN operation), the batch address is incremented by . This moves the "pointer" into the hasher table rows down with every new batch. We read the result from the hasher table at the time the END operation is executed for a given block.

Chiplets bus constraints

The decoder communicates with the hash chiplet via the chiplets bus. This works by dividing values of the multiset check column by the values of operations providing inputs to or reading outputs from the hash chiplet. A constraint to enforce this would look as , where is the value which defines the operation.

In constructing value of for decoder AIR constraints, we will use the following labels (see here for an explanation of how values for these labels are computed):

  • this label specifies that we are starting a new hash computation.
  • this label specifies that we are absorbing the next sequence of elements into an ongoing hash computation.
  • this label specifies that we are reading the result of a hash computation.

To simplify constraint description, we define the following variables:

In the above, can be thought of as initiating a hasher with address and absorbing elements from the hasher state () into it. Control blocks are always padded to fill the hasher rate and as such the (first capacity register) term is set to .

It should be noted that refers to a column in the decoder, as depicted. The addresses in this column are set using the address from the hasher chiplet for the corresponding hash initialization / absorption / return. In the case of the value of the address in column in the current row of the decoder is set to equal the value of the address of the row in the hasher chiplet where the previous absorption (or initialization) occurred. is the address of the next row of the decoder, which is set to equal the address in the hasher chiplet where the absorption referred to by the label is happening.

In the above, represents the address value in the decoder which corresponds to the hasher chiplet address at which the hasher was initialized (or the last absorption took place). As such, corresponds to the hasher chiplet address at which the result is returned.

In the above, is set to when a control flow operation that signifies the initialization of a control block is being executed on the VM. Otherwise, it is set to . An exception is made for the SYSCALL operation. Although it also signifies the initialization of a control block, it must additionally send a procedure access request to the kernel ROM chiplet via the chiplets bus. Therefore, it is excluded from this flag and its communication with the chiplets bus is handled separately.

In the above, represents the opcode value of the opcode being executed on the virtual machine. It is calculated via a bitwise combination of the op bits. We leverage the opcode value to achieve domain separation when hashing control blocks. This is done by populating the second capacity register of the hasher with the value via the term when initializing the hasher.

Using the above variables, we define operation values as described below.

When a control block initializer operation (JOIN, SPLIT, LOOP, DYN, CALL, SYSCALL) is executed, a new hasher is initialized and the contents of are absorbed into the hasher. As mentioned above, the opcode value is populated in the second capacity resister via the term.

As mentioned previously, the value sent by the SYSCALL operation is defined separately, since in addition to communicating with the hash chiplet it must also send a kernel procedure access request to the kernel ROM chiplet. This value of this kernel procedure request is described by .

In the above, is the unique operation label of the kernel procedure call operation. The values contain the root hash of the procedure being called, which is the procedure that must be requested from the kernel ROM chiplet.

The above value sends both the hash initialization request and the kernel procedure access request to the chiplets bus when the SYSCALL operation is executed.

When SPAN operation is executed, a new hasher is initialized and contents of are absorbed into the hasher. The number of operation groups to be hashed is padded to a multiple of the rate width () and so the is set to 0:

When RESPAN operation is executed, contents of (which contain the new operation batch) are absorbed into the hasher:

When END operation is executed, the hash result is copied into registers :

Using the above definitions, we can describe the constraint for computing block hashes as follows:

We need to add and subtract the sum of the relevant operation flags to ensure that when none of the flags is set to , the above constraint reduces to .

The degree of this constraint is .

Block stack table constraints

As described previously, block stack table keeps track of program blocks currently executing on the VM. Thus, whenever the VM starts executing a new block, an entry for this block is added to the block stack table. And when execution of a block completes, it is removed from the block stack table.

Adding and removing entries to/from the block stack table is accomplished as follows:

  • To add an entry, we multiply the value in column by a value representing a tuple (blk_id, prnt_id, is_loop). A constraint to enforce this would look as , where is the value representing the row to be added.
  • To remove an entry, we divide the value in column by a value representing a tuple (blk_id, prnt_id, is_loop). A constraint to enforce this would look as , where is the value representing the row to be removed.

Before describing the constraints for the block stack table, we first describe how we compute the values to be added and removed from the table for each operation. In the below, for block start operations (JOIN, SPLIT, LOOP, SPAN) refers to the ID of the parent block, and refers to the ID of the starting block. For END operation, the situation is reversed: is the ID of the ending block, and is the ID of the parent block. For RESPAN operation, refers to the ID of the current operation batch, refers to the ID of the next batch, and the parent ID for both batches is set by the prover non-deterministically in register .

When JOIN operation is executed, row is added to the block stack table:

When SPLIT operation is executed, row is added to the block stack table:

When LOOP operation is executed, row is added to the block stack table if the value at the top of the operand stack is , and row is added to the block stack table if the value at the top of the operand stack is :

When SPAN operation is executed, row is added to the block stack table:

When RESPAN operation is executed, row is removed from the block stack table, and row is added to the table. The prover sets the value of register at the next row to the ID of the parent block:

When a DYN operation is executed, row is added to the block stack table:

When END operation is executed, row is removed from the block span table. Register contains the is_loop flag:

Using the above definitions, we can describe the constraint for updating the block stack table as follows:

We need to add and subtract the sum of the relevant operation flags from each side to ensure that when none of the flags is set to , the above constraint reduces to .

The degree of this constraint is .

In addition to the above transition constraint, we also need to impose boundary constraints against the column to make sure the first and the last value in the column is set to . This enforces that the block stack table starts and ends in an empty state.

Block hash table constraints

As described previously, when the VM starts executing a new program block, it adds hashes of the block's children to the block hash table. And when the VM finishes executing a block, it removes the block's hash from the block hash table. This means that the block hash table gets updated when we execute the JOIN, SPLIT, LOOP, REPEAT, DYN, and END operations (executing SPAN operation does not affect the block hash table because a span block has no children).

Adding and removing entries to/from the block hash table is accomplished as follows:

  • To add an entry, we multiply the value in column by a value representing a tuple (prnt_id, block_hash, is_first_child, is_loop_body). A constraint to enforce this would look as , where is the value representing the row to be added.
  • To remove an entry, we divide the value in column by a value representing a tuple (prnt_id, block_hash, is_first_child, is_loop_body). A constraint to enforce this would look as , where is the value representing the row to be removed.

To simplify constraint descriptions, we define values representing left and right children of a block as follows:

Graphically, this looks like so:

air_decoder_left_right_child

In a similar manner, we define a value representing the result of hash computation as follows: