So your compiler rolls dice

How I made Vyper’s nondeterministic era verifiable again, two compiles at a time

Tags: vyper, compilers, data
Author: banteg
Published: June 11, 2026

In June 2023, the engineer who runs contract verification at Etherscan filed a bug against Vyper [1]: the same source, the same compiler version, the same settings produced different bytecode depending on where you ran it. He had a freshly deployed Curve contract that refused to verify. He tried pinning PYTHONHASHSEED. It did nothing. The thread ends with Curve noting that verification “will continue to fail.”

The bug was fixed in 0.3.8 [2] and never backported. No 0.3.7.1 was released, and nobody published a recipe for verifying what was already on chain. My inventory counts roughly 2,200 mainnet contracts deployed with the affected compilers, 2,045 of them on 0.3.7 alone, including crvUSD and a good slice of Curve’s stableswap-ng universe. Whatever bytecode the deployer’s machine happened to produce that day is the bytecode the verifier has to reproduce, and there was no documented way to do it.

This is the story of how I chased the bug through heap layouts and PyInstaller binaries, two failed search strategies, one very satisfying theorem about topological sorts, and the tool that came out the other end: vysort.

Heap roulette

The root cause is two lines apart from being boring. ContractFunction objects track their callees in a plain set() [3], and neither ContractFunction nor any of its bases defines __hash__. Python then hashes these objects by id(), that is, by their memory addresses. Iteration order of the set depends on where the allocator happened to place your objects.

    [3] vyper/semantics/types/function.py:126 at v0.3.7

    That’s why PYTHONHASHSEED does nothing: seed randomization applies to strings and bytes, not to identity hashing. The reporter’s instinct was right, but the dice are rolled by the heap, not the hash seed.
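A minimal sketch of the mechanism, using a stand-in class rather than Vyper's own code. `Fn` is a hypothetical name; the only thing it shares with ContractFunction is that it defines neither `__eq__` nor `__hash__`.

```python
class Fn:
    """Stand-in for a class that defines neither __eq__ nor __hash__."""

    def __init__(self, name):
        self.name = name


fns = [Fn(name) for name in ("_a", "_b", "_c", "_d")]

# The class inherits object.__hash__, which CPython derives from the object's
# address. PYTHONHASHSEED salts str/bytes hashing and never enters here.
uses_identity_hash = Fn.__hash__ is object.__hash__

# Iteration order of this set follows those address-derived hashes, so it can
# change whenever the allocator places the objects differently.
heap_order = [f.name for f in set(fns)]

# An insertion-ordered container removes the dependence on addresses.
stable_order = [f.name for f in dict.fromkeys(fns)]

print(uses_identity_hash, heap_order, stable_order)
```

`heap_order` may or may not differ between your runs; `stable_order` cannot, which is the property an ordered set restores.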

    The set order leaks into the output in exactly two places, both in vyper/codegen/module.py:

    1. The call-graph topological sort walks called_functions during DFS, and its output decides the order of internal function sections in the runtime code. Every jump target after the first swapped section shifts with it.
    2. The deploy code appends internal functions called from __init__ by iterating the raw set. Creation bytecode gets permuted the same way.

    Everything else is downstream determinism: frame allocation takes a max() over callee frame sizes, so memory offsets don’t care about order. The entire divergence is one permutation of code sections, plus the PUSH2 jump targets that reference them.

    The bug shipped in 0.3.4 [4]. Before that, module.py had no call-graph topsort at all and emitted internal functions in plain source order. Versions 0.3.4 through 0.3.7 roll dice. 0.3.8 replaced the set with an OrderedSet.

    [4] Introduced in #2496, fix: call internal functions from constructor, commit 4b44ee7, released May 2022.

    How bad is it, actually

    Before fixing anything I wanted to know the shape of the randomness, so I compiled the contract from the original issue — CurveTricryptoOptimizedWETH, 21 internal functions — about everywhere I could:

    | environment | result |
    |---|---|
    | pip vyper, macOS, py3.10, 63 runs | deterministic within the environment |
    | pip vyper, macOS, py3.8 / 3.9 / 3.10 / 3.11 | 4 different bytecodes, one per python |
    | PYTHONHASHSEED=0 | no effect, as advertised |
    | official darwin binary (PyInstaller), 16 runs | 6 distinct outputs, nondeterministic per run |
    | official linux binary (what sourcify runs), 6 runs | 2 distinct outputs, nondeterministic per run |
    | docker image vyperlang/vyper:0.3.7 | yet another ordering |

    About fifteen distinct orderings of the same contract, and counting. The detail that ended any hope of an “environment matrix” strategy: the PyInstaller binaries are nondeterministic per run. A CPython that imports the same modules in the same order tends to lay out its heap reproducibly; the frozen binaries apparently don’t. The linux binary — the one both deployers and sourcify actually use — flakes between two outputs on consecutive invocations. You cannot whitelist your way through this; even currently-verifiable contracts occasionally fail their recompile and nobody notices because a retry fixes it.

    Searching for the order, badly

    Reproducing a deployed contract means finding the permutation of internal functions the deployer’s heap happened to produce. For \(k\) internal functions that’s potentially \(k!\) — fine for the 4-function factory coming up later (\(24\) candidates), hopeless for tricrypto (\(21! \approx 5 \times 10^{19}\)).

    First idea: greedy prefix search. Fix the layout position by position, keep whichever candidate maximizes the common bytecode prefix with the target. This failed in an instructive way: the external/selector section near the start of the runtime contains PUSH2 references to every internal function’s address. The common prefix saturates at the first reference to any misplaced function and carries no positional signal whatsoever. After 232 compiles the prefix sat at 635 bytes of 23,701 and refused to move.
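A toy model of why the prefix score saturates. It is not real EVM bytecode: the "runtime" is just a header of 2-byte section addresses followed by the sections, and `build`, `common_prefix`, and the section sizes are all made up for illustration.

```python
def build(order, bodies, ref_order):
    """Toy runtime: a header of 2-byte section addresses, then the sections."""
    header_len = 2 * len(ref_order)
    addr, pc = {}, header_len
    for name in order:
        addr[name] = pc
        pc += len(bodies[name])
    header = b"".join(addr[name].to_bytes(2, "big") for name in ref_order)
    return header + b"".join(bodies[name] for name in order)


def common_prefix(a, b):
    """Length of the longest shared prefix of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


bodies = {"a": b"\x01" * 10, "b": b"\x02" * 20, "c": b"\x03" * 30, "d": b"\x04" * 40}
ref_order = ["a", "b", "c", "d"]  # the header references every function
target = build(["c", "a", "d", "b"], bodies, ref_order)

right_first = build(["c", "b", "a", "d"], bodies, ref_order)  # position 0 correct
wrong_first = build(["a", "b", "c", "d"], bodies, ref_order)  # position 0 wrong

scores = (common_prefix(target, right_first), common_prefix(target, wrong_first))
print(scores)
```

Both candidates score the same, even though one of them has the first section right: the prefix ends at the first header reference whose address is off, long before the sections themselves are compared.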

    So searching with the compiler as a black box doesn’t work. Time to open the box.

    Read it off the chain

    The realization that unlocked everything: I’m running the compiler as a library, so I don’t have to guess the layout — I can instrument one compile and then decode the deployed order directly off the on-chain bytes.

    The mechanics, in three steps:

    1. One instrumented compile. Replicate the assembler’s program-counter walk over the runtime assembly. Each internal function announces its entry label, which gives exact section boundaries. Every code-symbol PUSH2 operand position is also known, which gives the complete set of layout-dependent bytes — by construction, not by diffing probes, so nothing can be under-masked. Mask only the operands whose target sits inside the movable internal region; references to fixed targets must match as-is.
    2. Decode, zero compiles. Walk the on-chain internal region with a cursor. At each position, exactly one remaining function’s masked section matches. Section lengths don’t change with order, so the walk terminates with the full deployed permutation. Functions with identical bodies are mutually ambiguous, but any consistent assignment produces identical bytecode, so it doesn’t matter.
    3. One verification compile with the recovered order forced in, compared byte-for-byte against the chain.
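The decode step can be sketched on toy data. This assumes the instrumented compile has already produced, for each internal function, its section bytes and a mask of layout-dependent operand positions; `masked_match`, `decode_layout`, and the byte values below are hypothetical and not vysort's actual code.

```python
def masked_match(body, mask, chunk):
    """True if chunk equals body everywhere except the masked operand bytes."""
    return len(chunk) == len(body) and all(
        masked or b == c for b, c, masked in zip(body, chunk, mask)
    )


def decode_layout(region, sections):
    """Recover the section order from the deployed internal region.

    sections maps name -> (body, mask); mask[i] is True for layout-dependent bytes.
    """
    remaining = dict(sections)
    order, cursor = [], 0
    while remaining:
        hits = [
            name
            for name, (body, mask) in remaining.items()
            if masked_match(body, mask, region[cursor : cursor + len(body)])
        ]
        if not hits:
            raise ValueError(f"no section matches at offset {cursor}")
        name = hits[0]  # several hits are assumed to be identical bodies
        order.append(name)
        cursor += len(remaining.pop(name)[0])
    if cursor != len(region):
        raise ValueError("trailing bytes after the last section")
    return order


sections = {
    "_a": (bytes.fromhex("aa0000bb"), [False, True, True, False]),
    "_b": (bytes.fromhex("cccc0000dd"), [False, False, True, True, False]),
    "_c": (bytes.fromhex("ee"), [False]),
}
# Deployed order _c, _a, _b, with the masked operands holding other addresses.
region = bytes.fromhex("ee" "aa1234bb" "cccc5678dd")
decoded = decode_layout(region, sections)
print(decoded)
```

The walk never calls the compiler. The final byte-for-byte comparison after the forced recompile is what catches a wrong decode.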

    Two compiles, regardless of how many internal functions or how large the permutation space. Tricrypto — targeting an ordering produced by a random official-binary run that no tested environment could reproduce — decoded and verified byte-exact, CBOR auxdata included, in 7.6 seconds.

    The eureka

    While validating the decoder against an 85-contract mainnet corpus, something nagged me. Most 0.3.7 contracts verify on the first try, in any environment. If the compiler rolls dice, why does it keep rolling the same numbers for most contracts?

    The hypothesis: nondeterminism requires more than one valid topological sort. Stare at the topsort long enough and it sharpens into something precise. The outer loop walks all function definitions in source order; the DFS only consults a called_functions set when it reaches a caller. So any internal function defined before its caller is already placed by the outer loop by the time the caller’s set is iterated — deduplication makes the set’s order irrelevant for it. The dice only matter at a decision point: a set containing two or more internal functions that are not yet placed when the DFS arrives.
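A simplified model of that walk, written from the description above rather than copied from Vyper. `simulate`, the `pick` parameter (a stand-in for the set's iteration order), and the function names are illustrative assumptions.

```python
def simulate(source_order, calls, pick=sorted):
    """Post-order walk: callees land before their caller, duplicates are skipped.

    Returns (layout, decision_points).
    """
    placed = {}  # dict used as an insertion-ordered set
    decision_points = 0

    def visit(fn):
        nonlocal decision_points
        if fn in placed:
            return
        unplaced = [c for c in calls.get(fn, ()) if c not in placed]
        if len(unplaced) >= 2:
            decision_points += 1  # the set's order can now matter
        for callee in pick(calls.get(fn, ())):
            visit(callee)
        placed[fn] = None

    for fn in source_order:  # outer loop: definitions in source order
        visit(fn)
    return list(placed), decision_points


calls = {"swap": {"_fee", "_math"}, "_fee": set(), "_math": set()}
backwards = lambda s: sorted(s, reverse=True)

# Externals first, helpers at the bottom: the set order decides the layout.
helpers_last = ["swap", "_fee", "_math"]
a = simulate(helpers_last, calls)
b = simulate(helpers_last, calls, pick=backwards)

# Helpers defined before their caller: the outer loop has placed them already.
helpers_first = ["_fee", "_math", "swap"]
c = simulate(helpers_first, calls)
d = simulate(helpers_first, calls, pick=backwards)
print(a, b, c, d)
```

With the helpers at the bottom, the two iteration orders give two layouts and one decision point. With the helpers on top, both orders give the same layout and no decision points.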

    I wrote a simulator that captures the real call graph from a compile, then enumerates every possible per-set iteration order and counts distinct reachable layouts. Validated against the deployed layouts decoded from the corpus:

    • 73 of 85 contracts have zero decision points. One valid topsort, one reachable layout, bit-for-bit deterministic in every environment. These contracts were never at risk — that, not luck, is why most contracts verify.
    • Every contract whose deployed layout differed from a fresh compile sat in the multi-layout pool, and every decoded layout was inside the predicted reachable set. Zero counterexamples.
    • Function count predicts nothing. A 28-internal-function contract is immune; a 3-function one is not. The risk factor is source style: externals first, internal helpers at the bottom — the Curve house style — is exactly what manufactures unplaced-callee sets.

    And then the inverse hit me. If definition order is what makes a contract immune, definition order can make any contract immune — in whichever layout I want. Move the internal function definitions to the top of the file, ordered by the target layout. The outer loop now places every one of them before any caller’s set is ever consulted. Zero decision points. One reachable layout. On any stock compiler, in any environment, every run. And since 0.3.7’s auxdata contains no source hash — only the compiler version — the output is byte-identical to what the original source produces under that ordering.
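The enumeration and the reordering trick can be sketched on the same kind of toy model. The helper names, and the convention that internal functions start with an underscore, are assumptions for the example; a real classifier would take the call graph from the compiler.

```python
from itertools import permutations, product


def layout(source_order, ordered_calls):
    """Post-order walk with each callee list already in a chosen order."""
    placed = {}

    def visit(fn):
        if fn in placed:
            return
        for callee in ordered_calls.get(fn, ()):
            visit(callee)
        placed[fn] = None

    for fn in source_order:
        visit(fn)
    # Toy convention: internal functions are the ones starting with "_".
    return tuple(fn for fn in placed if fn.startswith("_"))


def reachable_layouts(source_order, calls):
    """Try every iteration order of every callee set; collect distinct layouts."""
    callers = [fn for fn in source_order if calls.get(fn)]
    choices = [list(permutations(sorted(calls[fn]))) for fn in callers]
    return {
        layout(source_order, dict(zip(callers, combo)))
        for combo in product(*choices)
    }


def pin(source_order, target):
    """Move the internals to the top, in target order; keep the rest as is."""
    return list(target) + [fn for fn in source_order if fn not in target]


calls = {"swap": {"_fee", "_math", "_log"}, "_fee": {"_log"}}
source = ["swap", "_fee", "_math", "_log"]
before = reachable_layouts(source, calls)

target = ("_math", "_log", "_fee")  # one of the reachable layouts
after = reachable_layouts(pin(source, target), calls)
print(len(before), after)
```

The target has to be a layout the compiler could have produced, with callees ahead of their callers. A target that violates that cannot be pinned this way.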

    No compiler fork. No patched binaries. The bug ships with its own antidote: the same source-order rule that creates the hazard pins it down.

    I checked the brutal way: the reordered tricrypto compiled to an exact match three runs out of three on the official PyInstaller binary — the same binary that is nondeterministic per run on the original source.

    Counting the wounded

    With a cheap static classifier — capture the call graph, count decision points, enumerate reachable layouts, no extra compiles — I swept every source-available 0.3.4–0.3.7 mainnet contract in my corpus: 376 unique runtimes across 772 deployments.

    | version | unique runtimes | at-risk |
    |---|---|---|
    | 0.3.4 | 21 | 0 |
    | 0.3.5 | 1 | 0 |
    | 0.3.6 | 20 | 3 |
    | 0.3.7 | 334 | 23 |
    | total | 376 | 26 (7%) |

    93% of runtimes are provably immune: a single valid topsort, verifiable today with one stock compile. Of the 26 at-risk ones, 25 have at most 24 reachable layouts, small enough to brute-force by reordering and recompiling per candidate. Exactly one contract laughs at brute force: yETH’s weighted stableswap pool [5], whose 6 decision points compound into a choice space of \(6.6 \times 10^{49}\) with at least 4,988 reachable layouts in a 20,000-sample probe.

    [5] 0x2cced4ff…, 23 internal functions.

    That one is why the decoder matters.

    vysort

    Everything above is now packaged as a CLI. The amusing operational detail: vyper 0.3.7 requires python ≤3.11, which you probably don’t have lying around. vysort runs on any modern python and depends only on uv — every compiler-touching operation runs in an ephemeral environment with the right python and the right vyper, resolved from the version pragma in your source. The first run for a given version costs a few seconds of environment setup; after that it’s all cache.

    uvx vysort check Contract.vy
    {
      "vyper": "0.3.7",
      "affected": true,
      "internal_fns": 4,
      "decision_points": 1,
      "reachable_layouts": 6,
      "immune": false,
      "layouts": [["_pack_protocol_fee_data", "_unpack_custom_flag", ...], ...]
    }

    check is the classifier: it tells you whether a contract was ever at risk, and if so, enumerates every layout the deployer’s heap could have produced. Sources targeting unaffected compilers short-circuit to immune without compiling anything.

    uvx vysort match Contract.vy --address 0x2cced4ff… --rpc-url $RPC -o matched.vy
    {
      "status": "prefix",
      "method": "recovery",
      "compiles": 2,
      "immutable_tail_bytes": 32,
      "layout": ["_pack_weight", "_pack_vb", "__log", "_log", ...]
    }

    match is the decoder: it recovers the deployed layout from the on-chain bytes (or a hex file), then proves it with one reordered stock compile. exact means byte-for-byte; prefix means the deployed code carries an appended immutable tail after a byte-exact prefix — the legacy way vyper handled immutables. If the decode ever hits an edge case it can’t handle, it degrades to brute-forcing the reachable layouts, which covers everything except the yETH pool. The matched source it writes out is ordinary Vyper that any stock 0.3.7 compiles to the deployed bytecode, forever.
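The exact/prefix distinction comes down to a comparison like the one below. This is a sketch of the idea only: `classify` is a hypothetical helper, the bytes are made up, and the real tool may check more than this.

```python
def classify(compiled_runtime: bytes, deployed: bytes):
    """Return (status, tail_length) for a recompiled runtime vs. on-chain code."""
    if deployed == compiled_runtime:
        return "exact", 0
    if deployed.startswith(compiled_runtime):
        # Byte-exact prefix followed by appended data, e.g. immutables.
        return "prefix", len(deployed) - len(compiled_runtime)
    return "mismatch", None


compiled = bytes.fromhex("6003565b00")  # made-up runtime bytes
exact_case = classify(compiled, compiled)
prefix_case = classify(compiled, compiled + bytes(32))
mismatch_case = classify(compiled, bytes.fromhex("6004565b00"))
print(exact_case, prefix_case, mismatch_case)
```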

    uvx vysort verify Contract.vy --address 0x… --rpc-url $RPC

    verify is the full rescue pipeline: fetch the code and chain id, decode the layout, rewrite the source, preflight the exact standard-json payload through vyper’s own std-json entry point to confirm it reproduces the runtime, and submit it to sourcify’s v2 API. The reordered source sails through a completely unmodified verifier with a completely unmodified compiler.

    The worst cases on mainnet

    I ran vysort match over the full at-risk population — all 26 runtimes, deployed layouts decoded from their on-chain bytecode:

    | runtime | vyper | internal fns | reachable layouts | result |
    |---|---|---|---|---|
    | dcc516f0 (yETH pool) | 0.3.7 | 23 | ≥4,988 of \(6.6 \times 10^{49}\) | prefix + 32B tail |
    | 9210071d | 0.3.7 | 7 | 24 | exact |
    | 63937efe | 0.3.7 | 17 | 16 | exact |
    | 002de55c … (4 runtimes) | 0.3.7 | 5 | 12 | exact |
    | 5eb9fa91 | 0.3.7 | 12 | 8 | prefix + 160B tail |
    | bfbd5c0b (Yearn vault factory) | 0.3.7 | 4 | 6 | prefix + 32B tail |
    | 15 more | 0.3.6–0.3.7 | 2–13 | 2–6 | all matched |

    26 of 26, all through the two-compile decode, zero brute-force fallbacks, zero failures. 18 exact, 8 with immutable tails.

    The sweep also keeps teaching me things. The very first batch of new contracts surfaced a decoder edge case the original 85-contract validation never hit: a contract with a dead internal function — defined, never called, never emitted. No entry label, no cleanup label, nothing to anchor. The decoder now classifies such functions as absent and leaves them out of the proof order; the final byte-exact comparison keeps everyone honest. This is also why the brute-force fallback stays in the codebase: every validation round so far has found a new way for real-world bytecode to be weird.

    Did it actually rescue anything?

    The Yearn vault factory was the perfect end-to-end test: five mainnet deployments, fully unverified, one decision point, six reachable layouts — and the deployed ordering reproducible in no stock environment I tested. The deployer’s heap rolled a layout that the binary distribution apparently never produces. Layout-recovered, reordered, submitted to production sourcify: creationMatch for all five deployments, stock 0.3.7+commit.6020b8bb, zero changes on sourcify’s side.

    crvUSD, for the record, is immune — zero decision points, as is its Controller with 16 internal functions. Of the flagship Curve 0.3.7 contracts only tricrypto rolls dice. The universe has a sense of humor about which contracts it endangers.

    Sharp edges

    Two caveats earned their place in the README the hard way.

    First, the constructor lottery. Source reordering forces the runtime topsort, but the deploy-code section for __init__-called internals iterates the raw set directly — no outer-loop protection. Runtime matches are always forceable; creation matches are only guaranteed when the constructor calls at most one internal function. Beyond that you’re back to a (much smaller) environment lottery, or to decoding the init-callee order the same way vysort decodes the runtime.

    Second, settings masquerading as nondeterminism. One of my two original prefix-drift contracts turned out not to be the ordering bug at all: it diverged at the same byte in every environment, because it was compiled for pre-Berlin EVM and the nonreentrant lock constants differ (0/1 vs 3/2). Passing --evm-version istanbul made it fall into place. When hunting compiler nondeterminism, first make sure the compiler isn’t being deterministic about different inputs.

    Coda

    A compiler bug that blocks verification doesn’t stop being a problem when it’s fixed in the next release. The contracts it touched are immortal, the binaries that produced them still roll dice, and “recompile and pray” was the state of the art for three years. It turns out the whole mess collapses into two compiles and a source-level trick the compiler itself suggests, once you stop searching the permutation space and start reading the answer off the chain.

    The tool is at github.com/banteg/vysort, and uvx vysort check on your dusty 0.3.x source will tell you in a couple of seconds whether you were ever in the blast radius. Odds are 93% you weren’t. The other 7% is why it exists.