Don’t Trust, Verify: An Overview of Decentralized Inference

4/16/2024, 2:08:15 AM
Blockchains and machine learning have a lot to offer each other, but in decentralized inference, balancing cost and trust is a key challenge.

Say you want to run a large language model like Llama2-70B. A model this massive requires more than 140GB of memory, which means you can’t run the raw model on your home machine. What are your options? You might jump to a cloud provider, but you might not be too keen on trusting a single centralized company to handle this workload for you and hoover up all your usage data. Then what you need is decentralized inference, which lets you run ML models without relying on any single provider.
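The 140GB figure can be reproduced with quick arithmetic. The sketch below assumes 70 billion parameters stored at 16-bit precision (2 bytes each) and counts the weights only; activations and caches come on top, which is why the real requirement is "more than" 140GB. The function name is made up for illustration.

```python
# Back-of-the-envelope memory estimate for holding model weights.
# Assumption: 16-bit weights (2 bytes per parameter), weights only.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

llama2_70b_gb = weight_memory_gb(70e9)
print(f"Llama2-70B weights at 16-bit precision: {llama2_70b_gb:.0f} GB")
```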

The Trust Problem

In a decentralized network, it’s not enough to just run a model and trust the output. Let’s say I ask the network to analyze a governance dilemma using Llama2-70B. How do I know it’s not actually using Llama2-13B, giving me worse analysis, and pocketing the difference?

In the centralized world, you might trust that companies like OpenAI are doing this honestly because their reputation is at stake (and to some degree, LLM quality is self-evident). But in the decentralized world, honesty is not assumed — it is verified.

This is where verifiable inference comes into play. In addition to providing a response to a query, you also prove it ran correctly on the model you asked for. But how?

The naive approach would be to run the model as a smart contract on-chain. This would definitely guarantee the output was verified, but this is wildly impractical. GPT-3 represents words with an embedding dimension of 12,288. If you were to do a single matrix multiplication of this size on-chain, it would cost about $10 billion at current gas prices — the computation would fill every block for about a month straight.

So, no. We’re going to need a different approach.

After observing the landscape, it’s clear to me that three main approaches have emerged to tackle verifiable inference: zero-knowledge proofs, optimistic fraud proofs, and cryptoeconomics. Each has its own flavor of security and cost implications.

1. Zero-Knowledge Proofs (ZK ML)

Imagine being able to prove you ran a massive model, but the proof is effectively a fixed size regardless of how large the model is. That’s what ZK ML promises, through the magic of ZK-SNARKs.

While it sounds elegant in principle, compiling a deep neural network into zero-knowledge circuits which can then be proven is extremely difficult. It’s also massively expensive: at minimum, you’re likely looking at 1000x cost for inference and 1000x latency (the time to generate the proof), according to Modulus Labs’ “The Cost of Intelligence”, to say nothing of compiling the model itself into a circuit before any of this can happen. Ultimately that cost has to be passed down to end users.

On the other hand, this is the only approach that cryptographically guarantees correctness. With ZK, the model provider can’t cheat no matter how hard they try. But it does so at huge costs, making this impractical for large models for the foreseeable future.

Examples: EZKL, Modulus Labs, Giza

2. Optimistic Fraud Proofs (Optimistic ML)

The optimistic approach is to trust, but verify. We assume the inference is correct unless proven otherwise. If a node tries to cheat, “watchers” in the network can call out the cheater and challenge them using a fraud proof. These watchers have to be watching the chain at all times and re-running the inferences on their own models to ensure the outputs are correct.

These fraud proofs are Truebit-style interactive challenge-response games, where you repeatedly bisect the model execution trace on-chain until you find the error.
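To make the bisection idea concrete, here is a minimal off-chain sketch. It assumes each party has a list of state commitments, one per execution step, and that both agree on the first entry (the input) but disagree on the last (the output). The function name and the toy integer "commitments" are hypothetical; a real protocol would play this out as an interactive game on-chain and then re-execute only the single disputed step.

```python
from typing import Sequence

def find_disputed_step(claimed: Sequence[int], honest: Sequence[int]) -> int:
    """Return a step index i where the parties agree on state i-1 but not on state i.

    Both traces hold one state commitment per execution step. They must agree
    at step 0 (shared input) and disagree at the last step (disputed output).
    """
    if len(claimed) != len(honest):
        raise ValueError("traces must cover the same number of steps")
    if claimed[0] != honest[0] or claimed[-1] == honest[-1]:
        raise ValueError("need agreement at the start and disagreement at the end")

    lo, hi = 0, len(claimed) - 1  # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if claimed[mid] == honest[mid]:
            lo = mid  # still in agreement here, so the dispute is later
        else:
            hi = mid  # already diverged here, so the dispute is earlier
    return hi

# Toy traces: the claimed execution goes wrong at step 5.
honest_trace = [0, 1, 2, 3, 4, 5, 6, 7]
claimed_trace = [0, 1, 2, 3, 4, 99, 100, 101]
print(find_disputed_step(claimed_trace, honest_trace))  # 5
```

The number of rounds grows with the logarithm of the trace length, which is what keeps the on-chain part of the game small even when the full execution is enormous.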

If this ever actually happens it’s incredibly costly, since these programs are massive and have huge internal states — a single GPT-3 inference costs about 1 petaflop (10¹⁵ floating point operations). But the game theory suggests this should almost never happen (fraud proofs are also notoriously difficult to code correctly, since the code almost never gets hit in production).

The upside is optimistic ML is secure so long as there’s a single honest watcher who’s paying attention. The cost is cheaper than ZK ML, but remember that each watcher in the network is rerunning every query themselves. At equilibrium, this means that if there are 10 watchers, that security cost must be passed on to the user, so they will have to pay more than 10x the inference cost (or however many watchers there are).

The downside, as with optimistic rollups generally, is that you have to wait for the challenge period to pass before you’re sure the response is verified. Depending on how that network is parameterized though, you might be waiting minutes rather than days.

Examples: Ora, Gensyn (although currently underspecified)

3. Cryptoeconomics (Cryptoeconomic ML)

Here we drop all the fancy techniques and do the simple thing: stake-weighted voting. A user decides how many nodes should run their query, they each reveal their responses, and if there’s a discrepancy among responses, the odd one out gets slashed. Standard oracle stuff — it’s a more straightforward approach that lets users set their desired security level, balancing cost and trust. If Chainlink were doing ML, this is how they’d do it.

The latency here is fast — you just need a commit-reveal from each node. If this is getting written to a blockchain, then technically this can happen in two blocks.
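A commit-reveal round with stake-weighted voting might look roughly like the sketch below. Everything here is an illustrative assumption rather than any particular network's protocol: the function names, the use of SHA-256 with a salt as the commitment, and the rule that nodes which fail to reveal, or reveal something that does not match their commitment, get slashed along with the minority.

```python
import hashlib
from collections import defaultdict

def commit(response: str, salt: str) -> str:
    """Commitment a node publishes first: a hash of its salt and response."""
    return hashlib.sha256(f"{salt}:{response}".encode()).hexdigest()

def settle(commitments, reveals, stakes):
    """Return (winning_response, slashed_nodes) by stake-weighted vote.

    commitments: node -> commitment hex string (first block)
    reveals:     node -> (response, salt)      (second block)
    stakes:      node -> staked amount
    """
    weight = defaultdict(float)
    valid = {}
    for node, (response, salt) in reveals.items():
        # Only count reveals that match what the node committed to earlier.
        if commit(response, salt) == commitments.get(node):
            valid[node] = response
            weight[response] += stakes[node]
    if not weight:
        raise ValueError("no valid reveals")
    winner = max(weight, key=weight.get)  # ties go to the first response seen
    slashed = sorted(n for n in commitments if valid.get(n) != winner)
    return winner, slashed

stakes = {"a": 10.0, "b": 10.0, "c": 10.0}
answers = {"a": ("yes", "s1"), "b": ("yes", "s2"), "c": ("no", "s3")}
commitments = {n: commit(r, s) for n, (r, s) in answers.items()}
print(settle(commitments, answers, stakes))  # ('yes', ['c'])
```

Committing before revealing is what stops a lazy node from simply copying another node's answer.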

The security however is the weakest. A majority of nodes could rationally choose to collude if they were wily enough. As a user, you have to reason about how much these nodes have at stake and what it would cost them to cheat. That said, using something like Eigenlayer restaking and attributable security, the network could effectively provide insurance in the case of a security failure.

But the nice part of this system is that the user can specify how much security they want. They could choose to have 3 nodes or 5 nodes in their quorum, or every node in the network — or, if they want to YOLO, they could even choose n=1. The cost function here is simple: the user pays for however many nodes they want in their quorum. If you choose 3, you pay 3x the inference cost.

The tricky question here: can you make n=1 secure? In a naive implementation, a lone node should cheat every time if no one is checking. But I suspect if you encrypt the queries and do the payments through intents, you might be able to obfuscate to the node that they’re actually the only one responding to this task. In that case you might be able to charge the average user less than 2x inference cost.
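One way to read the "less than 2x" intuition: if a second node only shadows the query some fraction of the time, and the first node cannot tell when, the average cost sits between 1x and 2x. The sketch below is a back-of-the-envelope model under assumptions of my own, not something the designs above specify: a node caught cheating loses its whole stake and keeps nothing, and `audit_prob`, `stake`, and `cheat_gain` are hypothetical parameters.

```python
def expected_user_cost(inference_cost: float, audit_prob: float) -> float:
    """Average cost when a second node re-runs the query with probability audit_prob."""
    return inference_cost * (1 + audit_prob)

def cheating_is_unprofitable(audit_prob: float, stake: float, cheat_gain: float) -> bool:
    """True when the expected slashing loss outweighs the expected gain from cheating.

    Assumes a caught node forfeits its stake and an uncaught node pockets cheat_gain.
    """
    return audit_prob * stake > (1 - audit_prob) * cheat_gain

print(expected_user_cost(1.0, audit_prob=0.25))                         # 1.25
print(cheating_is_unprofitable(0.25, stake=100.0, cheat_gain=10.0))     # True
print(cheating_is_unprofitable(0.0, stake=100.0, cheat_gain=10.0))      # False
```

The last line is the naive n=1 case: with no chance of being checked, no amount of stake deters cheating.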

Ultimately, the cryptoeconomic approach is the simplest, the easiest, and probably the cheapest, but it’s the least sexy and in principle the least secure. But as always, the devil is in the details.
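Putting the rough multipliers from the three sections side by side gives a feel for the tradeoff. This is a sketch using only the figures quoted above (about 1000x for ZK, roughly one re-run per watcher for optimistic, one run per quorum member for cryptoeconomic); the function names are made up, and the optimistic figure is a lower bound since the text says users pay more than that.

```python
def zk_cost(inference_cost: float, overhead: float = 1000.0) -> float:
    """ZK ML: proving overhead of roughly 1000x, per the estimate quoted above."""
    return inference_cost * overhead

def optimistic_cost_lower_bound(inference_cost: float, watchers: int) -> float:
    """Optimistic ML: every watcher re-runs every query, so at least watchers x cost."""
    return inference_cost * watchers

def cryptoeconomic_cost(inference_cost: float, quorum: int) -> float:
    """Cryptoeconomic ML: the user pays once per node in the chosen quorum."""
    return inference_cost * quorum

base = 1.0  # cost of one plain inference, in arbitrary units
print(f"ZK ML:              {zk_cost(base):.0f}x")
print(f"Optimistic (10):   >{optimistic_cost_lower_bound(base, 10):.0f}x")
print(f"Cryptoeconomic (3): {cryptoeconomic_cost(base, 3):.0f}x")
```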

Examples: Ritual (although currently underspecified), Atoma Network

Why Verifiable ML is Hard

You might wonder why we don’t have all this already. After all, at bottom, machine learning models are just really large computer programs. Proving that programs were executed correctly has long been the bread and butter of blockchains.

This is why these three verification approaches mirror the ways that blockchains secure their block space — ZK rollups use ZK proofs, optimistic rollups use fraud proofs, and most L1 blockchains use cryptoeconomics. It’s no surprise that we arrived at basically the same solutions. So what makes this hard when applied to ML?

ML is unique because ML computations are generally represented as dense computation graphs that are designed to be run efficiently on GPUs. They are not designed to be proven. So if you want to prove ML computations in a ZK or optimistic environment, they have to be recompiled in a format that makes this possible — which is very complex and expensive.

The second fundamental difficulty with ML is nondeterminism. Program verification assumes that the outputs of programs are deterministic. But if you run the same model on different GPU architectures or CUDA versions, you’ll get different outputs. Even if you force each node to use the same architecture, you still have the problem of randomness used in algorithms (the noise in diffusion models, or token sampling in LLMs). You can fix that randomness by controlling the RNG seed. But even with all that, you’re still left with the final menacing problem: the nondeterminism inherent in floating point operations.
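Controlling the seed is the easy part. The toy example below stands in for token sampling in an LLM: the vocabulary and probabilities are invented, and Python's standard `random` module plays the role of whatever RNG a real inference stack uses. Two runs with the same seed draw identical samples.

```python
import random

def sample_tokens(vocab, weights, n_tokens, seed):
    """Draw n_tokens from vocab using a private, seeded generator."""
    rng = random.Random(seed)  # isolated from the global random state
    return rng.choices(vocab, weights=weights, k=n_tokens)

vocab = ["yes", "no", "maybe"]
weights = [0.6, 0.3, 0.1]  # made-up probabilities for illustration

run_a = sample_tokens(vocab, weights, 5, seed=42)
run_b = sample_tokens(vocab, weights, 5, seed=42)
print(run_a == run_b)  # True: same seed, same samples
```

This only pins down the sampling step. It does nothing about differences in the probabilities themselves, which is where floating point comes in.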

Almost all operations in GPUs are done on floating point numbers. Floating points are finicky because they’re not associative — that is, it’s not true that (a + b) + c is always the same as a + (b + c) for floating points. Because GPUs are highly parallelized, the ordering of additions or multiplications might be different on each execution, which can cascade into small differences in output. This is unlikely to affect the output of an LLM given the discrete nature of words, but for an image model, it may result in subtly different pixel values, leading two images to not match perfectly.
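The non-associativity is easy to see even on a CPU with ordinary double-precision floats. The snippet adds the same three numbers in two different groupings and gets two different answers.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

On a GPU the grouping is decided by how the work is split across threads, so it can change from one run to the next.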

This means you either need to avoid using floating points, which means an enormous blow to performance, or you need to allow some laxity in comparing outputs. Either way, the details are fiddly, and you can’t exactly abstract them away. (This is why, it turns out, the EVM doesn’t support floating point numbers, although some blockchains like NEAR do.)
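Allowing some laxity usually means comparing outputs within a tolerance rather than bit for bit. Here is a minimal sketch using `math.isclose` from the Python standard library; the function name and the tolerance values are arbitrary placeholders, and picking them well is one of the fiddly details.

```python
import math

def outputs_match(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two output vectors agree element-wise within tolerance."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )

print(outputs_match([0.6000000000000001, 1.0], [0.6, 1.0]))  # True
print(outputs_match([0.6, 1.0], [0.7, 1.0]))                 # False
```

The tension is that a tolerance loose enough to absorb honest hardware drift also gives a dishonest node a little room to deviate.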

In short, decentralized inference networks are hard because all the details matter, and reality has a surprising amount of detail.

In Conclusion

Right now blockchains and ML clearly have a lot to say to each other. One is a technology that creates trust, and the other is a technology in sore need of it. While each approach to decentralized inference has its own tradeoffs, I’m very interested in seeing what entrepreneurs do with these tools to build the best network out there.

But I did not write this piece to be the last word — I’m thinking about these ideas a lot in real time and having a lot of vibrant debates with people. I’ve always found writing is the best way to test my ideas. If you’re building something in this space, reach out! I’d always love to learn what you’re working on — and if you can prove me wrong, all the better.

Disclaimer:

  1. This article is reprinted from [Dragonfly Research], All copyrights belong to the original author [Haseeb Qureshi]. If there are objections to this reprint, please contact the Gate Learn team, and they will handle it promptly.
  2. Liability Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Translations of the article into other languages are done by the Gate Learn team. Unless mentioned, copying, distributing, or plagiarizing the translated articles is prohibited.
