Inception Labs Launches Mercury 2, Diffusion-Based Reasoning Model Achieving Over 1,000 Tokens Per Second
In Brief
Inception Labs has launched Mercury 2, a diffusion-based reasoning model capable of generating over 1,000 tokens per second, three times faster than comparable models.
Inception Labs, an AI startup, has launched Mercury 2, a diffusion-based large language model (LLM) designed to significantly accelerate reasoning tasks in production AI applications.
Unlike traditional autoregressive models that generate text sequentially, Mercury 2 uses a parallel refinement process, producing multiple tokens simultaneously and converging over a small number of steps, enabling speeds of over 1,000 tokens per second on NVIDIA Blackwell GPUs—approximately three times faster than competing models in the same price range.
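The difference between the two decoding strategies can be illustrated with a toy sketch. The code below is not Mercury 2's actual algorithm, only a minimal, hypothetical model of the contrast the article describes: an autoregressive decoder needs one model call per token, while a diffusion-style decoder starts from a fully masked sequence and refines all positions in parallel, converging in far fewer passes.

```python
# Toy contrast between autoregressive and diffusion-style decoding.
# TARGET, ar_model, and diff_model are illustrative stand-ins, not any
# real model's behavior.
TARGET = "mercury generates many tokens in parallel".split()

calls = {"ar": 0, "diff": 0}  # count of model forward passes

def ar_model(prefix):
    """Toy autoregressive 'model': one call predicts one next token."""
    calls["ar"] += 1
    return TARGET[len(prefix)]

def diff_model(seq):
    """Toy diffusion 'model': one call refines every position at once."""
    calls["diff"] += 1
    masked = [i for i, t in enumerate(seq) if t == "<mask>"]
    # Commit roughly half of the remaining masked positions per step.
    to_fill = set(masked[: (len(masked) + 1) // 2])
    return [TARGET[i] if i in to_fill else t for i, t in enumerate(seq)]

def autoregressive_decode(length):
    out = []
    for _ in range(length):          # one sequential pass per token
        out.append(ar_model(out))
    return out

def diffusion_decode(length):
    seq = ["<mask>"] * length        # start fully masked
    while "<mask>" in seq:           # a small number of parallel passes
        seq = diff_model(seq)
    return seq

n = len(TARGET)
result_ar = autoregressive_decode(n)    # n model calls
result_diff = diffusion_decode(n)       # ~log2(n) model calls
```

In this toy run the diffusion decoder reaches the same sequence in half the forward passes; the article's claim is that the real gains, on NVIDIA Blackwell GPUs, translate into over 1,000 tokens per second.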
The model is optimized for real-time responsiveness in complex AI workflows, where latency compounds across multiple inference calls, retrieval pipelines, and agentic loops. Mercury 2 maintains high reasoning quality while reducing latency, allowing developers, voice AI systems, search engines, and other interactive applications to operate at reasoning-grade performance without the delays associated with sequential generation. It supports features such as tunable reasoning, 128K token context windows, schema-aligned JSON output, and native tool integration, providing flexibility for a range of production deployments.
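The feature set listed above might surface to developers as request parameters. The sketch below is purely hypothetical: every field name (`reasoning_effort`, `max_context_tokens`, and so on) is an illustrative assumption, not Inception Labs' documented API.

```python
import json

# Hypothetical request body illustrating the features described above.
# Field names are invented for illustration, not Mercury 2's real API.
request = {
    "model": "mercury-2",
    "reasoning_effort": "medium",      # tunable reasoning (assumed knob)
    "max_context_tokens": 128_000,     # 128K token context window
    "response_format": {               # schema-aligned JSON output
        "type": "json_schema",
        "json_schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
    "tools": [                         # native tool integration
        {
            "name": "search_docs",     # hypothetical retrieval tool
            "description": "Look up documents for multi-hop retrieval.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
            },
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the findings."}],
}

payload = json.dumps(request, indent=2)  # what would go over the wire
```

A request shaped like this would let one deployment tune latency against reasoning depth per call, which is what "flexibility for a range of production deployments" implies.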
Mercury 2 Enables Low-Latency AI Across Coding, Voice, And Search Workflows
The report highlights several use cases where low-latency reasoning is critical. In coding and editing workflows, Mercury 2 delivers rapid autocomplete and next-edit suggestions that integrate seamlessly with developers’ thought processes. In agentic workflows, the model allows for more inference steps without exceeding latency budgets, improving the quality and depth of automated decision-making. Voice-based AI and interactive applications benefit from its ability to generate reasoning-quality responses within natural speech cadences, enhancing user experiences in real-time conversation scenarios. Additionally, Mercury 2 supports multi-hop search and retrieval pipelines, enabling rapid summarization, reranking, and reasoning without compromising response times.
Early adopters have noted significant improvements in throughput and user experience. Mercury 2 has been described as at least twice as fast as GPT-5.2 while maintaining competitive quality, with applications spanning real-time transcript cleanup, interactive human-computer interfaces, autonomous advertising optimization, and voice-enabled AI avatars.