A Paper "Smashes" a Raft of Blue-Chip Stocks: The Higher the Efficiency, the Greater the Demand?
On March 26, a single academic paper sent global storage chip markets into a panic.
Storage chip stocks came under pressure across the board. At the close of the A-share market on March 26, Hengshuo shares fell over 6%, while Zhaoyi Innovation, Baiwei Storage, and Langke Technology dropped over 5%; Jiangbolong and Beijing Junzheng followed suit. When the U.S. market opened on March 26, the storage chip sector declined broadly: as of 10:30 PM Beijing time, SanDisk had fallen over 6%, Micron Technology and Western Digital over 4%, and Seagate Technology over 3%.
The catalyst was a paper from Google Research slated for formal presentation at the International Conference on Learning Representations (ICLR 2026). Google's new AI memory compression technique, "TurboQuant," claims to cut the cache memory used in large language model (LLM) inference to one-sixth of its former size and to deliver up to an 8x speedup on NVIDIA H100 GPUs.
The capital markets are reading this as a fatal blow to storage hardware demand. But behind the panic selling, what are the real long-term effects likely to be?
“Pied Piper” Comes to Reality
What problem does TurboQuant actually solve?
One of the core bottlenecks in running large models is the "key-value cache" (KV Cache). Simply put, when a user interacts with an AI, the model must remember the preceding conversation (the context), and the KV Cache is the temporary store for that data. As the context windows of large models expand from thousands of tokens to millions or even tens of millions, the memory consumed by the KV Cache grows in proportion, ballooning into a critical constraint on inference costs.
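To make the scale of the problem concrete, here is a minimal back-of-envelope sketch in Python. Every parameter in it (layer count, KV head count, head dimension, context length) is an illustrative assumption loosely modeled on a large open-weight transformer, not a figure from the Google paper.

```python
# Back-of-envelope KV cache sizing. All figures are illustrative assumptions.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: float) -> float:
    """Bytes needed to cache keys and values for a single sequence."""
    # the leading factor 2 covers keys and values
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value

config = dict(num_layers=80, num_kv_heads=8, head_dim=128, context_tokens=1_000_000)

fp16 = kv_cache_bytes(**config, bytes_per_value=2)       # 16-bit floats
bit3 = kv_cache_bytes(**config, bytes_per_value=3 / 8)   # 3-bit, ignoring metadata

print(f"FP16 KV cache at 1M tokens : {fp16 / 2**30:6.1f} GiB")
print(f"3-bit KV cache at 1M tokens: {bit3 / 2**30:6.1f} GiB")
```

Under these assumed numbers, a single million-token conversation consumes hundreds of gibibytes of cache at 16-bit precision, which is why the footprint, and any technique that shrinks it severalfold, matters so much to hardware planning.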
The reporter's review of the paper shows that TurboQuant is essentially an extreme quantization compression algorithm. Traditional quantization methods must trade accuracy against extra storage overhead, whereas the Google team claims "zero loss" compression of the KV Cache down to 3-bit precision through two innovations: PolarQuant (polar-coordinate quantization) and QJL (a quantized Johnson–Lindenstrauss transform).
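For readers unfamiliar with quantization, the sketch below shows the generic idea: store each value as a small integer plus a shared scale factor, paying for the saved bytes with rounding error and scale metadata. This is deliberately the naive textbook version, not PolarQuant or QJL; it only illustrates the accuracy-versus-overhead trade-off the paper claims to sidestep.

```python
import numpy as np

# Generic n-bit symmetric quantizer: NOT Google's method, just the baseline idea.

def quantize(x: np.ndarray, bits: int):
    """Compress a float tensor to signed n-bit integers plus one scale factor."""
    levels = 2 ** (bits - 1) - 1        # at 3-bit: integer levels -3..3
    scale = np.abs(x).max() / levels    # per-tensor scale (the storage overhead)
    q = np.clip(np.round(x / scale), -levels, levels).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integers back to floats; the rounding error is the accuracy cost."""
    return q.astype(np.float32) * scale

x = np.random.randn(4096).astype(np.float32)
q, s = quantize(x, bits=3)
err = np.abs(dequantize(q, s) - x).mean()
print(f"mean absolute rounding error at 3-bit: {err:.4f}")
```

The naive version loses visible precision at 3 bits; the paper's claim is that its two-stage transform keeps the compression while eliminating that loss in practice.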
Industry insiders have likened the breakthrough to Pied Piper, the fictional startup in the HBO series "Silicon Valley" that upends its industry with a "lossless compression algorithm." The CEO of Cloudflare went so far as to call it Google's "DeepSeek moment," arguing that, like DeepSeek, it could sharply cut AI operating costs through sheer efficiency.
“Reflexive” Selling
For a capital market long steeped in the narratives that "computing power is power" and "storage power is national power," the technology touched a raw nerve.
If the memory throughput efficiency of a single GPU multiplies, will the physical procurement of DRAM and HBM by major cloud providers and enterprise customers plummet? That line of reasoning drove investors straight into risk-off selling.
This is not the first technical panic for storage chip stocks. In early 2025, when DeepSeek released a model trained at unusually low cost, it likewise stirred doubts about demand for computing hardware. TurboQuant is seen as a continuation of the same logic: "software replacing hardware" is moving from story to reality.
However, amid the frenzy in the tech circle and the selling in the secondary market, Wall Street investment banks have shown a degree of calm.
Morgan Stanley stated explicitly in its latest research report that the market has misread the situation: the technology affects only the key-value cache during the inference stage; it does not touch the high-bandwidth memory (HBM) that holds model weights, nor does it bear on AI training workloads.
Analysts emphasized that the much-cited "6x compression" does not mean total storage demand falls; it means each GPU can push more throughput. Under the same hardware, a system can support contexts 4 to 8 times longer, or substantially larger batch sizes, without running out of memory.
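The analysts' point reduces to simple arithmetic, sketched below. The memory budget and per-token footprint are hypothetical illustrations, not deployment data: a fixed memory budget holds roughly six times as many cached tokens, which operators spend on longer contexts or bigger batches rather than on buying less memory.

```python
# Hypothetical illustration: compression raises the capacity of existing
# memory rather than shrinking the amount purchased.
hbm_budget_gib = 60            # assumed HBM reserved for the KV cache
gib_per_1k_tokens_fp16 = 0.3   # assumed FP16 KV footprint per 1,000 tokens
compression = 6                # the ~6x figure claimed for TurboQuant

fp16_tokens = hbm_budget_gib / gib_per_1k_tokens_fp16 * 1000
print(f"FP16 : ~{fp16_tokens:,.0f} tokens fit in the budget")
print(f"3-bit: ~{fp16_tokens * compression:,.0f} tokens, "
      f"or {compression}x the batch size at the same context length")
```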
Analysts at Lynx Equity Strategies went further, saying media reports are exaggerated: 4-bit quantization is already widespread in today's inference models, and Google's claimed "8x performance improvement" is measured against an outdated 32-bit baseline.
Moreover, TurboQuant's validation so far is fairly narrow. Tian Feng, director of the Fast and Slow Thinking Research Institute and a guest commentator, told reporters that the technique has been validated only on open-source models such as Gemma and Mistral; how well it adapts to Google's core models such as Gemini has not been disclosed, and its generality remains to be seen.
Notably, compressing the KV cache to optimize for long contexts is not a new idea. As early as April 2025, Google had already published related papers on TurboQuant.
Chinese teams have pursued similar technical routes. Kimi Linear, from Moonshot AI, can cut KV cache usage by up to 75% versus traditional full-attention models on long-context tasks, and the MLA (multi-head latent attention) method introduced in DeepSeek-V2 likewise optimizes the KV cache.
Jevons Paradox: The Higher the Efficiency, the Greater the Demand?
In addition to the potential misinterpretation of technical details, it is also necessary to reassess the long-term impact of TurboQuant from an economic perspective.
From a supply chain perspective, manufacturers are running at full capacity. Server memory demand continues to grow, with server DRAM demand expected to rise 39% in 2026 and HBM demand growing 58% a year; TurboQuant's optimization effect may simply be drowned out by the industry's growth wave.
“This will be another example of Jevons Paradox,” Fang Haisheng, chairman of Infinite Stars, told the Shanghai Securities News. The improvement in technical efficiency often reduces usage costs, thus triggering a larger total demand. The increased efficiency of steam engines did not reduce coal consumption; instead, it drove explosive growth in coal demand, a principle that is also applicable in the AI era.
TurboQuant does bend the memory cost curve of AI systems, but history shows that compression algorithms have never fundamentally changed the overall scale of hardware procurement. By sharply cutting the cost of serving a single query, such techniques let models that once ran only on expensive cloud clusters migrate to local environments. That lowers the barrier to large-scale AI deployment and unlocks application scenarios that cost had previously ruled out.
“The focus of inference costs will shift from GPUs to storage optimization, significantly reducing TCO (total cost of ownership). This will also enable small and medium-sized enterprises to further participate in AI application innovation, breaking down the technological barriers of large companies and accelerating the democratization of AI,” Tian Feng stated.
That a paper not yet formally presented could trigger violent swings across the global storage chip sector is itself a measure of how fragile and sensitive the logic behind current AI infrastructure investment has become.
As of the time of publication, Google has not announced a specific deployment timeline for TurboQuant in models like Gemini. Discussions about this technology will continue to evolve at the ICLR 2026 conference in April. Our reporter will keep following this matter.
(Source: Shanghai Securities News)