Xiaomi's MiMo Model Achieves 1,000 Tokens Per Second on Standard Cloud GPUs

Xiaomi's MiMo-V2.5-Pro language model reaches 1,000 tokens per second using regular cloud GPUs, surpassing previous speeds without custom hardware.

June 9, 2026 at 11:50 AM

Xiaomi's MiMo-V2.5-Pro language model has reached 1,000 tokens per second through a newly introduced UltraSpeed mode, running on standard cloud GPUs rather than specialized hardware. By the reported figures, this is approximately 15 times as fast as ChatGPT and would be a first at the trillion-parameter scale without dedicated silicon.

The earlier version, MiMo-V2-Flash, released in December 2025, processed about 150 tokens per second. The UltraSpeed upgrade raises this to a sustained 1,000 tokens per second, with peaks near 1,200, according to Xiaomi's official announcement. The sustained rate is a 6.7-fold increase over the predecessor and exceeds the publicly benchmarked speeds of competing models: around 68 tokens per second for GPT-5.5, 71 for Claude Opus, and 192 for Gemini Flash.
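
The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the tokens-per-second figures reported in this article; the 2,000-token reply length is an arbitrary illustration, not a figure from Xiaomi.

```python
# Tokens-per-second figures as quoted in the article (not independently verified).
ULTRASPEED_TPS = 1000
reported_tps = {
    "MiMo-V2-Flash": 150,
    "GPT-5.5": 68,
    "Claude Opus": 71,
    "Gemini Flash": 192,
}


def speedup(baseline_tps: float, new_tps: float = ULTRASPEED_TPS) -> float:
    """How many times faster new_tps is than baseline_tps."""
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time to stream `tokens` tokens at a steady `tps` tokens per second."""
    return tokens / tps


REPLY_TOKENS = 2000  # illustrative reply length (assumption)
for name, tps in reported_tps.items():
    print(
        f"{name}: UltraSpeed is {speedup(tps):.1f}x as fast; "
        f"a {REPLY_TOKENS}-token reply takes {seconds_for(REPLY_TOKENS, tps):.1f}s "
        f"vs {seconds_for(REPLY_TOKENS, ULTRASPEED_TPS):.1f}s"
    )
```

The 6.7-fold figure is 1,000 divided by 150, and the "approximately 15 times" comparison matches 1,000 divided by the 68 tokens per second cited for GPT-5.5.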

Techniques Behind the Speed Increase

The speed gain comes from a combination of three technologies: FP4 expert quantization, which compresses the model's expert layers to 4-bit floating-point precision while maintaining accuracy; DFlash speculative decoding, which drafts several tokens ahead and verifies them in parallel rather than generating one token at a time; and TileRT runtime optimization, developed in collaboration with the inference startup TileRT. Xiaomi has open-sourced both the FP4-DFlash checkpoint and the TileRT modules on Hugging Face and GitHub, enabling teams to self-host and independently test the system.
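
For readers unfamiliar with speculative decoding, the general idea can be sketched in a few lines. This is a generic greedy variant with toy stand-in models, not Xiaomi's DFlash implementation, whose internals the announcement does not detail; `target_model`, `draft_model`, and `speculative_decode` are hypothetical names invented for illustration.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token context to the next token


def target_model(ctx: List[int]) -> int:
    # Toy stand-in for the large, slow model (assumption, not a real LLM).
    return (ctx[-1] * 3 + 1) % 10


def draft_model(ctx: List[int]) -> int:
    # Toy stand-in for a cheap drafter: agrees with the target
    # except when the last token is 7, where it guesses wrong.
    return 0 if ctx[-1] == 7 else (ctx[-1] * 3 + 1) % 10


def speculative_decode(
    prompt: List[int],
    n_new: int,
    k: int = 4,
    draft: Model = draft_model,
    target: Model = target_model,
) -> Tuple[List[int], int]:
    """Generate n_new tokens; return (tokens, number of target verification passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. The drafter proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every drafted position. A real system does this
        #    in one batched forward pass; the loop below only simulates it.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token, then stop
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted: the same pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes


def greedy_decode(prompt: List[int], n_new: int, model: Model = target_model) -> List[int]:
    """Plain one-token-at-a-time decoding, for comparison."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


tokens, passes = speculative_decode([1], n_new=20)
print(tokens == greedy_decode([1], 20), passes)
```

In this greedy form the output is identical to ordinary decoding with the large model; the saving is that the large model runs once per batch of drafted tokens instead of once per token, so the benefit depends on how often the drafts are accepted.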

Cost and Use Cases of UltraSpeed Mode

UltraSpeed mode costs three times as much as the standard MiMo-V2.5-Pro: approximately $1.29 per million input tokens and $2.61 per million output tokens. This pricing is in line with comparable offerings from Groq but, unlike Groq's, does not depend on proprietary hardware. Highlighted use cases include fraud detection, algorithmic trading, and real-time translation, where low latency directly impacts financial outcomes.
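
To put the per-million-token prices in concrete terms, here is a minimal cost sketch. The rates are the approximate figures quoted above; the request sizes are made up for illustration, and the "implied standard rate" is simply the quoted price divided by the stated factor of three, not a published Xiaomi price.

```python
from decimal import Decimal

# Approximate UltraSpeed rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = Decimal("1.29")
OUTPUT_USD_PER_M = Decimal("2.61")
ULTRASPEED_MULTIPLIER = Decimal("3")  # "three times the cost" of the standard mode

MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    output_tokens: int,
    in_rate: Decimal = INPUT_USD_PER_M,
    out_rate: Decimal = OUTPUT_USD_PER_M,
) -> Decimal:
    """USD cost of one request at the given per-million-token rates."""
    return (Decimal(input_tokens) * in_rate + Decimal(output_tokens) * out_rate) / MILLION


def implied_standard_rate(ultraspeed_rate: Decimal) -> Decimal:
    """Standard-mode rate implied by the 3x multiplier (a derived figure, not a quoted one)."""
    return ultraspeed_rate / ULTRASPEED_MULTIPLIER


# Hypothetical request: 8,000 input tokens and 2,000 output tokens.
print(request_cost(8_000, 2_000))
print(implied_standard_rate(INPUT_USD_PER_M), implied_standard_rate(OUTPUT_USD_PER_M))
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.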

Trial Access and Limitations

The trial of UltraSpeed mode runs from June 9 to June 23, 2026. Access is by application, with priority given to enterprise clients and professional developers who can demonstrate concrete use cases. Approved users receive two weeks of free access, subject to a daily limit of 10 queued requests per account, a 30-minute cap per session, and automatic disconnection after five minutes of inactivity. The Token Plan is not compatible with UltraSpeed mode, and Xiaomi has not announced specific API pricing or regional infrastructure for the US or UK beyond the trial.

Verification and Real-World Performance

All reported speed metrics derive from Xiaomi's internal benchmarks, with no independent third-party verification published to date. The availability of the open-source checkpoint on Hugging Face is expected to facilitate community testing. Xiaomi notes that speculative-decoding acceptance rates, the share of drafted tokens that the model accepts, are lower in open-ended conversational tasks than in coding tasks, so the model's performance in general real-world applications remains to be determined.
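
Why acceptance rates matter for speed can be shown with a standard back-of-the-envelope model of speculative decoding. If each drafted token is accepted independently with probability a and k tokens are drafted per step, the expected number of tokens produced per verification pass is (1 - a^(k+1)) / (1 - a). The sketch below assumes that simplified model; the acceptance rates and draft length in it are illustrative placeholders, not figures published by Xiaomi.

```python
def expected_tokens_per_pass(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the large model.

    Assumes each drafted token is accepted independently with the same
    probability (a simplification), and that every pass yields at least
    one token from the large model itself.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance_rate == 1.0:
        return float(draft_len + 1)  # limit of the geometric sum
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


# Illustrative acceptance rates (assumptions): higher for predictable code,
# lower for open-ended conversation.
for label, rate in [("coding-like", 0.9), ("chat-like", 0.6)]:
    print(f"{label}: {expected_tokens_per_pass(rate, draft_len=4):.2f} tokens per pass")
```

Under this model, a drop in acceptance rate translates directly into fewer tokens per pass of the large model, which is why throughput measured on coding workloads may not carry over to conversational ones.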
