- Mistral Le Chat: Now running at over 1,100 tokens per second, the fastest AI assistant available today.
- Perplexity Sonar: Now delivering 1,200 tokens per second, redefining instant, high-quality AI-driven search.
- DeepSeek: Now reaching 1,500 tokens per second on DeepSeek R1 70B, with user data remaining in the US.
✨ We’re ready to celebrate, are you? Keep reading!

Join us at HumanX
AI models like DeepSeek and Llama are pushing the boundaries of scale and complexity, making AI inference the next great challenge and opportunity.
Cerebras leaders will be at HumanX in Las Vegas starting Monday, March 10th, 2025, to discuss the future of AI infrastructure and inference.
CEO Andrew Feldman will lead a panel on “Infrastructure 2.0” and also participate in the “Fastest AI Inference” roundtable.
Andy Hock and Angela Yeung will lead an AI inference masterclass – “The Future is Now: Unleashing 100x AI Inference on Llama, DeepSeek & More.”

😼 Le Chat – le Fastest AI Assistant
Cerebras Inference now powers Mistral’s Le Chat platform, one of the world’s most popular AI assistants. With millions of users relying on Le Chat daily, speed is everything. At over 1,100 tokens per second, Le Chat is 10x faster than popular models such as ChatGPT 4o, Sonnet 3.5, and DeepSeek R1, making it the world’s fastest AI assistant. Cerebras powers Le Chat’s new Flash Answers feature, which provides instant responses to user queries.

🎉 Let’s raise a glass to the future of AI! 🎉
Join us at The Cerebras AI After, an exclusive gathering of the brightest minds in AI, as we celebrate groundbreaking innovation and the momentum we’re building together.
With record-breaking speed across AI assistants, search, and inference, we’re pushing the boundaries of what’s possible—and we couldn’t do it without this incredible community.

Cerebras Powers Perplexity’s Sonar—The Future of AI Search
We’re excited to announce that Cerebras Inference is now powering Perplexity AI’s Sonar, a groundbreaking AI search model designed to redefine how we access information.
Built on the Llama 3.3 70B foundation model, Sonar delivers up to 1,200 tokens per second, providing near-instant responses and setting a new standard for AI-powered search. With this speed and efficiency, Sonar challenges traditional search engines, offering users a seamless, next-gen search experience.

📽️ Bigger, Faster, Better: The Cerebras Difference ⚡
Unmatched speed, the evolution of AI reasoning, and the future outlook for Cerebras.
Watch CEO Andrew Feldman at Web Summit Qatar now.
Cerebras brought together the Llama developer community for the third Llamapalooza in the series (SF, NYC, Seattle), featuring great talks from AWS, Cerebras, Meta, and Ollama.
We were joined by 300+ attendees, with over 800 RSVPs. Where will we go next? Follow us on LinkedIn to find out!
