Cerebras Announces Six New AI Inference Data Centers

New data centers will catapult Cerebras to hyperscale capacity: over 40 million Llama 70B tokens per second. We are creating the largest domestic high-speed inference cloud. Join us.


Cerebras Inference is now on Hugging Face

All Hugging Face developers can now get one-click access to Cerebras Inference. Experience a 10x speed-up for AI chat, reasoning, and agentic apps.
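The one-click integration works because Cerebras Inference speaks the same OpenAI-compatible chat-completions schema that Hugging Face's inference clients already use, so an application only has to construct a standard request body. A minimal sketch of building such a request; the model identifier and field choices below are illustrative assumptions, not taken from this page:

```python
import json


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions request body.

    The schema (model, messages, max_tokens, stream) is the standard
    chat-completions format; the model name passed in by the caller is
    an illustrative assumption.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Streaming is what makes high token throughput visible in chat apps.
        "stream": True,
    }


# Example: serialize a request for a hypothetical Llama 3.3 70B deployment.
body = build_chat_request("llama-3.3-70b", "Summarize attention in one line.")
print(json.dumps(body, indent=2))
```

Any OpenAI-compatible client can then POST this body to the provider's chat-completions endpoint.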


10x Faster AI-Powered Insights for Enterprises

With Cerebras Inference, AlphaSense has dramatically increased the speed and accuracy of its AI-driven research tools, delivering financial and business insights that are more accessible, actionable, and timely than ever before.


Data Center Expansion

Cerebras announces six new state-of-the-art AI data centers

Hugging Face

Cerebras is now on Hugging Face

AlphaSense

Cutting edge insights with Cerebras

Latest Announcements

Cerebras brings instant inference to Mistral Le Chat

We are excited to announce that Cerebras Inference is now powering Mistral’s Le Chat platform. Cerebras powers Le Chat’s new Flash Answers feature, which provides instant responses to user queries. At over 1,100 tokens/s, Le Chat is 10x faster than popular models such as GPT-4o, Claude 3.5 Sonnet, and DeepSeek R1, making it the world’s fastest AI assistant.


Cerebras Launches World's Fastest DeepSeek R1 Llama-70B Inference

Today, we’re excited to announce the launch of DeepSeek R1 Llama-70B on Cerebras Inference. We achieve world-record performance of over 1,500 tokens/s on this model, 57x faster than GPU solutions. The model runs on Cerebras AI data centers in the US with no data retention, ensuring the best privacy and security for customer workloads. Users can try out the chat application on our website today. We are also offering a developer preview via API; please reach out if you’re interested.
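The practical effect of the 1,500 tokens/s figure is easiest to see as wall-clock time for a single response. A back-of-the-envelope calculation using only the numbers quoted above; the 500-token response length is an arbitrary example, and the GPU baseline is simply derived from the quoted 57x ratio:

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a given decode rate."""
    return num_tokens / tokens_per_second


CEREBRAS_TPS = 1500.0          # quoted above: over 1,500 tokens/s
GPU_TPS = CEREBRAS_TPS / 57.0  # quoted above: 57x faster than GPU solutions

response_tokens = 500  # arbitrary example response length

print(f"Cerebras: {generation_seconds(response_tokens, CEREBRAS_TPS):.2f} s")
print(f"GPU:      {generation_seconds(response_tokens, GPU_TPS):.1f} s")
```

At these rates a 500-token answer streams in about a third of a second instead of roughly nineteen seconds, which is the difference between an instant reply and a visible wait.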


Cerebras Powers Perplexity Sonar with Industry’s Fastest AI Inference

Sunnyvale, CA — February 11, 2025 – Cerebras Systems, the pioneer in accelerating generative AI, today announced its pivotal role in powering Sonar, a groundbreaking model optimized for Perplexity search. Built on the robust foundation of Llama 3.3 70B, Sonar represents a significant advancement in answer quality, factuality, and readability, setting new standards for user satisfaction in search technology. The new Sonar search experience, powered by Cerebras, is available to Perplexity Pro users starting today.


Award Winning Technology

Cerebras continues to be recognized for pushing the boundaries of AI

TIME


FORBES

FORTUNE

ai model services

You bring the data; we'll train the model

Whether you want to build a multilingual chatbot or predict DNA sequences, our team of AI scientists and engineers will work with you and your data to build state-of-the-art models leveraging the latest AI techniques.


high performance computing

The fastest HPC accelerator on Earth

With 900,000 cores and 44 GB of on-chip memory, the CS-3 completely redefines the performance envelope of HPC systems. From Monte Carlo Particle Transport to Seismic Processing, the CS-3 routinely outperforms entire supercomputing installations.


Models on Cerebras

The Cerebras platform has trained a huge assortment of models, from multilingual LLMs to healthcare chatbots. We help customers train their own foundation models or fine-tune open-source models like Llama 2. Best of all, the majority of our work is open source.

llama 3.3

Foundation language model
8B, 70B, 405B, 15T tokens
128K context

MED42

Medical Q&A LLM
Fine-tuned from Llama2-70B
Scores 72% on USMLE

Mistral

7B Foundation Model

JAIS

Bilingual Arabic + English model
13B, 30B Parameters
Available on Azure, G42 Cloud

OPEN SOURCE
TRAINED ON CEREBRAS

starcoder

Coding LLM
15.5B parameters, 1T tokens
8K context

OPEN WEIGHTS
TRAINED ON CEREBRAS

diffusion transformer

Image generation model
33M-2B parameters
Adaptive layer norm

FALCON

Foundation language model
40B parameters, 1T tokens
Uses FlashAttention and multi-query attention

T5

For NLP applications
Encoder-decoder model
60M-11B parameters

CEREBRAS-GPT

Foundation language model
100M–13B parameters
NLP

OPEN SOURCE
TRAINED ON CEREBRAS

BTLM-chat

BTLM-3B-8K fine-tuned for chat
3B parameters, 8K context
Direct Preference Optimization

gigaGPT

Implements nanoGPT on Cerebras
Trains 175B+ models
565 lines of code

CRYSTALCODER

Trained for English + Code
7B Parameters, 1.3T Tokens
LLM360 Release

OPEN SOURCE
TRAINED ON CEREBRAS