Cerebras
Inference

The world’s fastest inference -
70x faster than GPU clouds,
128K context, 16-bit precision.

TRY CHAT

LEARN MORE

Cerebras Sets a New Record in

Molecular
Dynamics

Cerebras CS-2 achieves 1 million simulation steps/s –
700x faster than the world’s #1 supercomputer

LEARN MORE

Latest Announcements

MORE ON OUR BLOG

Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1-405B Model

Cerebras Systems today announced that it has set a new performance record for Llama 3.1-405B – a leading frontier model released by Meta AI. Cerebras Inference generated 969 output tokens per second. Data from third-party benchmark firm Artificial Analysis shows Cerebras is up to 75 times faster than GPU-based offerings from hyperscalers. Cerebras Inference is running multiple customer workloads on Llama’s 405B model at the full 128K context length and 16-bit precision.


Cerebras Sets New World Record in Molecular Dynamics at 1.1 Million Simulations per Second — 748x Faster than the World’s #1 Supercomputer ‘Frontier’

Cerebras Systems, the pioneer in accelerating generative AI, in collaboration with researchers from Sandia, Lawrence Livermore, and Los Alamos National Laboratories, has set another world record and achieved an important breakthrough in molecular dynamics (MD) simulations. For the first time in the history of the field, researchers exceeded 1 million simulation steps per second. A single Cerebras Wafer Scale Engine achieved over 1.1 million steps per second, 748x faster than what is possible on the world’s leading supercomputer, ‘Frontier’.


Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1-405B Model

Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference: frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and the shortest time-to-first-token latency, as measured by Artificial Analysis.


Award Winning Technology

Cerebras continues to be recognized for pushing the boundaries of AI

TIME

FORBES

FORTUNE

ai model services

You bring the data, we'll train the model

Whether you want to build a multi-lingual chatbot or predict DNA sequences, our team of AI scientists and engineers will work with you and your data to build state-of-the-art models leveraging the latest AI techniques.

FIND OUT MORE

high performance computing

The fastest HPC
accelerator on earth

With 900,000 cores and 44 GB of on-chip memory, the CS-3 completely redefines the performance envelope of HPC systems. From Monte Carlo Particle Transport to Seismic Processing, the CS-3 routinely outperforms entire supercomputing installations.

FIND OUT MORE

Models on Cerebras

The Cerebras platform has trained a huge assortment of models from multi-lingual LLMs to healthcare chatbots. We help customers train their own foundation models or fine-tune open source models like Llama 2. Best of all, the majority of our work is open source.

Llama 3.1

Foundation language model
8B, 70B, 405B, 15T tokens
128K context

Llama 2

Foundation language model
7B-70B, 2T tokens
4K context

Mistral

7B Foundation Model
Leverages:

• Grouped-Query Attention

• Sliding Window Attention
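Sliding-window attention restricts each token to attending over only the most recent W positions, cutting attention cost from quadratic in sequence length to linear. A minimal sketch of the causal windowed mask (illustrative only, not Cerebras or Mistral source code):

```python
def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when query i may attend to key j:
    # causal (j <= i) and within the last `window` positions (j > i - window).
    return [[j <= i and j > i - window for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=6, window=3)
# Each row has at most 3 True entries; information still propagates
# across the full sequence through stacked attention layers.
```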

JAIS

Bilingual Arabic + English model
13B, 30B Parameters
Available on Azure, G42 Cloud

OPEN SOURCE
TRAINED ON CEREBRAS

MED42

Medical Q&A LLM
Fine-tuned from Llama2-70B
Scores 72% on USMLE

BLOOM

Massive multi-lingual LLM
176B parameters, 366B tokens
2K context

OPEN SOURCE
TRAINED ON CEREBRAS

FALCON

Foundation language model
40B parameters, 1T tokens
Uses FlashAttention and multi-query attention

MPT

Foundation language model
1T tokens of English
Uses ALiBi position encoding

OPEN SOURCE
TRAINED ON CEREBRAS
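ALiBi (Attention with Linear Biases) replaces positional embeddings with a per-head linear penalty on attention scores that grows with query–key distance, which is what lets ALiBi-trained models extrapolate beyond their training context. A hedged sketch of the bias computation (illustrative, not MPT's actual implementation):

```python
def alibi_slopes(n_heads):
    # Geometric sequence of head-specific slopes: 2^(-8/n) down to 2^(-8).
    ratio = 2 ** (-8.0 / n_heads)
    return [ratio ** (h + 1) for h in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    # Bias added to pre-softmax attention scores: 0 on the diagonal,
    # increasingly negative for more distant (older) key positions,
    # and -inf for future positions to preserve causality.
    return [[[-s * (i - j) if j <= i else float("-inf")
              for j in range(seq_len)]
             for i in range(seq_len)]
            for s in alibi_slopes(n_heads)]

bias = alibi_bias(n_heads=8, seq_len=4)
```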

StarCoder

Coding LLM
15.5B parameters, 1T tokens
8K context

OPEN WEIGHTS
TRAINED ON CEREBRAS

Diffusion Transformer

Image generation model
33M-2B parameters
Adaptive layer norm

T5

For NLP applications
Encoder-decoder model
60M-11B parameters

CRYSTALCODER

Trained for English + Code
7B Parameters, 1.3T Tokens
LLM360 Release

OPEN SOURCE
TRAINED ON CEREBRAS

CEREBRAS-GPT

Foundation language model
100M-13B parameters
NLP

OPEN SOURCE
TRAINED ON CEREBRAS

BTLM-chat

BTLM-3B-8K fine-tuned for chat
3B parameters, 8K context
Direct Preference Optimization
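Direct Preference Optimization trains directly on preference pairs: it raises the policy's log-probability of the chosen response relative to the rejected one, with a frozen reference model keeping the policy anchored. A minimal sketch of the per-pair loss (illustrative; this is not BTLM's actual training code):

```python
import math

def dpo_loss(policy_chosen, policy_rejected,
             ref_chosen, ref_rejected, beta=0.1):
    # Log-prob ratios of the policy vs. the frozen reference model,
    # one for the preferred and one for the rejected response.
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    # Negative log-sigmoid: small when the chosen response is
    # already clearly preferred, large when it is not.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no preference signal the loss is ln(2), about 0.693.
loss = dpo_loss(-10.0, -10.0, -10.0, -10.0)
```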

gigaGPT

Implements nanoGPT on Cerebras
Trains 175B+ models
565 lines of code

Latest Blog Posts

MORE ON OUR BLOG