Cerebras Inference
The world’s fastest inference: 70x faster than GPU clouds, 128K context, 16-bit precision.
Latest Announcements
CePO: Empowering Llama with Reasoning using Test-Time Compute
Cerebras is proud to introduce CePO (Cerebras Planning and Optimization), a framework that adds sophisticated reasoning capabilities to the Llama family of models. Through test-time computation techniques, we enable Llama to tackle complex reasoning tasks with unprecedented accuracy. While models like OpenAI o1 and Alibaba QwQ have shown how additional computation at inference time can dramatically improve problem-solving capabilities [1], we’re now bringing these advances to Llama – the world’s most popular open-source LLM.
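The announcement does not detail CePO’s algorithm, but one common family of test-time compute techniques is best-of-N sampling: draw several candidate solutions, score each with a verifier or self-evaluation pass, and return the highest-scoring one. The sketch below is a minimal, runnable illustration of that control flow only; the `sample_candidate` and `score_candidate` helpers are stand-ins (in a real system they would call an LLM), and none of this should be read as CePO’s actual method.

```python
import random

# Illustrative best-of-N test-time compute loop. This is NOT CePO's actual
# algorithm; the two helpers below are stand-ins so the sketch runs. In a
# real system both would call an LLM (one sampling pass, one scoring pass).

def sample_candidate(problem: str) -> str:
    """Stand-in for one high-temperature LLM sample of a solution."""
    return f"candidate answer {random.randint(0, 9)} for: {problem}"

def score_candidate(problem: str, candidate: str) -> float:
    """Stand-in for a verifier / self-evaluation pass that rates a candidate."""
    return random.random()

def best_of_n(problem: str, n: int = 8) -> str:
    """Spend extra inference-time compute: draw n candidates, keep the best."""
    candidates = [sample_candidate(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score_candidate(problem, c))

if __name__ == "__main__":
    print(best_of_n("If x + 3 = 10, what is x?"))
```

The key trade-off this illustrates is the one the announcement points to: accuracy improves with the amount of computation spent at inference time (here, the choice of n), rather than with any change to the model’s weights.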
Cerebras Demonstrates Trillion Parameter Model Training on a Single CS-3 System
SUNNYVALE, CA AND VANCOUVER – December 10, 2024 – Today at NeurIPS 2024, Cerebras Systems, the pioneer in accelerating generative AI, announced a groundbreaking achievement in collaboration with Sandia National Laboratories: successfully demonstrating training of a 1 trillion parameter AI model on a single CS-3 system. Trillion parameter models represent the state of the art in today’s LLMs and typically require thousands of GPUs and dozens of hardware experts to train. By leveraging Cerebras’ Wafer Scale Cluster technology, researchers at Sandia were able to initiate training on a single AI accelerator – a one-of-a-kind achievement for frontier model development.
Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1-405B Model
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference: frontier AI at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and the shortest time-to-first-token latency, as measured by Artificial Analysis.
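To put 969 tokens/s in perspective, a full 1,000-token answer streams in roughly one second. One rough way to sanity-check throughput and time-to-first-token yourself is to time a streaming chat completion against Cerebras’ OpenAI-compatible endpoint. In the sketch below, the base URL and model id are assumptions to verify against the current docs, and counting streamed chunks only approximates a token count.

```python
import os
import time

from openai import OpenAI  # pip install openai

# Rough time-to-first-token and throughput measurement over a streaming chat
# completion. The base_url and model id are ASSUMPTIONS based on Cerebras'
# OpenAI-compatible endpoint; check the current documentation before use.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="llama3.1-405b",  # assumed model id
    messages=[{"role": "user", "content": "Summarize wafer-scale computing."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content delta (e.g., role headers); skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1

elapsed = time.perf_counter() - start
if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.3f}s")
# Each streamed chunk is roughly one token, so this approximates tokens/s.
print(f"~{chunks / elapsed:.0f} chunks/s over {elapsed:.2f}s")
```

For a measurement comparable to published benchmarks, independent harnesses such as Artificial Analysis (cited above) control for prompt length, output length, and network conditions, which a one-off script like this does not.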