Cerebras Inference
The world’s fastest inference: 70x faster than GPU clouds, 128K context, 16-bit precision.
Cerebras Sets a New Record in Molecular Dynamics
Cerebras CS-2 achieves 1 million simulation steps/s – 700x faster than the world’s #1 supercomputer.
Cerebras Delivers Record-Breaking Performance with Meta's Llama 3.1-405B Model
Cerebras Systems today announced that it has set a new performance record for Llama 3.1-405B, a leading frontier model released by Meta AI. Cerebras Inference generated 969 output tokens per second. Data from the third-party benchmark firm Artificial Analysis shows Cerebras running up to 75 times faster than GPU-based offerings from hyperscalers. Cerebras Inference is running multiple customer workloads on Llama’s 405B model at the full 128K context length and 16-bit precision.
Cerebras Sets New World Record in Molecular Dynamics at 1.1 Million Simulation Steps per Second – 748x Faster than the World’s #1 Supercomputer ‘Frontier’
Cerebras Systems, the pioneer in accelerating generative AI, in collaboration with researchers from Sandia, Lawrence Livermore, and Los Alamos National Laboratories, has set another world record and achieved an important breakthrough in molecular dynamics (MD) simulations. For the first time in the history of the field, researchers achieved more than 1 million simulation steps per second. A single Cerebras Wafer Scale Engine sustained over 1.1 million steps per second, 748x faster than what is possible on the world’s leading supercomputer, ‘Frontier’.
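To put the 748x figure in wall-clock terms, here is a back-of-the-envelope sketch. The 1.1 million steps/s rate and the 748x speedup come from the announcement above; the one-billion-step run length is purely an illustrative assumption, not a workload from the collaboration.

```python
# Back-of-the-envelope: what 1.1 million MD steps/s means in wall-clock time.
# The 1.1e6 steps/s and 748x figures are from the announcement; the
# 1-billion-step simulation length is an illustrative assumption.
cerebras_steps_per_s = 1.1e6
speedup_vs_frontier = 748
frontier_steps_per_s = cerebras_steps_per_s / speedup_vs_frontier

total_steps = 1e9  # hypothetical long-timescale simulation
cerebras_hours = total_steps / cerebras_steps_per_s / 3600
frontier_days = total_steps / frontier_steps_per_s / 86400

print(f"Implied Frontier rate: {frontier_steps_per_s:,.0f} steps/s")
print(f"1B steps on Cerebras:  {cerebras_hours:.2f} hours")
print(f"1B steps on Frontier:  {frontier_days:.1f} days")
```

Under these assumptions, a run that would occupy Frontier for roughly a week completes on the Wafer Scale Engine in about a quarter of an hour.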
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference: frontier AI at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. We also achieved the highest performance at 128K context length and the lowest time-to-first-token latency, as measured by Artificial Analysis.
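For a sense of what 969 tokens/s means in practice, a minimal sketch converting output throughput into response time. The 969 tokens/s rate and the 12x/18x ratios are from the figures above; the 1,000-token answer length is an illustrative assumption.

```python
# Convert output-token throughput into wall-clock generation time.
# 969 tokens/s and the 12x / 18x ratios are from the announcement;
# the 1,000-token response length is an illustrative assumption.
cerebras_tps = 969
answer_tokens = 1000

def gen_seconds(tps: float, tokens: int) -> float:
    """Seconds to generate `tokens` output tokens at `tps` tokens/s."""
    return tokens / tps

print(f"Cerebras:                {gen_seconds(cerebras_tps, answer_tokens):.2f} s")
print(f"GPT-4o (12x slower):     {gen_seconds(cerebras_tps / 12, answer_tokens):.1f} s")
print(f"Claude 3.5 (18x slower): {gen_seconds(cerebras_tps / 18, answer_tokens):.1f} s")
```

Under these assumptions, a full thousand-token answer arrives in about a second, versus ten to twenty seconds at the comparison models' measured rates.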