The World's
Fastest Inference
Cerebras Inference runs Llama3.1-70B at 450 tokens/s, 20x faster than GPU-based hyperscale clouds. Get instant responses for code generation, summarization, and agentic tasks.
High Throughput,
Low Cost
Cerebras Inference supports hundreds of concurrent users, enabling high throughput at the lowest cost.
Build with the fastest Llama3.1-70B, starting at just 60 cents per million tokens.
Full 16-Bit Precision
Cerebras Inference uses the reference weights of Llama3.1 at 16-bit precision, ensuring the highest accuracy for your queries.
Unrivaled Price-Performance for AI Inference
Cerebras is the industry’s fastest inference API with the best price-performance according to Artificial Analysis.
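As an illustration of what calling such an inference API looks like, the sketch below builds an OpenAI-style chat-completions request. The endpoint URL, model name, and payload fields are assumptions for illustration only, not taken from this page; consult the official API documentation before use.

```python
import json
import urllib.request

# Assumed endpoint; verify against the official API documentation.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_request(prompt: str, model: str = "llama3.1-70b") -> dict:
    """Build an OpenAI-style chat-completions payload (field names assumed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload; requires a valid API key and network access."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Construct a request for a typical summarization task.
payload = build_request("Summarize this changelog in two bullet points.")
```

An OpenAI-compatible request shape is a common convention for hosted LLM APIs, which is why the sketch follows it here.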
Our Partners
"DeepLearning.AI has multiple agentic workflows that require prompting an LLM repeatedly to get a result. Cerebras has built an impressively fast inference capability which will be very helpful to such workloads."
Andrew Ng
Founder, DeepLearning AI
“For traditional search engines, we know that lower latencies drive higher user engagement and that instant results have changed the way people interact with search and with the internet. At Perplexity, we believe ultra-fast inference speeds like what Cerebras is demonstrating can have a similar unlock for user interaction with the future of search - intelligent answer engines.”
Denis Yarats
CTO and co-founder, Perplexity
“With infrastructure, speed is paramount. The performance of Cerebras Inference supercharges Meter Command to generate custom software and take action, all at the speed and ease of searching on the web. This level of responsiveness helps our customers get the information they need, exactly when they need it in order to keep their teams online and productive."
Anil Varanasi
CEO of Meter
Hundreds of billions
of tokens per day
Cerebras Inference is built to scale. Powered by data centers across the US, Cerebras Inference has capacity to serve hundreds of billions of tokens per day with leading accuracy and reliability.
August 27, 2024
Introducing Cerebras Inference: AI at Instant Speed
Today, we are announcing Cerebras Inference – the fastest AI inference solution in the world. Cerebras Inference delivers 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, 20x faster than NVIDIA GPU-based hyperscale clouds.
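To put those rates in context, a quick back-of-the-envelope calculation using only the figures quoted above shows what 450 tokens/s means for response latency (the 1,000-token response length is an assumed example size):

```python
# Figures quoted above: 450 tokens/s for Llama3.1 70B, 20x faster than GPU clouds.
cerebras_rate = 450.0          # tokens per second on Cerebras Inference
gpu_rate = cerebras_rate / 20  # implied GPU-cloud baseline: 22.5 tokens/s

response_tokens = 1_000        # an assumed long-form completion size

cerebras_seconds = response_tokens / cerebras_rate
gpu_seconds = response_tokens / gpu_rate

print(f"Cerebras: {cerebras_seconds:.1f} s")  # ~2.2 s
print(f"GPU:      {gpu_seconds:.1f} s")       # ~44.4 s
```

At these rates, a response that takes well over half a minute on a GPU baseline arrives in roughly two seconds, which is what makes repeated-prompt agentic workflows practical.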