Building a Real-Time Digital Twin with Cerebras at Tavus

Tavus.io is an innovative AI video research company that specializes in APIs for building digital twin video experiences. Their cutting-edge Phoenix-2 model excels in generating lifelike…

CePO: Empowering Llama with Reasoning using Test-Time Compute

Cerebras is proud to introduce CePO (Cerebras Planning and Optimization), a framework that adds sophisticated reasoning capabilities to the Llama family of models. Through test-time computation…
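The excerpt doesn't spell out CePO's method, but in its simplest form test-time compute looks like best-of-N sampling followed by a self-selection pass. Below is a minimal sketch of that idea, not CePO's actual algorithm, assuming the OpenAI-compatible Cerebras Inference endpoint and the `llama3.1-70b` model id (both assumptions to verify against your account):

```python
# Minimal best-of-N test-time-compute sketch (illustrative; not CePO itself).
from openai import OpenAI

client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_KEY")
MODEL = "llama3.1-70b"  # assumed model id

def best_of_n(question: str, n: int = 4) -> str:
    # Sample several candidate answers at non-zero temperature.
    candidates = [
        client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": question}],
            temperature=0.8,
        ).choices[0].message.content
        for _ in range(n)
    ]
    # Ask the same model to pick the best candidate (a simple self-verification step).
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\n\nCandidate answers:\n{numbered}\n\n"
                "Reply with only the index of the best answer."
            ),
        }],
        temperature=0.0,
    ).choices[0].message.content
    try:
        return candidates[int(verdict.strip().strip("[]"))]
    except (ValueError, IndexError):
        return candidates[0]

print(best_of_n("A train travels 120 km in 90 minutes. What is its average speed in km/h?"))
```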

Announcing Cerebras Inference Research Grant

At Cerebras, we believe AI is the most transformative technology of our generation. Our mission is to accelerate AI by making it faster, easier to use, and more energy efficient, making AI accessible…

Memo-ry: Simplifying Daily Tasks for People with Memory Loss

The Challenge – Assisting Individuals with Dementia Through Task Management. Millions of individuals with Alzheimer’s and dementia struggle daily to manage tasks, recall essential details,…

AIBI: Revolutionizing Interviews with AI

AIBI (AI Bot Interviewer) is the first end-to-end AI interview bot that delivers a seamless, real-time interview experience. AIBI can conduct a realistic interview, generating high-quality, real-time…

Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference

Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the…
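For a sense of scale, a quick back-of-the-envelope check of what 969 tokens/s means for a long response (the 2,000-token answer length is a hypothetical):

```python
# Rough generation time for a hypothetical 2,000-token answer at the quoted rate.
tokens_per_second = 969
answer_tokens = 2_000
print(f"{answer_tokens / tokens_per_second:.1f} s")  # ~2.1 s of pure generation time
```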

Guii: An Interactive Coding Companion for Creative Frontend Development

Guii offers a development experience similar to drawing on a canvas. By adding Guii Devtools to your codebase, you can interact directly on a webpage—selecting visual elements like boxes,…

Building an AI-Powered Search Assistant for Zoom Team Chat

Imagine a workday where all the answers you need are just a message away. No more switching between apps, no more digging through files and folders, no more endless searches. Just ask, and the…

Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s

Today we’re announcing the biggest update to Cerebras Inference since launch. Cerebras Inference now runs Llama 3.1-70B at an astounding 2,100 tokens per second – a 3x performance boost over…
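A rough way to check throughput from the client side is to time a single non-streaming request, a sketch assuming the OpenAI-compatible Cerebras Inference endpoint and the `llama3.1-70b` model id (network and queuing overhead mean this understates server-side speed):

```python
# Client-side tokens/s estimate against Cerebras Inference (endpoint URL and
# model id are assumptions; check your account's documentation).
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_KEY")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Explain the Fast Fourier Transform in detail."}],
    max_tokens=1024,
)
elapsed = time.perf_counter() - start

generated = resp.usage.completion_tokens
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.0f} tokens/s")
```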

Simulating Human Behavior with Cerebras

LlamaSim is a multi-LLM framework that aims to simulate human behavior at scale. Given a specific environment (e.g., voters in Pennsylvania, students at CMU), we simulate how target…
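LlamaSim's own interface isn't shown in the excerpt; as an illustrative sketch of the underlying idea (condition a model on personas sampled from a target population, then poll each one), assuming the OpenAI-compatible Cerebras endpoint and the `llama3.1-70b` model id:

```python
# Illustrative persona-polling sketch; this is not LlamaSim's actual API.
from openai import OpenAI

client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_KEY")

# Hypothetical personas standing in for a sampled target population.
personas = [
    "a 54-year-old union electrician in Erie, Pennsylvania",
    "a 29-year-old nurse in the Pittsburgh suburbs",
    "a 67-year-old retired teacher in Lancaster County",
]
question = "Which issue matters most to you this election? Answer in one sentence."

for persona in personas:
    reply = client.chat.completions.create(
        model="llama3.1-70b",
        messages=[
            {"role": "system", "content": f"You are {persona}. Stay in character."},
            {"role": "user", "content": question},
        ],
        temperature=0.7,
    ).choices[0].message.content
    print(f"{persona}: {reply}")
```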

The Practitioner’s Guide to the Maximal Update Parameterization

Maximal Update Parameterization (µP) offers significant advantages for neural network training, but its adoption has been limited due to the complexity of the underlying math and the…
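As a taste of what the guide covers, here is a minimal sketch of the width-scaling rules µP is commonly summarized by for Adam on a simple MLP: hidden-layer initialization variance and learning rate shrink like 1/width, and the readout layer is initialized to zero (or with variance shrinking like 1/width²). This is a simplified illustration under those assumptions, not the guide's full recipe.

```python
# Minimal sketch of µP-style width scaling for an MLP trained with Adam
# (simplified illustration; see the guide for the complete rules).
import torch
import torch.nn as nn

d_in, d_out = 512, 10
base_width, width = 256, 1024            # tune hyperparameters at base_width, transfer to width
mult = width / base_width                # width multiplier

model = nn.Sequential(
    nn.Linear(d_in, width), nn.ReLU(),   # input layer
    nn.Linear(width, width), nn.ReLU(),  # hidden layer
    nn.Linear(width, d_out),             # readout layer
)
inp, hidden, readout = model[0], model[2], model[4]

# µP-style initialization.
nn.init.normal_(inp.weight, std=inp.weight.shape[1] ** -0.5)        # var ~ 1/fan_in (fan_in fixed)
nn.init.normal_(hidden.weight, std=hidden.weight.shape[1] ** -0.5)  # var ~ 1/fan_in ~ 1/width
nn.init.zeros_(readout.weight)                                      # zero-init readout
for layer in (inp, hidden, readout):
    nn.init.zeros_(layer.bias)

# µP-style Adam learning rates: hidden and readout lrs shrink as width grows.
base_lr = 1e-3
optimizer = torch.optim.Adam([
    {"params": inp.parameters(), "lr": base_lr},
    {"params": hidden.parameters(), "lr": base_lr / mult},
    {"params": readout.parameters(), "lr": base_lr / mult},
])
```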