Saumya Satish, Sr Product Manager, ML Software | April 13, 2022
The Cerebras team is extremely excited to announce the release of version 1.2 of the Cerebras Software Platform, CSoft. In this release we provide expanded support for the popular machine learning framework PyTorch and deliver our groundbreaking “Weight Streaming” execution mode to give users early access to high performance on extreme-scale models with billions or even trillions of parameters.
The Cerebras CS-2 system is powered by the Wafer-Scale Engine (WSE-2), the fastest AI processor ever built, packing the computing power of hundreds of GPUs into a single machine that is as easy to program as a laptop. By coupling that performance with the developer-friendly features of PyTorch, CSoft 1.2 gives our large and growing PyTorch developer community a powerful set of tools to make new breakthroughs in AI.
Expanded PyTorch Support
In CSoft 1.2, we extended support for PyTorch 1.10.2, using a lazy tensor backend with XLA to capture full model graphs and map them optimally onto the WSE-2. For users, our PyTorch interface provides simple custom class APIs that let developers run their models on the CS-2 with only a few lines of code. This release supports training, evaluation, and fine-tuning for a rich set of Natural Language Processing (NLP) models, such as BERT and its variants, GPT-2, Transformer, T5, and more, with orders of magnitude higher performance than clusters of traditional hardware.
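The general idea behind a lazy tensor backend can be shown with a toy, framework-free sketch: arithmetic on a `LazyTensor` only records graph nodes, and nothing is computed until the whole captured graph is handed to an executor at once. This is a hypothetical illustration of the technique, not the Cerebras or PyTorch/XLA API; `LazyTensor`, `const`, `trace`, and `execute` are invented names, and a real backend would lower the captured graph to a compiler rather than interpret it in Python.

```python
from __future__ import annotations

from typing import Optional


class LazyTensor:
    """Toy lazy tensor: arithmetic records a graph node instead of computing."""

    def __init__(self, op: str, inputs: tuple = (), value: Optional[float] = None):
        self.op = op          # "const", "add", or "mul"
        self.inputs = inputs  # upstream LazyTensor nodes
        self.value = value    # only set for "const" leaves

    def __add__(self, other: LazyTensor) -> LazyTensor:
        return LazyTensor("add", (self, other))

    def __mul__(self, other: LazyTensor) -> LazyTensor:
        return LazyTensor("mul", (self, other))


def const(x: float) -> LazyTensor:
    return LazyTensor("const", value=x)


def trace(root: LazyTensor) -> list[LazyTensor]:
    """Capture the full graph behind `root` in topological (post-order) order."""
    order: list[LazyTensor] = []
    seen: set[int] = set()

    def visit(node: LazyTensor) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def execute(root: LazyTensor) -> float:
    """Stand-in for 'compile the whole graph, then run it once'."""
    results: dict[int, float] = {}
    for node in trace(root):
        if node.op == "const":
            results[id(node)] = node.value
        elif node.op == "add":
            a, b = node.inputs
            results[id(node)] = results[id(a)] + results[id(b)]
        elif node.op == "mul":
            a, b = node.inputs
            results[id(node)] = results[id(a)] * results[id(b)]
        else:
            raise ValueError(f"unknown op {node.op!r}")
    return results[id(root)]


# Building the expression only records nodes; nothing is computed yet.
x, w, b = const(2.0), const(3.0), const(1.0)
y = x * w + b
graph_ops = [node.op for node in trace(y)]  # ['const', 'const', 'mul', 'const', 'add']
result = execute(y)                          # 7.0
```

Because the executor sees the entire graph before running anything, a compiler in this position is free to place and schedule every operation globally, which is the property that makes graph capture attractive for mapping a whole model onto one large device.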
To peek inside our approach to PyTorch integration, read this blog by Emad Barsoum, Cerebras’ Senior Director of AI Frameworks. In that post, Emad explores the choices we made to achieve full performance without sacrificing the user-friendliness of PyTorch, and dives into the inner workings of our compiler.
For a demonstration of how easy it is to run your PyTorch code on a Cerebras system, read this BERT code walkthrough and watch this short video by ML solutions engineer Cindy Orozco.
Unlocking Extreme Scale AI with Weight Streaming
At the Hot Chips 2021 conference, Cerebras presented a new paradigm for giant model training, called “Weight Streaming”, based on the disaggregation of model storage and compute. We also described how this paradigm is implemented on Cerebras wafer-scale systems. This technology unlocks unique flexibility, allowing model size and training performance to scale independently. For example, a single CS-2 system can support models with up to 120 trillion parameters. And, to accelerate training, we can cluster up to 192 systems with near-linear performance scaling.
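To make “disaggregation of model storage and compute” concrete, here is a conceptual, forward-pass-only sketch in plain Python. `ExternalWeightStore` and `forward_weight_streaming` are hypothetical names, not part of any Cerebras API; the sketch only illustrates the data flow, assuming the parameters live in a separate store and are sent to the compute device one layer at a time while activations stay on the device. A real training system would also have to send gradients back to the store for weight updates, which this sketch omits.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


class ExternalWeightStore:
    """Hypothetical off-device store that holds every layer's weights."""

    def __init__(self, layers: Dict[int, Matrix]):
        self._layers = layers
        self.fetches = 0  # number of layer transfers to the compute device

    def num_layers(self) -> int:
        return len(self._layers)

    def stream(self, layer: int) -> Matrix:
        """Send one layer's weights to the compute device."""
        self.fetches += 1
        return self._layers[layer]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def forward_weight_streaming(store: ExternalWeightStore, x: Vector) -> Vector:
    """Forward pass where activations stay on the device and weights stream in.

    The device only ever holds the current activations plus one layer's
    weights, so model size is bounded by the store, not by device memory.
    """
    act = x
    for layer in range(store.num_layers()):
        w = store.stream(layer)     # fetch this layer's weights
        act = relu(matvec(w, act))  # compute with them
        del w                       # release this layer's reference before the next arrives
    return act


store = ExternalWeightStore({
    0: [[1.0, 0.0], [0.0, -1.0]],  # 2x2 hidden layer
    1: [[2.0, 1.0]],               # 1x2 output layer
})
out = forward_weight_streaming(store, [3.0, 4.0])  # [6.0]
```

In this toy, growing the model means adding entries to the store, while speeding it up means speeding up (or replicating) the compute loop, which mirrors the independent scaling of model size and training performance described above.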
You can watch Sean Lie, our co-founder and chief hardware architect, talk about the motivation for weight streaming and our implementation in this video.
Today, CSoft 1.2 brings this new AI capability to users for the first time, enabling a seamless scaling path from models such as BERT-Large and Transformer to GPT-3, GPT-J, Megatron and more. With today’s release, we have added support for training the GPT3-XL model, which has 1.3 billion parameters, on a single CS-2 system.
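Some back-of-the-envelope arithmetic shows why separating weight storage from compute matters at these scales. The sketch below counts the memory for the weights alone, assuming 2 bytes per parameter for 16-bit weights and 4 bytes for 32-bit weights; it deliberately ignores gradients, optimizer state, and activations, which add considerably more during training. `weight_bytes` is an illustrative helper, not part of any Cerebras tool.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to hold the weights alone (no gradients or optimizer state)."""
    return num_params * bytes_per_param


GPT3_XL_PARAMS = 1_300_000_000        # 1.3 billion, as stated above
SINGLE_CS2_MAX_PARAMS = 120 * 10**12  # 120 trillion, as stated above

models = [("GPT3-XL", GPT3_XL_PARAMS), ("120T-parameter model", SINGLE_CS2_MAX_PARAMS)]
for name, params in models:
    for dtype, nbytes in [("16-bit", 2), ("32-bit", 4)]:
        gb = weight_bytes(params, nbytes) / 1e9
        print(f"{name}, {dtype} weights: {gb:,.1f} GB")
```

With 16-bit weights, GPT3-XL needs about 2.6 GB, while a 120-trillion-parameter model needs about 240 TB for its weights alone, far more than any single processor holds on-chip.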
We are excited about this work because it opens the door for our users and customers to build completely new applications that are intractable today because they simply take too long to train to be useful, even using supercomputer-sized GPU clusters. Models like GPT-3 and GPT-J can be seamlessly applied to many disparate domains, such as answering medical questions, writing poetry, creating high-quality code and summarizing long documents.
We are also excited about the potential of weight streaming to unlock new capabilities in computer vision applications, where we can harness the power of wafer-scale to perform real-time image detection and classification on extremely high-resolution video streams – something that is intractable today on conventional systems.
The team at Cerebras is hard at work making models like these practical and accessible, so that our ML researchers and practitioners can make the impossible possible.
New Public Documentation Site with Sample Models
Along with this release, for the very first time, we have also published a treasure trove of documentation and a public GitHub repository with several reference AI model implementations to reach more users and show developers just how easy it is to get up and running on the CS-2. Contact us to get on board the revolutionary CS-2 and kick the tires on these new features – we look forward to working together to help you and your teams accelerate the next generation of AI applications.
Resources
Documentation
- For a complete list of features released in R1.2, please refer to our Release Notes.
- Developer documentation, including code workflow tutorials
- Cerebras Reference Implementations
Articles
- Supporting PyTorch on the Cerebras Wafer-Scale Engine
- Getting Started with PyTorch BERT Models on the Cerebras CS-2 System
- Training Giant Neural Networks Using Weight Streaming on Cerebras Wafer-Scale Systems
Video
Contact us to arrange a demo here.