The Cerebras team is extremely excited to announce the release of version 1.7 of the Cerebras Software Platform, CSoft. In this release, we introduce support for unique, high-resolution computer vision (CV) models, deliver performance improvements for GPT-style models, and enable training with sparse weights. Additionally, we expand our support for the popular machine learning framework PyTorch as well as for our novel weight-streaming execution mode.
Unique Computer Vision Capabilities
In this release, we introduce support for 2D segmentation with UNet. This support is unique because we handle images up to 5K x 5K pixels without tiling, a much larger image size than GPUs can support out-of-the-box today. At lower resolutions, objects may be blurred or not visible at all, reducing a model's quality and its ability to produce accurate results. Training on higher-resolution images with a Cerebras Wafer-Scale Cluster unlocks the ability to build high-quality models that achieve what was previously impossible. Learn more about what high-resolution computer vision makes possible in this blog post.
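To make the contrast concrete, here is a minimal, hypothetical PyTorch sketch of the usual GPU workaround (tiling a large scan into patches and segmenting each independently) versus processing the full image in a single pass, which is what training without tiling looks like. The toy two-layer model, the 1024 x 1024 proxy resolution, and the 256-pixel tiles are illustrative assumptions, not the Model Zoo UNet or its configuration.

```python
# Minimal sketch (illustrative assumptions only): tiled segmentation vs. one full-image pass.
import torch
import torch.nn as nn

# Toy stand-in for a segmentation backbone; the real Model Zoo UNet is configured separately.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 1),
)

full_image = torch.randn(1, 1, 1024, 1024)  # scaled-down proxy for a 5120 x 5120 scan

# Typical GPU workaround: split the image into 256 x 256 tiles and segment each independently,
# which discards cross-tile context at the seams.
tiles = full_image.unfold(2, 256, 256).unfold(3, 256, 256)  # shape (1, 1, 4, 4, 256, 256)
tile_masks = [model(tiles[:, :, i, j]) for i in range(4) for j in range(4)]

# Without tiling: the whole image is processed in one pass, preserving global context.
full_mask = model(full_image)
print(full_mask.shape)  # torch.Size([1, 1, 1024, 1024])
```

The tiled path loses context at tile boundaries; the single-pass path keeps it, which is the advantage of training on full-resolution images.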
Extended PyTorch Large Language Model Support
In this release, we extend the availability of large language models for PyTorch. GPT-J and GPT-3 models with up to 20B parameters are now supported in PyTorch and can be accessed in our Model Zoo. For the comprehensive list of GPT models available in this release, please see the table below. Additionally, we improved the performance (training throughput) of GPT-style models by up to 1.5x compared to our previous release. This is a major increase that you can benefit from simply by upgrading!
| Model | PyTorch | TensorFlow |
| --- | --- | --- |
| GPT-2 1.5 billion | ✓ | ✓ |
| GPT-3 XL 1.3 billion | ✓ | ✓ |
| GPT-3 6.7 billion | ✓ | |
| GPT-3 13 billion | ✓ | ✓ |
| GPT-J 6 billion | ✓ | ✓ |
| GPT-3 20 billion | ✓ | ✓ |
| GPT-NeoX 20 billion | ✓ | ✓ |
Train Large Models with Sparse Weights
At NeurIPS 2022, we announced that we had successfully trained unstructured-sparse 1.3 billion parameter GPT-3 models on Cerebras CS-2 systems, demonstrating that these models achieve competitive results at a fraction of the inference FLOPs: our 83.8% sparse model matched dense performance on the Pile with a 3x reduction in FLOPs. We also introduced Sparse Pre-training and Dense Fine-tuning to reduce the computational FLOPs of training GPT models using weight sparsity. In this release, we make the ability to train large language models with sparse weights available in PyTorch. You can now benefit from the research we presented at NeurIPS: we provide scripts that convert a dense PyTorch checkpoint into a sparse version. To learn more, please see our blogs below.
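As a rough illustration of what such a conversion does, the sketch below applies unstructured magnitude pruning to a dense PyTorch state dict. The file names, the 83.8% sparsity level, and the rule for which tensors get pruned are assumptions for illustration; the actual Model Zoo conversion scripts are the reference.

```python
# Minimal sketch: convert a dense PyTorch checkpoint into a sparse one via
# unstructured magnitude pruning. Illustrative only; file names, the sparsity
# level, and the pruning rule are assumptions, not the Model Zoo scripts.
import torch

SPARSITY = 0.838  # e.g. the 83.8% sparsity level discussed above

# Assumes the checkpoint is a plain state dict of parameter tensors.
state_dict = torch.load("dense_checkpoint.pt", map_location="cpu")

sparse_state_dict = {}
for name, tensor in state_dict.items():
    # Prune 2D weight matrices; 1D parameters (biases, layer norms) stay dense.
    if name.endswith(".weight") and tensor.dim() == 2:
        k = max(1, int(SPARSITY * tensor.numel()))
        # The magnitude of the k-th smallest-magnitude weight becomes the threshold.
        threshold = tensor.abs().flatten().kthvalue(k).values
        # Zero out everything at or below the threshold (~SPARSITY of the weights).
        sparse_state_dict[name] = tensor * (tensor.abs() > threshold)
    else:
        sparse_state_dict[name] = tensor

torch.save(sparse_state_dict, "sparse_checkpoint.pt")
```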
Build Amazing Solutions!
We are excited about this release for many reasons. We not only offer support for CV models for the first time, but also the ability to train with higher-resolution images than is possible out-of-the-box on GPUs today. Additionally, we take our research to production and offer the ability to train with sparse weights, which can significantly reduce training time for multi-billion parameter models, enabling users to iterate, achieve optimal results, and deploy to production faster. Finally, this release further demonstrates our support for PyTorch with the addition of GPT-J and GPT-3 models whose throughput is up to 1.5x higher than in our previous release. With new capabilities and better performance, we are excited for you to check out our latest software and build amazing solutions!
To complement this release, we have published the following resources that you can use to quickly try out our new features.
Release Notes
- For a complete list of features released in CSoft v1.7, please refer to the Release Notes
Documentation:
Learning Resources:
- More Pixels, More Context, More Insight!
- Overview of sparsity and how it can be used to reduce the cost and time of training and inference
- Results of sparse pre-training followed by dense fine-tuning
- Sparsified GPT-3 via iterative magnitude pruning as “a proof of existence” for sparse large language models with the same quality as their dense counterparts
Have Questions?
Connect with our developer community in our Discourse Forum, where you can get help from system experts on a wide range of topics.
Udai Mody | Product Marketing and Partnerships | February 17, 2023