“We note that these training runs frequently take >1 week on dedicated GPU resources (such as Polaris@ALCF). To enable training of the larger models on the full sequence length (10,240 tokens), we leveraged AI-hardware accelerators such as Cerebras CS-2, both in a stand-alone mode and as an inter-connected cluster, and obtained GenSLMs that converge in less than a day.”
Award-winning research
2022 Gordon Bell Prize for COVID Research
A team led by researchers from Argonne National Laboratory and Cerebras was recognized for developing the first genome-scale language model to study the evolutionary dynamics of SARS-CoV-2. Their work has the potential to transform how we identify and classify new and emergent variants of pandemic-causing viruses.
At Cerebras Systems, we love it when the CS-2 is vastly faster than large NVIDIA GPU clusters.
Customer Case Study
GlaxoSmithKline: Epigenomic Language Models For Drug Discovery
A team of GSK researchers introduced a BERT model that learns representations based on both DNA sequence and paired epigenetic state inputs, which they named Epigenomic BERT (EBERT).
Training this complex model on a dataset that had previously been prohibitively large was made possible for the first time by the partnership between GSK and Cerebras. The team trained the EBERT model in about 2.5 days, compared with an estimated 24 days on a 16-node GPU cluster.
Testimonial
Kim Branson
SVP Global Head of AI and ML @ GSK
use case
Drug Discovery
Traditional laboratory-based drug screening is a slow process: it can take years for a compound to progress from research into trials. AI models such as Transformers, Graph Neural Networks (GNNs), and Multi-Layer Perceptrons (MLPs) screen large libraries of candidate drug molecules in software, selecting only the most promising ones for subsequent trials. Using computers instead of laboratories enables faster screening and advances the most promising candidate compounds more quickly, dramatically reducing drug development time. Deep neural networks have shown exceptional results and are now considered among the most powerful computational tools for virtual drug screening.
In partnership with customers, the CS-2 system has demonstrated vast improvements in deep neural network-based virtual screening – in one case reducing screening time for a large library of compounds from 183 days on a GPU cluster to 3.5 days on the CS-2. This roughly 50x acceleration cuts time to solution by about six months. Faster time to solution reduces time to cure.
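For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. The 183-day and 3.5-day values come from the case above; the `speedup` helper is a hypothetical name used only for illustration, not part of any Cerebras software.

```python
def speedup(baseline_days: float, accelerated_days: float) -> float:
    """Return how many times faster the accelerated run is (hypothetical helper)."""
    return baseline_days / accelerated_days


# Figures quoted in the screening case above.
gpu_cluster_days = 183.0
cs2_days = 3.5

factor = speedup(gpu_cluster_days, cs2_days)
days_saved = gpu_cluster_days - cs2_days

print(f"{factor:.1f}x faster, {days_saved:.1f} days saved")
```

The ratio works out to a little over 50x, and the time saved is just under six months, which matches the rounded figures quoted above.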
use case
Text and Language Modeling
Neural networks like BERT and GPT can model semantic relationships within records, reports, and scientific literature, so you can quickly answer questions from that body of knowledge.
Today, the compute resources and expertise needed to efficiently work with large language models – such as BERT and GPT – and massive real-world text databases are only available in hyperscale datacenters. With a single CS-2, your organization can train models like these in hours or days rather than weeks or months.
use case
Genomics and Data Science
In genomics, AI has shown great potential for identifying subtle signatures of public health challenges as well as new opportunities for the treatment of rare diseases.
However, most work in this space has been limited to small clinical trials or local populations, because the deep learning models used to classify sequences or predict phenotype (e.g., RNNs, Transformers, and 1D CNNs) take too long to train or run on large, sparse datasets using GPUs. Use the CS-2 to bring 100x – 1,000x more data to your models.