Thirty-Eighth Annual Conference on Neural Information Processing Systems
December 10 – 15, Vancouver Convention Center
BOOTH #93
Poster
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
GAVIA GRAY · AMAN TIWARI · SHANE BERGSMA · JOEL HESTNESS
Per-example gradient norms are a vital ingredient for estimating gradient noise scale with minimal variance. Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing …
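The gradient noise scale (GNS) can be estimated from per-example gradient norms together with the full-batch gradient norm via the standard small/large-batch estimator of McCandlish et al. The sketch below restricts that estimate to a LayerNorm's parameters in PyTorch; the per-example loop is a naive placeholder for the paper's low-FLOP computation, and the model, data, and batch size are synthetic illustrations.

```python
# Minimal sketch (not the paper's fused method): estimate the gradient noise
# scale from per-example gradient norms of a normalization layer, using the
# small/large-batch estimator E[|G_b|^2] = |G|^2 + tr(Sigma)/b.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: a LayerNorm followed by a linear head (placeholders).
model = nn.Sequential(nn.LayerNorm(16), nn.Linear(16, 4))
norm_params = list(model[0].parameters())  # restrict GNS to LayerNorm params
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 16)            # synthetic batch of B = 32 examples
y = torch.randint(0, 4, (32,))
B = x.shape[0]

def grad_sq_norm(inputs, targets):
    """Squared L2 norm of the LayerNorm gradients for the given (sub)batch."""
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    return sum(p.grad.pow(2).sum().item() for p in norm_params)

# Per-example squared gradient norms (batch size 1) and the full-batch norm.
per_example_sq = [grad_sq_norm(x[i:i + 1], y[i:i + 1]) for i in range(B)]
small_sq = sum(per_example_sq) / B     # estimate of E[|G_1|^2]
big_sq = grad_sq_norm(x, y)            # |G_B|^2

# Solve the two equations (b = 1 and b = B) for tr(Sigma) and |G|^2.
trace_sigma = (small_sq - big_sq) * B / (B - 1)
grad_sq = (B * big_sq - small_sq) / (B - 1)
gns = trace_sigma / grad_sq            # noisy toy estimate of the GNS
print(f"estimated gradient noise scale: {gns:.2f}")
```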
Poster
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
NOLAN SIMRAN DEY · SHANE BERGSMA · JOEL HESTNESS
Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs forward and gradient signal propagation. Second, sparse studies often need to test multiple sparsity levels, while also introducing new hyperparameters (HPs) …
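One way to picture the hyperparameter problem is a μP-style scaling rule in which a sparse layer's effective fan-in is taken to be density times fan-in, so that initialization variance and per-layer learning rate move with density rather than being retuned at every sparsity level. The toy sketch below assumes exactly that scaling; it is an illustration of the general idea, not the paper's actual parameterization.

```python
# Toy sketch (assumption: effective fan-in = density * fan_in; not the
# paper's exact parameterization): scale init std and per-layer Adam LR
# muP-style so hyperparameters can be reused across widths and densities.
import math
import torch
import torch.nn as nn

def sparse_mup_linear(fan_in, fan_out, density, base_fan_in, base_lr):
    """Return a randomly masked linear layer and a density-aware layer LR."""
    eff_fan_in = max(1, int(density * fan_in))
    layer = nn.Linear(fan_in, fan_out, bias=False)
    # muP-style init: variance shrinks with the effective fan-in.
    nn.init.normal_(layer.weight, std=1.0 / math.sqrt(eff_fan_in))
    # Static random mask at the requested density (illustrative only).
    mask = (torch.rand_like(layer.weight) < density).float()
    layer.weight.data.mul_(mask)
    # muP-style Adam LR for hidden weights: scale by base_fan_in / eff_fan_in.
    lr = base_lr * base_fan_in / eff_fan_in
    return layer, mask, lr

layer, mask, lr = sparse_mup_linear(fan_in=1024, fan_out=1024, density=0.25,
                                    base_fan_in=256, base_lr=1e-3)
print(f"per-layer LR: {lr:.2e}")  # equals base_lr since eff_fan_in == base_fan_in
```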
Poster
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
VITHURSAN THANGARASA · GANESH VENKATESH · MIKE LASBY · NISH SINNADURAI · SEAN LIE
Large language models have driven significant progress in natural language processing, but their deployment requires substantial compute and memory resources. As models scale, compression techniques become essential for balancing model quality with computational efficiency. Structured pruning …
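As the title suggests, self-data distillation has the original unpruned model regenerate the fine-tuning targets, so the pruned model is then tuned on data close to its own output distribution rather than on the original human-written responses. Below is a minimal Python sketch of that regeneration step, with "gpt2" standing in for the unpruned model and the subsequent fine-tuning of the pruned model omitted; it is an illustrative sketch, not the paper's exact recipe.

```python
# Hedged sketch of the self-data distillation step: the original (unpruned)
# model rewrites the responses in a fine-tuning set; the pruned model would
# then be fine-tuned on these regenerated targets (standard SFT, not shown).
# "gpt2" is only a stand-in for the original model.
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "gpt2"  # placeholder for the unpruned base model
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()

def self_distill(prompts, max_new_tokens=128):
    """Have the original model regenerate a response for each prompt."""
    distilled = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        output = teacher.generate(**inputs, max_new_tokens=max_new_tokens)
        # Strip the prompt tokens and keep only the generated continuation.
        text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
        distilled.append({"prompt": prompt, "response": text})
    return distilled

distilled_set = self_distill(["Explain structured pruning in one sentence."])
print(distilled_set[0]["response"])
```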
Workshop
AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design
NeurIPS Navigator
Using Cerebras Inference, we built a web app to help people find interesting papers at NeurIPS 2024. Use the app to find papers you're interested in and ask questions about them directly with the help of LLMs.
Blog
Introducing Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy
Cerebras and Neural Magic have achieved a major milestone in the field of large language models (LLMs). By combining state-of-the-art pruning techniques, sparse pretraining, and purpose-built hardware, we have unlocked unprecedented levels of sparsity in LLMs, enabling up to 70% parameter reduction without compromising accuracy.
Blog
Cerebras Breaks Exascale Record for Molecular Dynamics Simulations
Cerebras has set a new record for molecular dynamics simulation speed that goes far beyond the exascale level. While this breakthrough has wide-ranging impacts for materials modeling, we initially focused on a problem relevant to commercializing nuclear fusion. This achievement demonstrates how Cerebras's wafer-scale computers enable novel computational science applications.
Blog
Cerebras CS-3 vs. Nvidia B200: 2024 AI Accelerators Compared
In the fast-paced world of AI hardware, the Cerebras CS-3 and Nvidia DGX B200 are two of the most exciting new offerings to hit the market in 2024. Both systems are designed to tackle large-scale AI training, but they take decidedly different approaches.