Empirical Upper Bounds for Unstructured Sparsity in Compute-Efficient Language Modeling

Large language models have driven significant progress in…


Self-Data Distillation for Recovering Quality in Pruned Large Language Models

Large language models have driven significant progress in…


Bilingual Adaptation of Monolingual Foundation Models

We present an efficient method for adapting a monolingual…


Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware

The recent trend toward deep learning has led to the…


Position Interpolation Improves ALiBi Extrapolation

Linear position interpolation helps pre-trained models…


BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

We introduce the Bittensor Language Model, called…


A Big Chip for Big Science: Watching the COVID-19 Virus in Action

It’s hard to imagine a better example of “AI for good” than…