June 9, 2023
SlimPajama: A 627B token, cleaned and deduplicated version of RedPajama
Today we are releasing SlimPajama, the largest deduplicated, multi-corpora, open-source dataset for training large…
0 Comments · 18 Minutes
January 30, 2023
To Bfloat or not to Bfloat? That is the Question!
The bfloat16 data format for deep learning shortens training time while preserving accuracy.
0 Comments · 8 Minutes