checkpoint @ 31,404,851,200 tokens trained, settings used:
- batch_size: 512
- context_length: 1024
- learning_rate: 2e-4
- schedule: cosine with 10% warmup from 0 to 2e-4, cooldown to 0 (see the sketch below)
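The schedule above maps onto `transformers`' built-in cosine scheduler. A minimal sketch, assuming the token count covers the full run (31,404,851,200 / (512 × 1024) = 59,900 optimizer steps) and using AdamW only as a placeholder, since the optimizer is not listed on this card:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Steps implied by the settings above: tokens / (batch_size * context_length).
total_steps = 31_404_851_200 // (512 * 1024)   # = 59,900
warmup_steps = total_steps // 10               # 10% linear warmup

# Placeholder parameters and optimizer; the actual optimizer is not stated here.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=2e-4)  # peak learning rate from above

# Linear warmup 0 -> 2e-4, then cosine decay ("cooldown") back to 0.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```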
requirements:
- pytorch
- transformers
- einops
The Inference API (serverless) does not yet support model repos that contain custom code; a minimal local loading sketch follows.
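Because the repo contains custom code, loading it locally needs `trust_remote_code=True` (with the requirements above installed). A sketch under assumptions: the repo id below is a placeholder, and a causal LM head is assumed; adjust the Auto class if the custom code defines a different architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/model-name"  # placeholder for this model's actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompts should stay within the 1024-token training context length.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```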