checkpoint @ 31,404,851,200 tokens trained, settings used:
- batch_size: 512
- context_length: 1024
- learning_rate: 2e-4
- schedule: cosine with 10% warmup from 0 to 2e-4, cooldown to 0 (see the sketch below)
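The schedule above maps onto `transformers`' built-in cosine scheduler. A minimal sketch, assuming the token count covers the full run (31,404,851,200 / (512 × 1024) = 59,900 optimizer steps) and using AdamW only as a placeholder, since the optimizer is not listed on this card:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Steps implied by the settings above: tokens / (batch_size * context_length).
total_steps = 31_404_851_200 // (512 * 1024)   # = 59,900
warmup_steps = total_steps // 10               # 10% linear warmup

# Placeholder parameters and optimizer; the actual optimizer is not stated here.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=2e-4)  # peak learning rate from above

# Linear warmup 0 -> 2e-4, then cosine decay ("cooldown") back to 0.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```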
requirements:
- pytorch
- transformers
- einops
The Inference API (serverless) does not yet support model repos that contain custom code; a minimal local loading sketch follows.
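Because the repo contains custom code, loading it locally needs `trust_remote_code=True` (with the requirements above installed). A sketch under assumptions: the repo id below is a placeholder, and a causal LM head is assumed; adjust the Auto class if the custom code defines a different architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/model-name"  # placeholder for this model's actual repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompts should stay within the 1024-token training context length.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```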