- The GPT-2 model was trained on the BookCorpus dataset for 60K steps.
- No positional embeddings were used (NoPE); see the usage sketch below.
- Here is the wandb report.
- This is for educational purposes only.
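Below is a minimal sketch of how a NoPE variant of GPT-2 can be set up with the `transformers` library: the learned position-embedding table is zeroed out and frozen so only token embeddings feed the transformer. This is an illustration under assumed defaults (stock `GPT2Config`), not the exact configuration or training code behind this checkpoint.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative default GPT-2 small config, not necessarily this model's settings.
config = GPT2Config()
model = GPT2LMHeadModel(config)

# Zero out and freeze the position-embedding table so no positional signal
# is added to the token embeddings (NoPE-style behavior).
with torch.no_grad():
    model.transformer.wpe.weight.zero_()
model.transformer.wpe.weight.requires_grad = False

# Forward pass: the model now relies on token embeddings alone.
input_ids = torch.randint(0, config.vocab_size, (1, 16))
logits = model(input_ids).logits
print(logits.shape)  # (1, 16, vocab_size)
```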