JonasGeiping committed
Commit 86b1824 (verified) · Parent: cd72c64

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -91,11 +91,11 @@ pipeline_tag: text-generation
 ---
 
 # Huginn-0125
-This is Huginn, version 01/25. This is a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
+This is Huginn, version 01/25. This is a latent recurrent-depth model with 3.5B parameters, trained for 800B tokens on AMD MI250X machines. This is a proof-of-concept model, but surprisingly capable in reasoning and code given its training budget and size.
 All details on this model can be found in the tech report: "Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach."
 
 8 intermediate checkpoints of the model can be found in its collection. Additional intermediate checkpoints are available upon request while we find a place to host all ~350 of them. The data used to train
-this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing.
+this model is publicly available (entirely on Hugging Face), and scripts provided with the pretraining code at https://github.com/seal-rg/recurrent-pretraining can be used to repeat our preprocessing and our entire training run.
 
 ## Table of Contents
 
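For readers landing on this commit: the README section being edited describes a model served through the standard transformers API with custom modeling code. A minimal load-and-generate sketch follows; the Hub repo id tomg-group-umd/huginn-0125 and the num_steps recurrence argument are assumptions taken from the model card's conventions, not part of this diff.

```python
# Minimal sketch: loading Huginn-0125 and spending extra test-time compute
# via recurrent depth. Assumed (not confirmed by this commit): the repo id
# "tomg-group-umd/huginn-0125" and a custom `num_steps` generate() kwarg.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tomg-group-umd/huginn-0125"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
# trust_remote_code is required because the latent recurrent-depth
# architecture ships its own modeling code, not a stock transformers class.
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True
)

inputs = tokenizer("The capital of Westphalia is", return_tensors="pt")
# More recurrence steps = more latent-space compute at test time;
# `num_steps` is an assumed kwarg exposed by the remote code.
outputs = model.generate(**inputs, max_new_tokens=32, num_steps=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```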