pszemraj committed
Commit f4e695e · verified · 1 Parent(s): 2cf6678

Update README.md
Files changed (1): README.md (+11 −18)
README.md CHANGED
```diff
@@ -5,34 +5,27 @@ tags:
 - generated_from_trainer
 metrics:
 - accuracy
-model-index:
-- name: griffin-v0.01-c3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
-  results: []
+datasets:
+- BEE-spoke-data/fineweb-1M_en-med
+language:
+- en
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# griffin-v0.01-c3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
+# griffin-c3t-8L-v0.02-fineweb
 
-This model is a fine-tuned version of [pszemraj/griffin-v0.01-c3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-v0.01-c3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
+Pretraining experiment with griffin/recurrent_gemma arch
+
+## Model description
+
+Further training of [pszemraj/griffin-v0.01-c3t-8layer-simplewiki-silu](https://hf.co/pszemraj/griffin-v0.01-c3t-8layer-simplewiki-silu) on the BEE-spoke-data/fineweb-1M_en-med dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.1888
 - Accuracy: 0.2326
 - Num Input Tokens Seen: 798621696
 
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
@@ -75,4 +68,4 @@ The following hyperparameters were used during training:
 - Transformers 4.40.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.0
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
```
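As a quick sanity check on the metrics the updated card reports: cross-entropy loss converts to perplexity via exp(loss), so an eval loss of 5.1888 corresponds to a perplexity of roughly 179. A minimal sketch, using only the values stated in the card above:

```python
import math

# Metrics copied from the model card diff above
eval_loss = 5.1888
tokens_seen = 798_621_696

# Perplexity is the exponential of the cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")           # ≈ 179.3
print(f"tokens seen ≈ {tokens_seen / 1e6:.1f}M")  # ≈ 798.6M
```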