Commit e57d038 (parent: 71652a2): Create README.md

README.md ADDED
---
tags:
- text-generation
library_name: transformers
---

## Model Description
Pre-training on a cleaned version of *Principles* (a sketch of the cleaning steps follows this list):
- removing numeric references to footnotes
- removing numeric counts, e.g. 1) ... 2) ... 3) ...
- correcting grammar, e.g. ensuring full stops are followed by a space
- fine-tuning the OPT-30B model on the dataset above
- Dataset location: Jellywibble/dalio-principles-cleaned-v3

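The exact preprocessing code is not part of this card; the snippet below is a minimal sketch of the three cleaning steps listed above, with assumed regular expressions (the patterns actually used to build Jellywibble/dalio-principles-cleaned-v3 may differ).

```python
import re

def clean_text(text: str) -> str:
    """Illustrative sketch of the cleaning steps above; the regex patterns are assumptions."""
    # Remove numeric footnote references that trail a word or punctuation mark, e.g. "principle.12"
    text = re.sub(r"(?<=[A-Za-z.,;])\d+", "", text)
    # Remove numeric counts such as "1) ", "2) ", "3) "
    text = re.sub(r"\b\d+\)\s*", "", text)
    # Ensure a full stop is followed by a space
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text

print(clean_text("Embrace reality.1Then decide: 1) goals 2) problems 3) diagnosis."))
# -> "Embrace reality. Then decide: goals problems diagnosis."
```
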
## Metrics
- Checkpoint 8 served
- Hellaswag perplexity: 30.65
- Eval loss: 2.289

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

## Model Parameters
Trained on 4x A40 GPUs, effective batch size = 8 (a training sketch follows this list):
- base_model_name: facebook/opt-30b
- dataset_name: Jellywibble/dalio-principles-cleaned-v3
- block_size: 1024
- gradient_accumulation_steps: 2
- per_device_train_batch_size: 1
- seed: 2
- num_train_epochs: 1
- learning_rate: 3e-6

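The original training script is not included here; the following is a minimal sketch of how these values map onto the standard Hugging Face `Trainer`, assuming the dataset has a `text` column and that the script is launched once per GPU (e.g. `torchrun --nproc_per_node=4 train.py`, a hypothetical script name), so that 1 x 2 x 4 gives the effective batch size of 8. The output directory is also hypothetical.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model_name = "facebook/opt-30b"
dataset_name = "Jellywibble/dalio-principles-cleaned-v3"
block_size = 1024

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)
dataset = load_dataset(dataset_name)

def tokenize(batch):
    # Truncate each example to the 1024-token block size ("text" column name is an assumption).
    return tokenizer(batch["text"], truncation=True, max_length=block_size)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="opt-30b-dalio",          # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,       # 1 x 2 x 4 GPUs = effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    # Causal LM collator copies input_ids into labels so the language-modelling loss can be computed.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
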
## Notes
- It is important for the effective batch size to be at least 8 (here: per_device_train_batch_size 1 x gradient_accumulation_steps 2 x 4 GPUs = 8)
- A learning rate higher than 3e-6 results in massive overfitting, i.e. much worse Hellaswag metrics
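
## Usage

This card does not include a loading snippet; below is a minimal sketch using the standard `transformers` text-generation API. The repository id is a placeholder for this model's actual repo id, the prompt is arbitrary, and loading OPT-30B requires substantial GPU memory (`device_map="auto"` needs `accelerate` installed).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Jellywibble/opt-30b-dalio"  # placeholder: substitute this model's actual repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto", torch_dtype="auto")

prompt = "The most important principle is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```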