Jellywibble committed · Commit e57d038 · Parent(s): 71652a2

Create README.md
Files changed (1): README.md (+35, -0)

README.md ADDED
---
tags:
- text-generation
library_name: transformers
---

## Model Description
Pre-training on a cleaned version of *Principles*, which involved (a sketch of the cleaning steps is shown after this list):
- removing numeric references to footnotes
- removing numeric enumeration markers, e.g. 1) ... 2) ... 3) ...
- correcting grammar, e.g. ensuring every full stop is followed by a space
- fine-tuning the OPT-30B model on the dataset above
- Dataset location: Jellywibble/dalio-principles-cleaned-v3

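The preprocessing script itself is not part of this repo, so the snippet below is only a rough sketch of the cleaning steps listed above; the `clean_principles_text` helper and its regular expressions are illustrative assumptions, not the actual pipeline.

```python
# Rough sketch only: the actual preprocessing script is not included here,
# and these regular expressions are illustrative assumptions.
import re

def clean_principles_text(text: str) -> str:
    # 1) Drop numeric footnote references trailing punctuation, e.g. "done.3" -> "done."
    text = re.sub(r"(?<=[.,;:!?])\d+", "", text)
    # 2) Drop enumeration markers such as "1) ... 2) ... 3) ..."
    text = re.sub(r"\b\d+\)\s*", "", text)
    # 3) Ensure every full stop is followed by a space
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text

print(clean_principles_text("He listed 1) pain 2) reflection.3Progress follows."))
# -> "He listed pain reflection. Progress follows."
```
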
## Metrics
- Checkpoint 8 served
- Hellaswag Perplexity: 30.65
- Eval loss: 2.289 (see the sketch below)

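Assuming the eval loss above is the usual per-token cross-entropy reported by the Trainer, it can also be read as a perplexity on the held-out split (a separate quantity from the Hellaswag perplexity):

```python
import math

eval_loss = 2.289           # cross-entropy loss reported above
print(math.exp(eval_loss))  # ~9.87: perplexity on the eval split, under that assumption
```
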
wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

## Model Parameters
Trained on 4x A40 GPUs, effective batch size = 8 (a sketch mapping these parameters onto the `Trainer` API follows this list):
- base_model_name facebook/opt-30b
- dataset_name Jellywibble/dalio-principles-cleaned-v3
- block_size 1024
- gradient_accumulation_steps 2
- per_device_train_batch_size 1
- seed 2
- num_train_epochs 1
- learning_rate 3e-6

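The training script itself is not part of this card, so the block below is just one plausible mapping of the listed parameters onto the Hugging Face `Trainer` API; the `"text"` column name, the block-packing logic, and the output directory are assumptions, and a 30B model would in practice also need a multi-GPU setup (e.g. DeepSpeed or FSDP), which is omitted here.

```python
# Hedged reconstruction only: the actual training script is not included in
# this repo. This maps the parameters listed above onto the Trainer API.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

base_model_name = "facebook/opt-30b"
dataset_name = "Jellywibble/dalio-principles-cleaned-v3"
block_size = 1024

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

raw = load_dataset(dataset_name)  # assumes a "train" split with a "text" column

def tokenize_and_pack(examples):
    # Concatenate token ids and split them into fixed-length blocks of
    # `block_size`, as run_clm-style causal-LM scripts do.
    ids = sum(tokenizer(examples["text"])["input_ids"], [])
    total = (len(ids) // block_size) * block_size
    blocks = [ids[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [b.copy() for b in blocks]}

train_ds = raw["train"].map(
    tokenize_and_pack, batched=True, remove_columns=raw["train"].column_names
)

args = TrainingArguments(
    output_dir="opt-30b-dalio-principles",  # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x A40 -> effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=default_data_collator,
).train()
```
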
## Notes
- It is important for the effective batch size to be at least 8.
- A learning rate higher than 3e-6 results in severe overfitting, i.e. much worse Hellaswag metrics.
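
For completeness, a minimal text-generation example; the repo id below is a placeholder (this card does not state the published model name), and loading a 30B checkpoint typically requires several GPUs, hence `torch_dtype=torch.float16` and `device_map="auto"`.

```python
# Illustrative only: replace the placeholder with this repo's actual model id.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Jellywibble/<this-repo>",  # placeholder id
    torch_dtype=torch.float16,
    device_map="auto",
)
print(generator("Pain plus reflection equals", max_new_tokens=50)[0]["generated_text"])
```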