Commit e57d038 (parent: 71652a2): Create README.md

README.md ADDED
---
tags:
- text-generation
library_name: transformers
---

## Model Description
Pre-training on a cleaned version of *Principles* (a sketch of the cleaning steps follows this list):
- removing numeric references to footnotes
- removing numeric counts, e.g. 1) ... 2) ... 3) ...
- correcting grammar, e.g. ensuring full stops are followed by a space
- fine-tuning the OPT-30B model on the dataset above
- Dataset location: Jellywibble/dalio-principles-cleaned-v3

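The exact preprocessing code is not part of this card; the snippet below is a minimal sketch of the three cleaning steps listed above, with assumed regular expressions (the patterns actually used to build Jellywibble/dalio-principles-cleaned-v3 may differ).

```python
import re

def clean_text(text: str) -> str:
    """Illustrative sketch of the cleaning steps above; the regex patterns are assumptions."""
    # Remove numeric footnote references that trail a word or punctuation mark, e.g. "principle.12"
    text = re.sub(r"(?<=[A-Za-z.,;])\d+", "", text)
    # Remove numeric counts such as "1) ", "2) ", "3) "
    text = re.sub(r"\b\d+\)\s*", "", text)
    # Ensure a full stop is followed by a space
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text

print(clean_text("Embrace reality.1Then decide: 1) goals 2) problems 3) diagnosis."))
# -> "Embrace reality. Then decide: goals problems diagnosis."
```
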
## Metrics
- Checkpoint 8 served
- Hellaswag perplexity: 30.65
- Eval loss: 2.289

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

## Model Parameters
Trained on 4x A40 GPUs, effective batch size = 8 (a training sketch follows this list):
- base_model_name: facebook/opt-30b
- dataset_name: Jellywibble/dalio-principles-cleaned-v3
- block_size: 1024
- gradient_accumulation_steps: 2
- per_device_train_batch_size: 1
- seed: 2
- num_train_epochs: 1
- learning_rate: 3e-6

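The original training script is not included here; the following is a minimal sketch of how these values map onto the standard Hugging Face `Trainer`, assuming the dataset has a `text` column and that the script is launched once per GPU (e.g. `torchrun --nproc_per_node=4 train.py`, a hypothetical script name), so that 1 x 2 x 4 gives the effective batch size of 8. The output directory is also hypothetical.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model_name = "facebook/opt-30b"
dataset_name = "Jellywibble/dalio-principles-cleaned-v3"
block_size = 1024

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)
dataset = load_dataset(dataset_name)

def tokenize(batch):
    # Truncate each example to the 1024-token block size ("text" column name is an assumption).
    return tokenizer(batch["text"], truncation=True, max_length=block_size)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

args = TrainingArguments(
    output_dir="opt-30b-dalio",          # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,       # 1 x 2 x 4 GPUs = effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    # Causal LM collator copies input_ids into labels so the language-modelling loss can be computed.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
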
## Notes
- It is important for the effective batch size to be at least 8 (here: per_device_train_batch_size 1 x gradient_accumulation_steps 2 x 4 GPUs = 8)
- A learning rate higher than 3e-6 results in massive overfitting, i.e. much worse Hellaswag metrics
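
## Usage

This card does not include a loading snippet; below is a minimal sketch using the standard `transformers` text-generation API. The repository id is a placeholder for this model's actual repo id, the prompt is arbitrary, and loading OPT-30B requires substantial GPU memory (`device_map="auto"` needs `accelerate` installed).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Jellywibble/opt-30b-dalio"  # placeholder: substitute this model's actual repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto", torch_dtype="auto")

prompt = "The most important principle is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```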