---
tags:
- text-generation
library_name: transformers
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy."
example_title: "Sentiment analysis"
- text: "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had ..."
example_title: "Coreference resolution"
- text: "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book ..."
example_title: "Logic puzzles"
- text: "The two men running to become New York City's next mayor will face off in their first debate Wednesday night ..."
example_title: "Reading comprehension"
---
## Model Description
Fine-tuning of the OPT-30B model on a cleaned version of the *Principles* text. Cleaning steps:
- removing numeric references to footnotes
- removing numeric count markers, e.g. 1) ... 2) ... 3) ...
- correcting grammar, e.g. ensuring full stops are followed by a space

Dataset location: Jellywibble/dalio-principles-cleaned-v3
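The cleaning steps above can be sketched with regular expressions. This is a minimal illustration, not the actual preprocessing code; the function name and exact patterns are assumptions.

```python
import re

def clean_text(text: str) -> str:
    """Illustrative cleaning pass matching the steps described in this card."""
    # Remove bracketed numeric footnote references such as "[3]"
    text = re.sub(r"\[\d+\]", "", text)
    # Remove numeric count markers like "1) ... 2) ... 3) ..."
    text = re.sub(r"\b\d+\)\s*", "", text)
    # Ensure full stops are followed by a space before the next letter
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text
```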
## Metrics
- Checkpoint 8 served
- Hellaswag perplexity: 30.65
- Eval loss: 2.289
- Weights & Biases run: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble
## Model Parameters
Trained on 4×A40 GPUs, effective batch size = 8
- base_model_name facebook/opt-30b
- dataset_name Jellywibble/dalio-principles-cleaned-v3
- block_size 1024
- gradient_accumulation_steps 2
- per_device_train_batch_size 1
- seed 2
- num_train_epochs 1
- learning_rate 3e-6
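The effective batch size of 8 follows from the settings above (4 GPUs, per-device batch size 1, 2 gradient-accumulation steps):

```python
# Effective batch size = GPUs × per-device batch × gradient-accumulation steps
num_gpus = 4                       # 4×A40
per_device_train_batch_size = 1
gradient_accumulation_steps = 2

effective_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 8
```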
## Notes
- It is important for the effective batch size to be at least 8
- Learning rates higher than 3e-6 cause severe overfitting, i.e. substantially worse Hellaswag perplexity