|
--- |
|
tags: |
|
- text-generation |
|
library_name: transformers |
|
widget: |
|
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy." |
|
example_title: "Sentiment analysis" |
|
- text: "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had ..." |
|
example_title: "Coreference resolution" |
|
- text: "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book ..." |
|
example_title: "Logic puzzles" |
|
- text: "The two men running to become New York City's next mayor will face off in their first debate Wednesday night ..." |
|
example_title: "Reading comprehension" |
|
--- |
|
|
|
## Model Description |
|
Continued pre-training of OPT-30B on a cleaned version of *Principles*. This involved:
|
- removing numeric references to footnotes |
|
- removing enumerated counts, e.g. 1) ... 2) ... 3) ...
|
- correcting grammar, e.g. ensuring every full stop is followed by a space (a cleaning sketch follows this list)
|
- fine-tuning the OPT-30B model on the dataset above
|
- Dataset location: Jellywibble/dalio-principles-cleaned-v3 |
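
The original cleaning script is not included in this repository; as a rough illustration, the snippet below sketches the three cleaning steps with regular expressions. The patterns and the `clean_text` helper are hypothetical, not the code that produced the dataset.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical regex-based cleaning mirroring the steps listed above."""
    # 1) Drop numeric footnote references glued onto a word or sentence, e.g. "principle.12"
    text = re.sub(r"(?<=[a-zA-Z.,;:])\d+(?=\s|$)", "", text)
    # 2) Drop enumerated counts such as "1) ", "2) ", "3) "
    text = re.sub(r"\b\d+\)\s*", "", text)
    # 3) Ensure every full stop is followed by a space (naive: also affects abbreviations like "e.g.")
    text = re.sub(r"\.(?=[A-Za-z])", ". ", text)
    return text
```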
|
|
|
## Metrics |
|
- Checkpoint 8 is the checkpoint served; the metrics below refer to it
|
- Hellaswag Perplexity: 30.65 |
|
- Eval loss: 2.289
|
|
|
wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble |
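
The exact evaluation code is not part of this card. As a reference point only, the sketch below shows one common way to compute an eval loss (and the corresponding perplexity) on the fine-tuning dataset with `transformers`; the split name and the `text` column are assumptions, and the Hellaswag number above was computed separately.

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-30b"  # placeholder: point this at the fine-tuned checkpoint being served
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()

# Split name and "text" column are assumptions about the dataset layout.
dataset = load_dataset("Jellywibble/dalio-principles-cleaned-v3", split="train")

block_size = 1024
losses = []
for example in dataset:
    ids = tokenizer(example["text"], return_tensors="pt").input_ids[:, :block_size].to(model.device)
    if ids.shape[1] < 2:  # need at least two tokens for a next-token loss
        continue
    with torch.no_grad():
        # Passing labels=input_ids makes the model compute the shifted cross-entropy loss itself.
        losses.append(model(ids, labels=ids).loss.item())

eval_loss = sum(losses) / len(losses)
print(f"eval loss: {eval_loss:.3f}  perplexity: {math.exp(eval_loss):.2f}")
```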
|
|
|
## Model Parameters |
|
Trained on 4×A40 GPUs with an effective batch size of 8; a sketch mapping these parameters onto a `Trainer` setup follows the list.
|
- `base_model_name`: facebook/opt-30b

- `dataset_name`: Jellywibble/dalio-principles-cleaned-v3

- `block_size`: 1024

- `gradient_accumulation_steps`: 2

- `per_device_train_batch_size`: 1

- `seed`: 2

- `num_train_epochs`: 1

- `learning_rate`: 3e-6
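
The training script itself is not attached to this card. The sketch below shows one plausible way these parameters could map onto `transformers.Trainer` for causal-LM fine-tuning; the `text` column name, the truncation strategy, and the output path are assumptions, and the multi-GPU launch (e.g. `torchrun` across the 4×A40) is omitted.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model_name = "facebook/opt-30b"
dataset_name = "Jellywibble/dalio-principles-cleaned-v3"
block_size = 1024

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Assumes the dataset exposes a "text" column; truncation to block_size stands in
# for the usual concatenate-and-chunk preprocessing.
raw = load_dataset(dataset_name, split="train")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=block_size),
    batched=True,
    remove_columns=raw.column_names,
)

args = TrainingArguments(
    output_dir="opt-30b-dalio-principles",   # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,           # 4 GPUs x 1 x 2 = effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```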
|
|
|
## Notes |
|
- It is important for the effective batch size to be at least 8 (the computation is sketched after this list)
|
- A learning rate higher than 3e-6 results in severe overfitting, i.e. noticeably worse Hellaswag metrics
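
For clarity, the effective batch size quoted above is just the product of the parallelism and accumulation settings:

```python
num_gpus = 4                         # 4 x A40
per_device_train_batch_size = 1
gradient_accumulation_steps = 2

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)          # 8 -- keep this at 8 or above
```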