README.md · Jellywibble/dalio-pretrain-cleaned-v4 at 5c1264bdbfc8e839423afc85fb0354ba69d569eb

metadata

tags:
  - text-generation
library_name: transformers
widget:
  - text: >-
      Is this review positive or negative? Review: Best cast iron skillet you
      will ever buy.
    example_title: Sentiment analysis
  - text: >-
      Barack Obama nominated Hillary Clinton as his secretary of state on Monday.
      He chose her because she had ...
    example_title: Coreference resolution
  - text: >-
      On a shelf, there are five books: a gray book, a red book, a purple book,
      a blue book, and a black book ...
    example_title: Logic puzzles
  - text: >-
      The two men running to become New York City's next mayor will face off in
      their first debate Wednesday night ...
    example_title: Reading comprehension

Model Description

Pre-trained on a cleaned version of Principles. Cleaning steps:

removing numeric references to footnotes
removing inline numeric list markers, i.e. 1) ... 2) ... 3) ...
correcting grammar, e.g. full stops must be followed by a space
Fine-tuned the OPT-30B model on the dataset above
Dataset location: Jellywibble/dalio-principles-cleaned-v3
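
The cleaning steps listed above could be approximated with a few regular expressions. This is only a sketch, not the author's actual script: the function name `clean_text` and every pattern in it are assumptions about what footnote references and list markers look like in the raw text, and they would also hit legitimate tokens such as "v3" or "U.S.A".

```python
import re


def clean_text(text: str) -> str:
    """Approximate the three cleaning steps (all patterns are assumptions)."""
    # 1. Footnote references: digits glued to the end of a word, optionally
    #    after one punctuation mark, e.g. "with it.4" -> "with it."
    text = re.sub(r"(?<=[A-Za-z])([.,;:!?]?)\d+\b", r"\1", text)
    # 2. Inline list markers such as "1) ", "2) ", "3) ".
    text = re.sub(r"\b\d+\)\s*", "", text)
    # 3. A full stop between a lowercase and an uppercase letter gets a space,
    #    e.g. "progress.Then" -> "progress. Then".
    text = re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r" {2,}", " ", text).strip()


if __name__ == "__main__":
    raw = (
        "Embrace reality and deal with it.4 Consider: "
        "1) pain 2) reflection 3) progress.Then repeat."
    )
    print(clean_text(raw))
    # Embrace reality and deal with it. Consider: pain reflection progress. Then repeat.
```

The order matters in this sketch: footnote digits are stripped before list markers so that a marker like "1) " is never mistaken for a footnote attached to the preceding word.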

Metrics

Checkpoint served: 8
HellaSwag perplexity: 30.65
Eval loss: 2.289

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble
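
For orientation, an eval loss can be converted into a perplexity on the same evaluation split. The sketch below assumes the reported eval loss is the mean per-token cross-entropy in nats, which this card does not state explicitly; `perplexity_from_loss` is a hypothetical helper name. The result is not comparable to the HellaSwag perplexity above, which is measured on different text.

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood (nats)."""
    return math.exp(mean_nll)


if __name__ == "__main__":
    eval_loss = 2.289  # value reported under Metrics
    print(f"eval-split perplexity: {perplexity_from_loss(eval_loss):.2f}")
```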

Model Parameters

Trained on 4x A40 GPUs with an effective batch size of 8 (4 GPUs x per_device_train_batch_size 1 x gradient_accumulation_steps 2)

base_model_name: facebook/opt-30b
dataset_name: Jellywibble/dalio-principles-cleaned-v3
block_size: 1024
gradient_accumulation_steps: 2
per_device_train_batch_size: 1
seed: 2
num_train_epochs: 1
learning_rate: 3e-6
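
The effective batch size follows from the parameters above. The sketch below assumes plain data-parallel training in which each GPU processes its own micro-batch; the `TrainConfig` class and the `num_devices` field are hypothetical names introduced for illustration, not part of the original training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the parameter list above.
    base_model_name: str = "facebook/opt-30b"
    dataset_name: str = "Jellywibble/dalio-principles-cleaned-v3"
    block_size: int = 1024
    gradient_accumulation_steps: int = 2
    per_device_train_batch_size: int = 1
    seed: int = 2
    num_train_epochs: int = 1
    learning_rate: float = 3e-6
    # Assumption: one data-parallel worker per A40 (4x A40).
    num_devices: int = 4

    @property
    def effective_batch_size(self) -> int:
        """Sequences contributing to each optimizer step."""
        return (
            self.num_devices
            * self.per_device_train_batch_size
            * self.gradient_accumulation_steps
        )

    @property
    def max_tokens_per_step(self) -> int:
        """Upper bound on tokens per optimizer step (every block full)."""
        return self.effective_batch_size * self.block_size


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.effective_batch_size)  # 8
    print(cfg.max_tokens_per_step)   # 8192
```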

Notes

It is important for the effective batch size to be at least 8.
A learning rate higher than 3e-6 results in massive overfitting, i.e. much worse HellaSwag metrics.
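
When reproducing the run on different hardware, the two notes above can be turned into a quick pre-flight check. This is a minimal sketch under the assumption of data-parallel training; the helper names `required_accumulation_steps` and `check_config` are hypothetical, and the thresholds are simply the values stated in the notes.

```python
import math
from typing import List

MIN_EFFECTIVE_BATCH_SIZE = 8  # from the first note
MAX_LEARNING_RATE = 3e-6      # from the second note


def required_accumulation_steps(
    num_devices: int,
    per_device_batch_size: int,
    target: int = MIN_EFFECTIVE_BATCH_SIZE,
) -> int:
    """Smallest gradient_accumulation_steps that reaches the target batch size."""
    return math.ceil(target / (num_devices * per_device_batch_size))


def check_config(
    num_devices: int,
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    learning_rate: float,
) -> List[str]:
    """Return a list of warnings; an empty list means both notes are satisfied."""
    warnings = []
    effective = num_devices * per_device_batch_size * gradient_accumulation_steps
    if effective < MIN_EFFECTIVE_BATCH_SIZE:
        warnings.append(
            f"effective batch size {effective} is below {MIN_EFFECTIVE_BATCH_SIZE}"
        )
    if learning_rate > MAX_LEARNING_RATE:
        warnings.append(
            f"learning rate {learning_rate} exceeds {MAX_LEARNING_RATE}"
        )
    return warnings


if __name__ == "__main__":
    # The setup from this card passes both checks.
    print(check_config(4, 1, 2, 3e-6))        # []
    # With only 2 GPUs, accumulation must rise to keep the batch size at 8.
    print(required_accumulation_steps(2, 1))  # 4
```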