---
tags:
- text-generation
library_name: transformers
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy."
  example_title: "Sentiment analysis"
- text: "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had ..."
  example_title: "Coreference resolution"
- text: "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book ..."
  example_title: "Logic puzzles"
- text: "The two men running to become New York City's next mayor will face off in their first debate Wednesday night ..."
  example_title: "Reading comprehension"
---

## Model Description
Pre-training on a cleaned version of Principles:
- removing numeric references to footnotes
- removing numeric counts, e.g. 1) ... 2) ... 3) ...
- correcting grammar, e.g. full stops must be followed by a space
- fine-tuning the OPT-30B model on the dataset above
- Dataset location: Jellywibble/dalio-principles-cleaned-v3

## Metrics
- Checkpoint 8 served
- HellaSwag perplexity: 30.65
- Eval loss: 2.289

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

## Model Parameters
Trained on 4xA40 GPUs with an effective batch size of 8; a configuration sketch using these values is shown below.
- base_model_name: facebook/opt-30b
- dataset_name: Jellywibble/dalio-principles-cleaned-v3
- block_size: 1024
- gradient_accumulation_steps: 2
- per_device_train_batch_size: 1
- seed: 2
- num_train_epochs: 1
- learning_rate: 3e-6

## Notes
- It is important for the effective batch size to be at least 8.
- A learning rate higher than 3e-6 results in massive overfitting, i.e. much worse HellaSwag metrics.
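
## Training Sketch
The card lists hyperparameters but not the exact training script, so the following is a minimal sketch assuming a standard causal-LM fine-tune with the Hugging Face `Trainer`. The `output_dir` and the dataset's `text` column are assumptions; all other values are taken from the Model Parameters list above.

```python
# Sketch only: reconstructs the run from the hyperparameters in this card,
# assuming a standard Trainer-based causal-LM fine-tune.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    set_seed,
)

BLOCK_SIZE = 1024  # block_size from the card

set_seed(2)  # seed from the card

# Loading a 30B model requires substantial GPU/CPU memory (4xA40 in the card).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b")

dataset = load_dataset("Jellywibble/dalio-principles-cleaned-v3")

def tokenize_and_chunk(examples):
    # Assumes the dataset exposes a "text" column; adjust if it differs.
    tokens = tokenizer(examples["text"])
    # Concatenate the batch and split it into fixed 1024-token blocks.
    concatenated = sum(tokens["input_ids"], [])
    total_length = (len(concatenated) // BLOCK_SIZE) * BLOCK_SIZE
    input_ids = [
        concatenated[i : i + BLOCK_SIZE]
        for i in range(0, total_length, BLOCK_SIZE)
    ]
    return {"input_ids": input_ids}

lm_dataset = dataset["train"].map(
    tokenize_and_chunk,
    batched=True,
    remove_columns=dataset["train"].column_names,
)

args = TrainingArguments(
    output_dir="opt-30b-dalio-principles",  # hypothetical output path
    per_device_train_batch_size=1,          # from the card
    gradient_accumulation_steps=2,          # with 4 GPUs -> effective batch size 8
    num_train_epochs=1,
    learning_rate=3e-6,
    seed=2,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Launched across 4 GPUs (e.g. `torchrun --nproc_per_node 4`), per_device_train_batch_size=1 with gradient_accumulation_steps=2 gives the effective batch size of 8 noted above.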
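
## How to Use
A minimal inference sketch with the `transformers` text-generation pipeline, using one of the widget prompts above. The Hub id of this checkpoint is not stated in the card text, so `MODEL_ID` is a placeholder to replace with this repository's id; `device_map="auto"` assumes `accelerate` is installed, since a 30B model will not fit on a single small GPU.

```python
from transformers import pipeline

MODEL_ID = "path/to/this-model"  # placeholder; substitute this repository's Hub id

generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")

# One of the widget prompts from the card header.
prompt = (
    "Is this review positive or negative? "
    "Review: Best cast iron skillet you will ever buy."
)
print(generator(prompt, max_new_tokens=50)[0]["generated_text"])
```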