---
license: other
datasets:
- garage-bAInd/Open-Platypus
license_name: yi-license
license_link: LICENSE
model-index:
- name: platypus-yi-34b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.43
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.13
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 54.48
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 84.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.82
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
---

# Instruction tune of Yi-34b with Open-Platypus (fp16)

## Overview

This is [chargoddard/Yi-34B-Llama](https://huggingface.co/chargoddard/Yi-34B-Llama), instruction-tuned on the [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) dataset. That base model is [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B) converted to the Llama 2 model definitions and tokenizer, which removes any remote-code requirement.

**This is a (merged) QLoRA fine-tune (rank 64)**.

The fine-tune was performed on 1x RTX 6000 Ada (~18 hours to this checkpoint). This checkpoint is at 1 epoch, so the model may be somewhat undertrained. However, I began to see some performance degradation beyond 1 epoch, so more hyperparameter tuning is probably warranted.

## How to Use

Use it as you would any Llama 2 model.

## Prompting

The model was trained with the legacy airoboros (<2.0) system prompt. See the [bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16) model card for details.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bhenrym14__platypus-yi-34b).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |71.69|
|AI2 Reasoning Challenge (25-Shot)|68.43|
|HellaSwag (10-Shot)              |85.21|
|MMLU (5-Shot)                    |78.13|
|TruthfulQA (0-shot)              |54.48|
|Winogrande (5-shot)              |84.06|
|GSM8k (5-shot)                   |59.82|
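The "Avg." row in the leaderboard table appears to be the unweighted mean of the six benchmark scores. The short check below recomputes it from the table values; `SCORES` and `leaderboard_average` are hypothetical names used only for this illustration.

```python
from statistics import mean

# Benchmark scores copied from the leaderboard table (percent).
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 68.43,
    "HellaSwag (10-Shot)": 85.21,
    "MMLU (5-Shot)": 78.13,
    "TruthfulQA (0-shot)": 54.48,
    "Winogrande (5-shot)": 84.06,
    "GSM8k (5-shot)": 59.82,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


print(leaderboard_average(SCORES))  # 71.69
```

The six scores sum to 430.13, and 430.13 / 6 ≈ 71.688, which rounds to the 71.69 reported in the table.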
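The "How to Use" section only says to treat this as a Llama 2 model. Below is a minimal sketch with Hugging Face `transformers`. It assumes the repo ID `bhenrym14/platypus-yi-34b` (taken from the leaderboard links), fp16 weights, and that `accelerate` is installed so `device_map="auto"` can spread the roughly 68 GB of weights (34B parameters × 2 bytes) across available GPUs. The `USER: ... ASSISTANT:` layout in `build_prompt` is an assumption about the legacy airoboros format. Copy the exact system prompt from the linked airoboros model card and pass it in yourself. `build_prompt` and `generate` are hypothetical helper names.

```python
MODEL_ID = "bhenrym14/platypus-yi-34b"


def build_prompt(system_prompt: str, user_message: str) -> str:
    # ASSUMPTION: legacy airoboros (<2.0) layout of system text, a "USER:" turn,
    # then an "ASSISTANT:" cue for the model to continue from.
    return f"{system_prompt} USER: {user_message} ASSISTANT:"


def generate(system_prompt: str, user_message: str, max_new_tokens: int = 256) -> str:
    # Imported here so build_prompt can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # the card describes fp16 weights
        device_map="auto",          # requires the accelerate package
    )
    prompt = build_prompt(system_prompt, user_message)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Keeping prompt construction separate from model loading makes it easy to check the prompt text against the airoboros card before downloading the weights.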