---
license: other
datasets:
- garage-bAInd/Open-Platypus
license_name: yi-license
license_link: LICENSE
model-index:
- name: platypus-yi-34b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.43
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.21
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.13
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 54.48
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 84.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.82
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=bhenrym14/platypus-yi-34b
      name: Open LLM Leaderboard
---

# Instruction tune of Yi-34b with Open-Platypus (fp16)


## Overview

This is [chargoddard/Yi-34B-Llama](https://huggingface.co/chargoddard/Yi-34B-Llama), with instruction tuning performed with the [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) dataset. That base model is [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B), but using llama2 model definitions and tokenizer to remove any remote code requirements.

**This is a (merged) QLoRA fine-tune (rank 64)**. 

The finetune was performed with 1x RTX 6000 Ada (~18 hours to this checkpoint). It is possible this is rather undertrained, as this checkpoint is at 1 epoch. I began to see some performance degradation after that; more hyperparameter tuning is probably warranted.

## How to Use

Use as you would any llama-2 model.

## Prompting:

Model was trained with legacy airoboros <2.0 system prompt. See [bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16) model card for details.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bhenrym14__platypus-yi-34b)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |71.69|
|AI2 Reasoning Challenge (25-Shot)|68.43|
|HellaSwag (10-Shot)              |85.21|
|MMLU (5-Shot)                    |78.13|
|TruthfulQA (0-shot)              |54.48|
|Winogrande (5-shot)              |84.06|
|GSM8k (5-shot)                   |59.82|