---
license: apache-2.0
library_name: adapter-transformers
datasets:
  - netcat420/MFANN
model-index:
  - name: MFANN3b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 43.09
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 72.33
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 26.74
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 40.22
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 62.67
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 3.34
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b
          name: Open LLM Leaderboard
---

Fine-tuned on a dataset of over 2.5 million tokens structured for chain-of-thought reasoning, MFANN is designed to generate coherent, contextually rich text, chaining ideas together into fluid, human-like discourse.

MFANN's reasoning capability shows in how it handles intricate concepts and synthesizes information cohesively, whether working through philosophical arguments, crafting narratives, or producing analyses.

Thanks to its fine-tuning process, MFANN is suited to a range of applications, from natural language understanding and generation to content creation and dialogue systems, making it a useful tool for researchers and developers across domains.

# Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=netcat420/MFANN3b).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 41.40 |
| AI2 Reasoning Challenge (25-Shot) | 43.09 |
| HellaSwag (10-Shot)               | 72.33 |
| MMLU (5-Shot)                     | 26.74 |
| TruthfulQA (0-shot)               | 40.22 |
| Winogrande (5-shot)               | 62.67 |
| GSM8k (5-shot)                    |  3.34 |
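The reported Avg. is simply the unweighted mean of the six benchmark scores above; the short sketch below recomputes it from the table (scores copied verbatim from this card):

```python
# Per-benchmark scores from the table above (Open LLM Leaderboard).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 43.09,
    "HellaSwag (10-Shot)": 72.33,
    "MMLU (5-Shot)": 26.74,
    "TruthfulQA (0-shot)": 40.22,
    "Winogrande (5-shot)": 62.67,
    "GSM8k (5-shot)": 3.34,
}

# The leaderboard "Avg." is the unweighted mean of the six scores.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # Avg. = 41.40
```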