Llama 3.1 8B Experimental 1206

Overall Strengths

  1. Logical and Boolean Reasoning – Excels at tasks requiring clear, rule-based logic and the manipulation of true/false statements.
  2. Focused Domain Knowledge – Strong on certain specialized tasks (sports rules, ruin names, hyperbaton) that blend world knowledge with language comprehension.
  3. Good Instruction Compliance – High prompt-level and instance-level accuracy (both strict and loose) indicates that it follows user instructions effectively, even in more complex or nuanced prompts.
  4. Reasonable Multi-step Reasoning – While not the best in every logic category, it still shows solid performance (60%+) on tasks such as disambiguation and causal reasoning.
  5. Extended Context Window (128k) – The large 128k-token context allows the model to handle lengthy inputs and maintain coherence across extensive passages or multi-turn conversations. This is especially valuable for tasks like long-document question answering, summarization, or complex scenario analysis, where context retention is crucial (see the loading sketch after this list).
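
As a quick illustration of the long-context point above, here is a minimal sketch of loading this checkpoint with the transformers library and prompting it over a long document. The repo name is taken from the model tree below; the BF16 GPU setup and chat-template usage are assumptions (standard for Llama 3.1 Instruct derivatives), not verified against this specific finetune.

```python
# Minimal sketch: load the model with Hugging Face transformers and generate
# over a long input. Assumes a CUDA GPU with enough memory for an 8B model
# in BF16 and that the repo ships a Llama 3.1-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the checkpoint's BF16 tensor type
    device_map="auto",
)

# A long document plus a question; with a 128k-token window the whole
# document can usually stay in context rather than being chunked.
long_document = "..."  # your long input here
messages = [
    {"role": "user", "content": f"{long_document}\n\nSummarize the key points above."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```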

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 25.67 |
| IFEval (0-shot)     | 69.67 |
| BBH (3-shot)        | 30.06 |
| MATH Lvl 5 (4-shot) | 11.10 |
| GPQA (0-shot)       |  6.60 |
| MuSR (0-shot)       |  8.50 |
| MMLU-PRO (5-shot)   | 28.10 |
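
These scores are produced by the Open LLM Leaderboard pipeline; below is a rough sketch of reproducing the IFEval number locally with EleutherAI's lm-evaluation-harness. The task name `leaderboard_ifeval` and the settings are assumptions based on the leaderboard's public configuration, so exact figures may differ.

```python
# Rough sketch: re-run the IFEval benchmark locally with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The "leaderboard_ifeval"
# task name and the batch size below are assumptions, not the leaderboard's
# exact internal setup; results may deviate from the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct,dtype=bfloat16",
    tasks=["leaderboard_ifeval"],  # 0-shot, as reported on the leaderboard
    batch_size=8,
)
print(results["results"]["leaderboard_ifeval"])
```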
Model size: 8.03B params · Format: Safetensors · Tensor type: BF16

Model tree for sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct

  - Finetunes of the base model: 138 (this model is one of them)
  - Merges of this model: 1 model
  - Quantizations of this model: 1 model
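
The quantization listed above is a separate repo; as an alternative, the sketch below shows hypothetical on-the-fly 4-bit loading of this checkpoint via bitsandbytes, trading some accuracy for a much smaller memory footprint. It assumes a CUDA GPU and `pip install bitsandbytes`.

```python
# Sketch: load the checkpoint in 4-bit via bitsandbytes instead of using a
# pre-quantized repo. This quantizes on the fly at load time; it is NOT the
# quantized model listed in the model tree. Assumes bitsandbytes + CUDA.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in BF16, matching the checkpoint
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```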
