Model Card for radm/Llama-3-70B-Instruct-AH-lora

This is a LoRA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to act as a judge model for Arena Hard (https://github.com/lm-sys/arena-hard-auto).

Model Details

Model Description

  • Developed by: radm
  • Model type: Llama-3-70b
  • Language(s) (NLP): English
  • License: apache-2.0
  • Finetuned from model: NousResearch/Meta-Llama-3-70B-Instruct

Uses

Use the arena-hard-local repository (https://github.com/r4dm/arena-hard-local) to run Arena Hard evaluations with a local judge model.
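
The card itself does not include a loading snippet; below is a minimal sketch of attaching this adapter to the base model with transformers and PEFT. The model and adapter IDs come from this card, while the judge-style prompt, dtype, and generation settings are illustrative assumptions (running the full 70B base model requires substantial GPU memory).

```python
# Minimal sketch: load the base model and attach this LoRA adapter with PEFT.
# The prompt below is only an illustration of a pairwise-judge request.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Meta-Llama-3-70B-Instruct"
adapter_id = "radm/Llama-3-70B-Instruct-AH-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an impartial judge comparing two assistant answers."},
    {"role": "user", "content": "Question: ...\n\nAssistant A: ...\n\nAssistant B: ...\n\nWhich answer is better?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```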

Results

Llama-3-70B-Instruct-GPTQ as judge:

Llama-3-Instruct-8B-SimPO                          | score: 78.3  | 95% CI:   (-1.5, 1.2)   | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                    | score: 72.8  | 95% CI:   (-2.1, 1.4)   | average #tokens: 606
Meta-Llama-3-8B-Instruct-f16                       | score: 65.3  | 95% CI:   (-1.8, 2.1)   | average #tokens: 560
suzume-llama-3-8B-multilingual-orpo-borda-half     | score: 63.5  | 95% CI:   (-1.6, 2.1)   | average #tokens: 978
Phi-3-medium-128k-instruct                         | score: 50.0  | 95% CI:   (0.0, 0.0)    | average #tokens: 801
suzume-llama-3-8B-multilingual                     | score: 48.1  | 95% CI:   (-2.2, 1.8)   | average #tokens: 767
aya-23-8B                                          | score: 48.0  | 95% CI:   (-2.0, 2.1)   | average #tokens: 834
Vikhr-7B-instruct_0.5                              | score: 19.6  | 95% CI:   (-1.3, 1.5)   | average #tokens: 794
alpindale_gemma-2b-it                              | score: 11.2  | 95% CI:   (-1.0, 0.8)   | average #tokens: 425

Llama-3-70B-Instruct-AH-AWQ as judge:

Llama-3-Instruct-8B-SimPO                          | score: 83.8  | 95% CI:   (-1.4, 1.3)   | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3                    | score: 78.8  | 95% CI:   (-1.7, 1.9)   | average #tokens: 606
suzume-llama-3-8B-multilingual-orpo-borda-half     | score: 71.8  | 95% CI:   (-1.7, 2.4)   | average #tokens: 978
Meta-Llama-3-8B-Instruct-f16                       | score: 69.8  | 95% CI:   (-1.9, 1.7)   | average #tokens: 560
suzume-llama-3-8B-multilingual                     | score: 54.0  | 95% CI:   (-2.1, 2.1)   | average #tokens: 767
aya-23-8B                                          | score: 50.4  | 95% CI:   (-1.7, 1.7)   | average #tokens: 834
Phi-3-medium-128k-instruct                         | score: 50.0  | 95% CI:   (0.0, 0.0)    | average #tokens: 801
Vikhr-7B-instruct_0.5                              | score: 14.2  | 95% CI:   (-1.3, 1.0)   | average #tokens: 794
alpindale_gemma-2b-it                              | score:  7.9  | 95% CI:   (-0.9, 0.8)   | average #tokens: 425
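
In these tables, "score" is the win rate against the baseline model (Phi-3-medium-128k-instruct, fixed at 50.0) and the 95% CI is obtained by bootstrapping. The sketch below is a simplified illustration of how such a score and interval can be computed from per-question judge verdicts; arena-hard-auto itself fits a Bradley-Terry model over the pairwise judgments, so this is not its exact computation.

```python
# Simplified illustration (not the arena-hard-auto code): win rate plus a
# bootstrap 95% CI from per-question verdicts, where 1.0 = model wins,
# 0.5 = tie, 0.0 = baseline wins.
import random

def score_and_ci(verdicts, n_boot=1000, seed=0):
    rng = random.Random(seed)
    score = 100.0 * sum(verdicts) / len(verdicts)
    boots = []
    for _ in range(n_boot):
        sample = [rng.choice(verdicts) for _ in verdicts]
        boots.append(100.0 * sum(sample) / len(sample))
    boots.sort()
    lo = boots[int(0.025 * n_boot)] - score
    hi = boots[int(0.975 * n_boot)] - score
    return score, (round(lo, 1), round(hi, 1))

verdicts = [1.0] * 300 + [0.5] * 50 + [0.0] * 150  # hypothetical verdicts
print(score_and_ci(verdicts))
```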

Training Details

Training Data

Datasets:

  • radm/arenahard_gpt4vsllama3
  • radm/truthy-dpo-v0.1-ru
  • jondurbin/truthy-dpo-v0.1

Training Hyperparameters

  • Training regime: bf16
  • Load in 4-bit: True
  • Target modules: all
  • LoRA rank: 16
  • Max seq length: 8192
  • Gradient checkpointing: unsloth
  • Trainer: ORPOTrainer
  • Batch size: 1
  • Gradient accumulation steps: 4
  • Epochs: 1
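
The following is a minimal sketch of a training setup consistent with the hyperparameters above (unsloth 4-bit loading, LoRA rank 16 on all linear projections, TRL's ORPOTrainer). The exact arguments, learning rate, and dataset preprocessing are assumptions, not the author's verified training script; in particular, ORPO expects prompt/chosen/rejected columns, so the listed datasets may need mapping before concatenation.

```python
# Sketch of the implied training setup; hyperparameters mirror the card above,
# everything else (learning rate, column mapping, output_dir) is assumed.
from unsloth import FastLanguageModel
from datasets import load_dataset, concatenate_datasets
from trl import ORPOConfig, ORPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="NousResearch/Meta-Llama-3-70B-Instruct",
    max_seq_length=8192,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Datasets listed on this card; assumes they already expose
# prompt/chosen/rejected columns in a compatible schema.
dataset = concatenate_datasets([
    load_dataset("radm/arenahard_gpt4vsllama3", split="train"),
    load_dataset("radm/truthy-dpo-v0.1-ru", split="train"),
    load_dataset("jondurbin/truthy-dpo-v0.1", split="train"),
])

trainer = ORPOTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=ORPOConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=True,
        max_length=8192,
        output_dir="outputs",
    ),
)
trainer.train()
```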

Hardware

  • Hardware Type: Nvidia A100 80 GB
  • Hours used: 11

Framework versions

  • PEFT 0.10.0