---
library_name: transformers
tags:
- transformers
- peft
- arxiv:2406.08391
license: llama2
base_model: meta-llama/Llama-2-13b-hf
datasets:
- calibration-tuning/Llama-2-13b-hf-20k-choice
---

# Model Card

**Llama 13B CT-Choice** is a fine-tuned [Llama 13B](https://huggingface.co/meta-llama/Llama-2-13b-hf) model that provides well-calibrated confidence estimates for multiple-choice question answering.

The model is fine-tuned (calibration-tuned) using a [dataset](https://huggingface.co/datasets/calibration-tuning/Llama-2-13b-hf-20k-choice) of *multiple-choice* generations from `meta-llama/Llama-2-13b-hf`, labeled for correctness. 
At test/inference time, the probability of correctness defines the confidence of the model in its answer. 
For full details, please see our [paper](https://arxiv.org/abs/2406.08391) and supporting [code](https://github.com/activatedgeek/calibration-tuning).

**Other Models**: We also release a broader collection of [Multiple-Choice CT Models](https://huggingface.co/collections/calibration-tuning/multiple-choice-ct-models-66043dedebf973d639090821).

## Usage

This adapter model is meant to be used on top of generations from the `meta-llama/Llama-2-13b-hf` base model.

The confidence estimation pipeline follows these steps:
1. Load base model and PEFT adapter.
2. Disable adapter and generate answer.
3. Enable adapter and generate confidence.

All standard guidelines for the base model's generation apply.
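The steps above can be sketched as follows. This is a minimal, illustrative sketch: the adapter repository id, prompt text, and the exact wording of the uncertainty query are assumptions, not the formats used at training time; the authoritative version is in the supporting code.

```python
import math

def correctness_confidence(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the adapter's yes/no logits: P(answer is correct)."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    BASE = "meta-llama/Llama-2-13b-hf"
    ADAPTER = "calibration-tuning/Llama-2-13b-hf-ct-choice"  # assumed adapter repo id

    # Step 1: load the base model and the PEFT adapter.
    tokenizer = AutoTokenizer.from_pretrained(BASE)
    model = AutoModelForCausalLM.from_pretrained(
        BASE, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, ADAPTER)

    prompt = "Question: ...\nChoices: ...\nAnswer:"  # illustrative placeholder
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Step 2: disable the adapter and generate the answer with the base model.
    with model.disable_adapter():
        out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    answer = tokenizer.decode(
        out[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )

    # Step 3: with the adapter enabled (its default state), score an
    # uncertainty query about the answer. The query wording below is an
    # assumption; the exact training-time format is in the supporting repo.
    query = prompt + " " + answer + "\nIs the proposed answer correct? Answer:"
    q_inputs = tokenizer(query, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**q_inputs).logits[0, -1]
    yes_id = tokenizer.encode("yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("no", add_special_tokens=False)[0]
    conf = correctness_confidence(logits[yes_id].item(), logits[no_id].item())
    print(f"answer={answer!r}  confidence={conf:.3f}")
```

Note that generation happens inside `disable_adapter()` so the answer comes from the unmodified base model, while the confidence score is read from the adapter-enabled forward pass, matching the recommendation below.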

For a complete example, see [play.py](https://github.com/activatedgeek/calibration-tuning/blob/main/experiments/play.py) at the supporting code repository.

**NOTE**: Using the adapter for answer generation may hurt downstream task accuracy and the quality of the confidence estimates. We recommend enabling the adapter *only* to estimate confidence.

## License

The model is released under the original model's Llama 2 Community License Agreement.