---
title: README
emoji: 🐦
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false
---

Hi, I am a magpie 🐦!

🕸️ **Project Website**: [https://magpie-align.github.io/](https://magpie-align.github.io/)

📄 **Technical Report**: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464)

🤗 **HF Paper Page**: [https://huggingface.co/papers/2406.08464](https://huggingface.co/papers/2406.08464)

😬 **Code**: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie)

You can try the Magpie demo [🤗 here](https://huggingface.co/spaces/davanstrien/magpie) to generate instruction-response pairs. Thanks a lot to @davanstrien for the quick implementation!

## Dataset Navigation 🧭
### [**Meta Llama 3**](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)
|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Llama-3-Magpie-Pro-1M-v0.1) | SFT | 1M raw conversations built with Meta Llama 3 70B. |
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered) | SFT | 300K high-quality conversations selected by applying a filter. |
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-MT-300K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1) | SFT | 300K difficult questions selected and extended to multi-turn conversations. |
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-3M](https://huggingface.co/datasets/Magpie-Align/Llama-3-Magpie-Air-3M-v0.1) | SFT | 3M raw conversations built with Meta Llama 3 8B. |
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-300K-Filtered) | SFT | 300K high-quality conversations selected by applying a filter. |
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-MT-300K](https://huggingface.co/datasets/Magpie-Align/Magpie-Air-MT-300K-v0.1) | SFT | 300K difficult questions selected and extended to multi-turn conversations. |
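
Any dataset listed in these tables can be inspected with the Hugging Face `datasets` library. Below is a minimal sketch, assuming `datasets` is installed and the dataset exposes a `train` split (split names may vary per release):

```python
# Minimal sketch: stream a Magpie dataset from the Hugging Face Hub.
# Assumes `pip install datasets`; the `train` split name is an assumption.
from datasets import load_dataset

ds = load_dataset(
    "Magpie-Align/Magpie-Pro-300K-Filtered",  # any dataset name from the tables works
    split="train",
    streaming=True,  # avoid downloading the full dataset up front
)

# Peek at the first record to see which fields are available.
first_record = next(iter(ds))
print(list(first_record.keys()))
```

Swap in any other dataset name from the tables to browse it the same way.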

### [**Qwen2**](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Qwen2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | [Magpie-Qwen2-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-1M-v0.1) | SFT | 1M raw conversations built with Qwen2 72B Instruct. |
| [Qwen2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | [Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) | SFT | 300K high-quality conversations selected by applying a filter. |
| [Qwen2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | [Magpie-Qwen2-Pro-200K-Chinese](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese) | SFT | 200K high-quality Chinese conversations selected by applying a filter. |
| [Qwen2 7B Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | [Magpie-Qwen2-Air-3M](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Air-3M-v0.1) | SFT | 3M raw conversations built with Qwen2 7B Instruct. |
| [Qwen2 7B Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | [Magpie-Qwen2-Air-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen-Air-300K-Filtered) | SFT | 300K high-quality conversations selected by applying a filter. |

### [**Phi-3**](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3)
|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Phi-3 Medium Instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) | [Magpie-Phi3-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Magpie-Phi3-Pro-1M-v0.1) | SFT | 1M raw conversations built with Phi-3 Medium Instruct. |
| [Phi-3 Medium Instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) | [Magpie-Phi3-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Phi3-Pro-300K-Filtered) | SFT | 300K high-quality conversations selected by applying a filter. |