---
title: README
emoji: 🐦
colorFrom: pink
colorTo: indigo
sdk: static
pinned: false
---

Hi, I am a magpie 🐦!

🕸️ **Project Website**: [https://magpie-align.github.io/](https://magpie-align.github.io/)

📄 **Technical Report**: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464)

🤗 **HF Paper Page**: [https://huggingface.co/papers/2406.08464](https://huggingface.co/papers/2406.08464)

😬 **Code**: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie)

🤗 **Magpie Demo**: [https://huggingface.co/spaces/davanstrien/magpie](https://huggingface.co/spaces/davanstrien/magpie) (Thanks a lot for the implementation from @davanstrien!)

🐦 **Chat with Magpie**: [https://huggingface.co/spaces/flydust/Chat-with-Magpie](https://huggingface.co/spaces/flydust/Chat-with-Magpie)

**Questions?** Please contact [Zhangchen](mailto:zxu9@uw.edu) by email or raise an issue on [GitHub](https://github.com/magpie-align/magpie/issues/new/choose).

## [🧭 Click here for full dataset navigation (SFT and DPO)](https://github.com/magpie-align/magpie/blob/main/navigation.md)

## Raw Datasets

|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-1M-v0.1) | SFT | 1M raw conversations built with Meta Llama 3.1 70B.
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Llama-3-Magpie-Pro-1M-v0.1) | SFT | 1M raw conversations built with Meta Llama 3 70B.
| [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [Magpie-Air-3M](https://huggingface.co/datasets/Magpie-Align/Llama-3-Magpie-Air-3M-v0.1) | SFT | 3M raw conversations built with Meta Llama 3 8B.
| [Qwen2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | [Magpie-Qwen2-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-1M-v0.1) | SFT | 1M raw conversations built with Qwen2 72B Instruct.
| [Qwen2 7B Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) | [Magpie-Qwen2-Air-3M](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Air-3M-v0.1) | SFT | 3M raw conversations built with Qwen2 7B Instruct.
| [Phi-3 Medium Instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) | [Magpie-Phi3-Pro-1M](https://huggingface.co/datasets/Magpie-Align/Magpie-Phi3-Pro-1M-v0.1) | SFT | 1M raw conversations built with Phi-3 Medium Instruct.
| [Gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | [Magpie-Gemma2-Pro-534K](https://huggingface.co/datasets/Magpie-Align/Magpie-Gemma2-Pro-534K-v0.1) | SFT | 534K conversations built with Gemma-2-27b-it.
| [Llama 3.1 405B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | [Magpie-Ultra-v0.1](https://huggingface.co/datasets/argilla/magpie-ultra-v0.1) | SFT | [Argilla] 50K raw conversations built with Meta Llama 3.1 405B.

### Recommended Filtered Datasets

Here are some filtered datasets made by the authors, which are used to train our [Magpie-Align models](https://huggingface.co/collections/Magpie-Align/magpie-models-668c4a8eea81ccc0db130bdf). We also encourage you to [create and apply your own filters to customize datasets](https://github.com/magpie-align/magpie?tab=readme-ov-file#4-design-and-apply-your-filter).

We've kept these datasets within the 200K-300K range for your convenience. We found that this range is a sweet spot between model performance and training time.

The full list of filtered datasets can be found [here](https://github.com/magpie-align/magpie/blob/main/navigation.md).

|Model Name | Dataset | Type | Description |
|-------------|:-------|:-------|:-------|
| [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | [Magpie-Llama-3.1-Pro-MT-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered) | SFT | (🌟 Flexible License! 🌟) Select 300K high-quality multi-turn conversations from Magpie-Llama-3.1-Pro-MT-500K.
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered) | SFT | Apply a filter and select 300K high-quality conversations from Magpie-Pro-1M.
| [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | [Magpie-Pro-MT-300K](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1) | SFT | Select 300K difficult questions from Magpie-Pro-1M and extend them to multi-turn conversations.
| [Qwen2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | [Magpie-Qwen2-Pro-200K-Chinese](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese) | SFT | Apply a filter and select 200K high-quality Chinese conversations from Magpie-Qwen2-Pro-1M.
| [Gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | [Magpie-Gemma2-Pro-200K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Gemma2-Pro-200K-Filtered) | SFT | (🌟 Flexible License! 🌟) Apply a filter and select 200K conversations from Magpie-Gemma2-Pro-534K.
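
If you want to build your own filtered subset rather than use the ones listed above, the idea can be sketched in a few lines of plain Python. This is only an illustration: the field names (`quality`, `language`, `response`), the sample rows, and the thresholds are hypothetical and are not the schema or the filter settings used by the authors. Check the dataset card of the raw dataset you pick for the real column names.

```python
# Hypothetical rows standing in for raw conversations. The field names
# below are assumptions made for this sketch, not a confirmed schema.
SAMPLE_ROWS = [
    {"instruction": "Explain quicksort.",
     "response": "Quicksort picks a pivot and partitions the list around it.",
     "quality": "good", "language": "EN"},
    {"instruction": "hi",
     "response": "Hello!",
     "quality": "poor", "language": "EN"},
    {"instruction": "解释快速排序。",
     "response": "快速排序先选择一个基准元素，然后围绕它对列表进行划分。",
     "quality": "excellent", "language": "ZH"},
]


def keep_row(row, allowed_quality=frozenset({"good", "excellent"}),
             language=None, min_response_chars=10):
    """Return True if a row passes every filter condition."""
    if row["quality"] not in allowed_quality:
        return False
    if language is not None and row["language"] != language:
        return False
    return len(row["response"]) >= min_response_chars


def apply_filter(rows, **conditions):
    """Keep only the rows accepted by keep_row, preserving their order."""
    return [row for row in rows if keep_row(row, **conditions)]


filtered = apply_filter(SAMPLE_ROWS)                      # quality filter only
chinese_only = apply_filter(SAMPLE_ROWS, language="ZH")   # quality + language
print(len(filtered), len(chinese_only))
```

With the Hugging Face `datasets` library, a predicate like `keep_row` could be passed to `Dataset.filter` after adapting the field names to the dataset you load.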