posted an update Dec 27, 2023

Also, thanks to @osanseviero for granting me access to the Posts private beta! 🦙

Very cool! So this is DPO^2 🔥


Yes, indeed. For anyone wondering, in a bit more detail: Mixtral-8x7B-Instruct-v0.1 was itself fine-tuned using SFT + DPO (read more about it at …), and we ran DPO fine-tuning on top of it. We used data from UltraFeedback, in particular argilla/ultrafeedback-binarized-preferences-cleaned, which uses a different binarization approach and applies some data cleaning, but we essentially followed the same approach as @HuggingFaceH4 did with zephyr-7b-beta.

So DPO ^ 2 is fair! 😄
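For readers less familiar with DPO, here is a minimal sketch of the per-example loss that a DPO round optimizes on a binarized (chosen vs. rejected) preference pair. It is not the training code used for this model: the function name, argument names, and the `beta=0.1` default are illustrative assumptions, and the inputs are assumed to be summed token log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All names and the beta default are illustrative assumptions.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy shifts probability toward the chosen response: loss drops below log(2).
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

The loss only falls when the policy raises the chosen response relative to the rejected one by more than the reference model already does, which is why the choice of reference model matters when stacking one DPO round on top of another.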

spaces demo?


We'd need to discuss that internally, but it could be something we consider for the start of 2024 🤗

Very cool, thanks for sharing.

How does it perform on MT-Bench / Alpaca-eval?


MT-Bench is on par with mistralai/Mixtral-8x7B-Instruct-v0.1, i.e. ~8.3, and we haven't run AlpacaEval yet.

This is great! Unfortunately it does not work on Inference Endpoints, will have to try and find some other way to host the model.


Hey, sorry to hear that. Is it due to the resources required to run it? Maybe you can ask Hugging Face for a quota increase so that you can allocate the required resources 🤗 Let me know if there's anything I can help you with!
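To get a rough feel for the resources in question, here is a back-of-the-envelope sketch of the memory needed just to hold the weights at different precisions. It assumes a total of roughly 46.7B parameters for a Mixtral-8x7B-sized model, and it ignores the KV cache, activations, and framework overhead, so real requirements will be higher. The helper name and the precision table are illustrative, not part of any Hugging Face API.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights only, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 1024**3


# Assumption: approximate total parameter count of a Mixtral-8x7B-sized model.
MIXTRAL_TOTAL_PARAMS = 46.7e9

# Bytes per parameter for a few common weight precisions.
PRECISIONS = {"fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

estimates = {
    name: weight_memory_gib(MIXTRAL_TOTAL_PARAMS, nbytes)
    for name, nbytes in PRECISIONS.items()
}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.0f} GiB for weights alone")
```

Under these assumptions the 16-bit weights alone do not fit on a single 80 GB GPU, which is consistent with needing a multi-GPU instance (and the matching quota) or a quantized variant.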
