@dvilasuero on Hugging Face: "👋 Hi there! This is my very first post. I'll use it to share some old…"

dvilasuero

posted an update Jan 6

👋 Hi there!

This is my very first post.

I'll use it to share some old news: a math preference dataset for DPO!

I created this dataset some time ago while we were developing distilabel (https://github.com/argilla-io/distilabel).

Some days ago we found out people are actually using it! So I'll use this post to explain how I built it in case it's useful for the community.

1. I used distilabel's SelfInstruct-inspired task to generate instructions about different math topics. I curated the instructions with Argilla (on Spaces!).
2. Then I used a distilabel Pipeline to build a preference dataset, with GPT-3.5 as the generator and GPT-4 as the labeller. If I recall correctly, I used our JudgeLM implementation (see https://distilabel.argilla.io/latest/technical-reference/tasks/#judgelmtask).

(see the screenshot with the dataset in the Argilla UI)

3. Then I just binarized into chosen, rejected pairs and voilà:

argilla/distilabel-math-preference-dpo
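A minimal sketch of what the binarization step could look like, assuming each record holds one instruction, a list of generations, and one numeric rating per generation from the labeller. The field names (`instruction`, `generations`, `ratings`, `prompt`, `chosen`, `rejected`) and the `binarize` helper are hypothetical, chosen for illustration, and are not necessarily the schema distilabel or the published dataset uses. The tie-skipping rule is also an assumption.

```python
from typing import Optional


def binarize(record: dict) -> Optional[dict]:
    """Turn one rated record into a (chosen, rejected) pair, or None on a tie."""
    # Pair each generation with its rating and sort from best to worst.
    ranked = sorted(
        zip(record["ratings"], record["generations"]),
        key=lambda pair: pair[0],
        reverse=True,
    )
    (best_rating, best), (worst_rating, worst) = ranked[0], ranked[-1]
    if best_rating == worst_rating:
        return None  # assumption: a tie carries no preference signal, so skip it
    return {
        "prompt": record["instruction"],
        "chosen": best,
        "rejected": worst,
    }


# Toy records with made-up ratings, only to exercise the function.
records = [
    {"instruction": "What is 7 * 8?", "generations": ["56", "54"], "ratings": [10, 3]},
    {"instruction": "Is 9 prime?", "generations": ["No", "No, 9 = 3 * 3"], "ratings": [8, 8]},
]
pairs = [p for p in (binarize(r) for r in records) if p is not None]
print(pairs)
```

Only the first toy record survives here, because the second one has tied ratings. Keeping the highest-rated answer as chosen and the lowest-rated as rejected is one common choice; other pairing strategies are possible.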

The funny thing is that I used this dataset for a second DPO run over Notus-7B. I hoped to see an improvement in math and reasoning skills, but the model actually improved in STEM and Humanities and did worse on Math 🤣.

In conclusion, this dataset was only a quick experiment. I'm happy to see the community found it useful. Data for DPO and fine-tuning is still a mystery. Let's unveil these mysteries together in 2024!

Follow me for the most exciting datasets for LLMs (and maybe some great, small, efficient models). I plan to announce all Argilla open-source work here!

dvilasuero

Jan 6

If you want to build something similar, here's an end-to-end colab:

https://colab.research.google.com/drive/1rO1-OlLFPBC0KPuXQOeMpZOeajiwNoMy?usp=sharing

Tonic

Jan 7

I love the "posting" from Argilla, what a fantastic way to share 🤗
