
Sara Han Díaz

sdiazlor

AI & ML interests

Data curation and generation, RLHF, RAG, Prompt Engineering

Recent Activity

updated a collection about 11 hours ago
Utilities
liked a Space about 11 hours ago
JournalistsonHF/ai-toolkit

Organizations

Hugging Face, Argilla, Blog-explorers, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, Hugging Face Discord Community, argilla-internal-testing, Argilla Warehouse, open/ acc, Data Is Better Together Contributor

sdiazlor's activity


I guess the tag is generated during completion. However, this might depend on the prompt, the max number of tokens, and how the inference is performed.

New activity in argilla/synthetic-data-generator 5 days ago

When setting up the HF_TOKEN, did you ensure you granted access to the Inference Endpoints?
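For reference, a minimal sketch of why the token permission matters: the Serverless Inference API authenticates with a Bearer header built from HF_TOKEN, so a missing or empty token fails before any request is sent. The helper below is hypothetical, not part of the synthetic-data-generator codebase.

```python
import os

def build_auth_headers(env=None):
    """Build the Authorization header used by the Serverless Inference API.

    A missing or empty HF_TOKEN is a common cause of authentication
    errors, so fail early with a clear message. (Hypothetical helper.)
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token with Inference permissions"
        )
    return {"Authorization": f"Bearer {token}"}
```

Note that even with a valid header, a fine-grained token still needs the Inference permission granted when it is created.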

upvoted an article 6 days ago

Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies


Hi @Socialmediaprophet ! Sorry that I missed this message. I see that it's running now 🙌. There were some days when the Hub was a bit unstable, so that might have been the root cause of the connection error.


Hi @beketm ! The main reason is that we directly used the generated completions, but it's true that I missed writing the initial tag.


Hi @Aristo2333! Regarding DeepSeek, we're using the distilled version, which is available through the Serverless Inference API (https://huggingface.co/docs/api-inference/index), so no further configuration is required. The Provider shown below indicates that it's available.

[Screenshot 2025-02-19 at 14.28.14.png]

The Llama issue is raised because you need to go to the original repository and request access. After approval (generally quite quick), you'll be able to use the model via the Serverless Inference API too.
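As a sketch of what "no further configuration" means in practice: serverless requests are routed purely by the model's repository id, so switching models is just a URL change. The helper below is illustrative, not code from the generator; the Llama id shown is a hypothetical example of a gated repo.

```python
def inference_api_url(model_id: str) -> str:
    # The Serverless Inference API routes requests by repository id;
    # no dedicated endpoint needs to be deployed.
    return f"https://api-inference.huggingface.co/models/{model_id}"

# The distilled DeepSeek variant works as soon as you have a valid token;
# a gated Llama repo additionally requires approved access on the Hub.
deepseek_url = inference_api_url("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
llama_url = inference_api_url("meta-llama/Llama-3.1-8B-Instruct")
```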

upvoted an article 12 days ago

1 Billion Classifications


Hi @Socialmediaprophet ! As shown in the image, you can add environment variables to a Space just by clicking on the button 'New variable'. Then, you can set the name 'MODEL_COMPLETION' and the value deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. This way, the Llama model will generate the prompts, while the DeepSeek model will generate the reasoned completion. Here is the documentation: https://huggingface.co/docs/hub/en/spaces-overview#managing-secrets.

In general, we work with environment variables to configure the application. If you have doubts about the meaning of each of them, you can check it here: https://github.com/argilla-io/synthetic-data-generator/blob/main/README.md#environment-variables
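As a sketch of that pattern (the fallback value and helper name here are illustrative assumptions, not taken from the repository): the app reads each setting from the environment and falls back to a default when it is unset.

```python
import os

def get_completion_model(env=None):
    # Hypothetical reader: use MODEL_COMPLETION when set, otherwise fall
    # back to a default model id (the fallback shown is an assumption).
    env = os.environ if env is None else env
    return env.get("MODEL_COMPLETION") or "meta-llama/Llama-3.1-8B-Instruct"
```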


Hi @shymkovic! This tutorial aims to highlight the process so that you can replicate it with your own configuration or desired LLMs. There is no special reason to use the distilled version other than the fact that it is available through the Serverless Inference API, so that everyone can test it.