
Sara Han Díaz

sdiazlor

AI & ML interests

Data curation and generation, RLHF, RAG, Prompt Engineering

Recent Activity

updated a collection about 11 hours ago
Utilities
liked a Space about 11 hours ago
JournalistsonHF/ai-toolkit

Organizations

Hugging Face, Argilla, Blog-explorers, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, Hugging Face Discord Community, argilla-internal-testing, Argilla Warehouse, open/ acc, Data Is Better Together Contributor

sdiazlor's activity


I guess the tag is generated during completion. However, this might depend on the prompt, the max number of tokens, and how the inference is performed.

New activity in argilla/synthetic-data-generator 5 days ago

When setting up the HF_TOKEN, did you ensure you granted access to the Inference Endpoints?
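For reference, a minimal sketch of why the token permission matters: the Serverless Inference API authenticates with a Bearer header built from HF_TOKEN, so a missing or empty token fails before any request is sent. The helper below is hypothetical, not part of the synthetic-data-generator codebase.

```python
import os

def build_auth_headers(env=None):
    """Build the Authorization header used by the Serverless Inference API.

    A missing or empty HF_TOKEN is a common cause of authentication
    errors, so fail early with a clear message. (Hypothetical helper.)
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token with Inference permissions"
        )
    return {"Authorization": f"Bearer {token}"}
```

Note that even with a valid header, a fine-grained token still needs the Inference permission granted when it is created.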

upvoted an article 6 days ago

Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies


Hi @Socialmediaprophet ! Sorry that I missed this message. I see that it's running now 🙌. There were some days when the Hub was a bit unstable, so that might have been the root cause of the connection error.


Hi @beketm ! The main reason is that we directly used the generated completions, but it's true that I missed writing the initial tag.


Hi @Aristo2333! Regarding DeepSeek, we're using the distilled version, which is available through the Serverless Inference API (https://huggingface.co/docs/api-inference/index), so no further configuration is required. The Provider shown below indicates that it's available.

[Screenshot 2025-02-19 at 14.28.14.png]

The Llama issue is raised because you need to go to the original repository and request access. After approval (generally quite quick), you'll be able to use the model via the Serverless Inference API too.
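As a sketch of what "no further configuration" means in practice: serverless requests are routed purely by the model's repository id, so switching models is just a URL change. The helper below is illustrative, not code from the generator; the Llama id shown is a hypothetical example of a gated repo.

```python
def inference_api_url(model_id: str) -> str:
    # The Serverless Inference API routes requests by repository id;
    # no dedicated endpoint needs to be deployed.
    return f"https://api-inference.huggingface.co/models/{model_id}"

# The distilled DeepSeek variant works as soon as you have a valid token;
# a gated Llama repo additionally requires approved access on the Hub.
deepseek_url = inference_api_url("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
llama_url = inference_api_url("meta-llama/Llama-3.1-8B-Instruct")
```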

upvoted an article 12 days ago

1 Billion Classifications


Hi @Socialmediaprophet ! As shown in the image, you can add environment variables to a Space just by clicking on the button 'New variable'. Then, you can set the name 'MODEL_COMPLETION' and the value deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. This way, the Llama model will generate the prompts, while the DeepSeek model will generate the reasoned completion. Here is the documentation: https://huggingface.co/docs/hub/en/spaces-overview#managing-secrets.

In general, we work with environment variables to configure the application. If you have doubts about the meaning of each of them, you can check it here: https://github.com/argilla-io/synthetic-data-generator/blob/main/README.md#environment-variables
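As a sketch of that pattern (the fallback value and helper name here are illustrative assumptions, not taken from the repository): the app reads each setting from the environment and falls back to a default when it is unset.

```python
import os

def get_completion_model(env=None):
    # Hypothetical reader: use MODEL_COMPLETION when set, otherwise fall
    # back to a default model id (the fallback shown is an assumption).
    env = os.environ if env is None else env
    return env.get("MODEL_COMPLETION") or "meta-llama/Llama-3.1-8B-Instruct"
```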


Hi @shymkovic! This tutorial aims to highlight the process so that you can replicate it with your own configuration or desired LLMs. There is no special reason to use the distilled version other than the fact that it is available through the Serverless Inference API, so that everyone can test it.