Sara Han Díaz

I guess the tag is generated during completion. However, this might depend on the prompt, the max number of tokens, and how the inference is performed.
Error when duplicating space

Hi @3rica! Did you check the examples folder, for instance this one: https://github.com/argilla-io/synthetic-data-generator/blob/main/examples/ollama-different-model-for-completion.py? Let me know if you still have issues and think that a more explanatory document would make sense.

When setting up the HF_TOKEN, did you ensure you granted access to the Inference Endpoints?
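A minimal pre-flight check along these lines can catch a missing or malformed token before the app starts; the helper name is hypothetical, and note that it cannot verify permissions — access to the Inference Endpoints/Providers still has to be granted when creating the token at hf.co/settings/tokens.

```python
# Hypothetical sanity check for HF_TOKEN: verifies only that the variable
# is set and looks like a Hugging Face user access token. The actual
# Inference permissions must be granted on the token itself.
import os

def check_hf_token(env: dict = os.environ) -> str:
    token = env.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; create one at hf.co/settings/tokens")
    if not token.startswith("hf_"):
        raise RuntimeError("HF_TOKEN does not look like a Hugging Face access token")
    return token
```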

Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies

Hi @Socialmediaprophet ! Sorry that I missed this message. I see that it's running now 🙌. There were some days when the Hub was a bit unstable, so that might have been the root cause of the connection error.

Hi @beketm ! The main reason is that we directly used the generated completions, but it's true that I missed writing the initial tag.
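Assuming the tag in question is the reasoning marker in DeepSeek-R1-style completions (the model sometimes emits the closing `</think>` without the opening one), a post-processing step like the following could restore it. This is an illustrative sketch, not the generator's actual code.

```python
# Hedged sketch: re-insert a missing opening <think> tag when a reasoned
# completion contains only the closing </think>.
def restore_think_tag(completion: str) -> str:
    if "</think>" in completion and "<think>" not in completion:
        return "<think>" + completion
    return completion
```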

Hi @Aristo2333! Regarding DeepSeek, we're using the distilled version, which is available through the Serverless Inference API (https://huggingface.co/docs/api-inference/index), so no further configuration is required. The Provider section below the model card indicates that it's available.
The Llama issue is raised because you need to go to the original repository and request access. After approval (generally quite quick), you'll be able to use the model via the Serverless Inference API too.
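As a sketch of what "no further configuration" means in practice: the serverless API only needs a token and a model id. The endpoint shape below follows the Inference API docs; the helper names and generation parameters are illustrative, not the generator's own code.

```python
# Minimal sketch of a serverless Inference API call using only the standard
# library; model id and parameters are illustrative.
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Request body for a text-generation call."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, token: str) -> str:
    """POST the prompt. A gated model (e.g. Llama) returns an error here
    until access has been requested and approved on the original repo."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)[0]["generated_text"]
```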

1 Billion Classifications

Hi @Socialmediaprophet! As shown in the image, you can add environment variables to a Space just by clicking the 'New variable' button. Then, set the name 'MODEL_COMPLETION' and the value deepseek-ai/DeepSeek-R1-Distill-Qwen-32B. This way, the Llama model will generate the prompts, while the DeepSeek model will generate the reasoned completions. Here is the documentation: https://huggingface.co/docs/hub/en/spaces-overview#managing-secrets.
In general, we work with environment variables to configure the application. If you have doubts about the meaning of each of them, you can check it here: https://github.com/argilla-io/synthetic-data-generator/blob/main/README.md#environment-variables
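A small sketch of how such variables can drive the model choice at runtime; the variable names MODEL and MODEL_COMPLETION follow the project README, while the helper itself is hypothetical.

```python
# Hypothetical helper mirroring the documented behavior: when
# MODEL_COMPLETION is set, completions use that model; otherwise they
# fall back to the prompt model configured via MODEL.
import os

def completion_model(env: dict = os.environ) -> str:
    return env.get("MODEL_COMPLETION") or env.get("MODEL", "")
```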

Hi @shymkovic ! This tutorial aims to highlight the process so that you can replicate it with your configuration or desired LLMs. There is no special reason to use the distilled version other than the fact that it is available through the Serverless Inference API, so everyone could test it.
Librarian Bot: Add language metadata for dataset

