davidberenstein1957 
posted an update Jul 27
⚗️ Find reusable synthetic data pipeline code and corresponding datasets on the @huggingface Hub.

Find your pipeline and run it with $ distilabel pipeline run --config "hugging_face_dataset_url/pipeline.yaml"
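
As a rough sketch of what that placeholder might expand to: assuming the dataset repo keeps its config in a top-level `pipeline.yaml` on the `main` branch, the raw-file URL can be assembled from the repo ID. The helper names and the `some-user/some-dataset` repo ID below are hypothetical and only for illustration.

```python
import shlex

HUB_BASE = "https://huggingface.co"


def pipeline_config_url(
    repo_id: str, revision: str = "main", filename: str = "pipeline.yaml"
) -> str:
    """Build the raw-file URL of a pipeline config stored in a dataset repo.

    Assumes the file sits at the repo root; adjust `filename` otherwise.
    """
    return f"{HUB_BASE}/datasets/{repo_id}/raw/{revision}/{filename}"


def distilabel_run_command(repo_id: str) -> list[str]:
    """Return the CLI invocation from the post as an argument list."""
    return ["distilabel", "pipeline", "run", "--config", pipeline_config_url(repo_id)]


if __name__ == "__main__":
    # Hypothetical repo ID; replace it with a dataset found in the Space.
    print(shlex.join(distilabel_run_command("some-user/some-dataset")))
```

The argument list can be passed to `subprocess.run` as-is once distilabel is installed, which avoids shell-quoting issues with the URL.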

Some components I used:
- Embedded dataset viewer https://huggingface.co/docs/hub/main/en/datasets-viewer-embed
- Hugging Face fsspec https://huggingface.co/docs/huggingface_hub/main/en/guides/hf_file_system
- distilabel https://distilabel.argilla.io/latest/
- Gradio leaderboard by Freddy Boulton freddyaboulton/gradio_leaderboard
- Gradio modal by Ali Abid
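
To illustrate the fsspec component, here is a minimal sketch of how one might check whether a dataset repo ships a `pipeline.yaml`, using `HfFileSystem` from `huggingface_hub`. This is not necessarily how the Space does it; the helper names and the repo ID are hypothetical, and the file is assumed to live at the repo root.

```python
def pipeline_path(repo_id: str, filename: str = "pipeline.yaml") -> str:
    """Return the HfFileSystem-style path of a file in a dataset repo."""
    return f"datasets/{repo_id}/{filename}"


def has_pipeline(repo_id: str) -> bool:
    """Check on the Hub whether the dataset repo contains a pipeline config.

    Needs network access and the `huggingface_hub` package, so the import
    is kept local to this function.
    """
    from huggingface_hub import HfFileSystem

    return HfFileSystem().exists(pipeline_path(repo_id))


if __name__ == "__main__":
    # Hypothetical repo ID; replace it with a real dataset repo.
    print(has_pipeline("some-user/some-dataset"))
```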

Space: davidberenstein1957/distilabel-synthetic-data-pipeline-explorer