We'd like to recommend a scientific-domain dataset that we created: EricLu/SCP-116K
https://huggingface.co/datasets/EricLu/SCP-116K
SCP-116K is a large-scale dataset containing 116,756 high-quality scientific problem-solution pairs, automatically extracted from web-crawled documents. The dataset covers multiple scientific disciplines, including physics, chemistry, and biology, and targets undergraduate to doctoral-level content. Each problem is accompanied by its matched solution, as well as solutions generated by advanced language models (o1-mini and QwQ-32B-preview), along with validation flags.
Unfortunately, for technical reasons, we have not yet provided R1 distillation data. We will address this as soon as possible in a subsequent iteration.
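For anyone who wants to try it, here is a minimal sketch of loading the dataset with the Hugging Face `datasets` library and keeping only rows whose validation flag is set. The split name `"train"` and the flag field name are assumptions, not taken from the dataset card, and the sketch also assumes the flags are stored as booleans; print `column_names` first to see the real schema.

```python
from typing import Any, Dict, Iterable, List


def load_scp116k(split: str = "train"):
    """Load SCP-116K from the Hugging Face Hub.

    Needs `pip install datasets` and network access.
    The default split name "train" is an assumption.
    """
    # Imported lazily so the filtering helper below also works offline.
    from datasets import load_dataset

    return load_dataset("EricLu/SCP-116K", split=split)


def keep_validated(
    records: Iterable[Dict[str, Any]], flag_field: str
) -> List[Dict[str, Any]]:
    """Keep records whose `flag_field` is exactly True.

    `flag_field` is a placeholder: pass whichever validation-flag
    column the dataset actually exposes.
    """
    return [r for r in records if r.get(flag_field) is True]


if __name__ == "__main__":
    ds = load_scp116k()
    # Inspect the real column names before choosing a flag field.
    print(ds.column_names)
```

Once the actual flag column is known, `keep_validated(ds, "<flag column>")` returns the validated rows as a plain list of dicts.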
That's very cool! Thanks for creating this dataset :)
Please also have a look at this dataset: https://agi.safe.ai/