Recommending our scientific-domain dataset: EricLu/SCP-116K

#2
by EricLu - opened

https://huggingface.co/datasets/EricLu/SCP-116K

SCP-116K is a large-scale dataset containing 116,756 high-quality scientific problem-solution pairs, automatically extracted from web-crawled documents. The dataset covers multiple scientific disciplines, including physics, chemistry, and biology, and targets undergraduate- to doctoral-level content. Each problem is accompanied by its matched solution, as well as solutions generated by advanced language models (o1-mini and QwQ-32B-preview), along with validation flags.
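
For anyone who wants to try it, here is a rough sketch of loading the dataset and keeping only the problems whose model-generated solution passed validation. The `train` split name and the flag field name `o1_flag` are assumptions made for illustration only; please check the dataset card for the actual schema before relying on them.

```python
from typing import Any, Dict, Iterable, List


def load_scp116k(split: str = "train"):
    """Load SCP-116K from the Hugging Face Hub.

    Requires the `datasets` package and network access.
    The split name "train" is an assumption, not confirmed by this post.
    """
    from datasets import load_dataset  # imported lazily so the filter below works offline

    return load_dataset("EricLu/SCP-116K", split=split)


def keep_validated(records: Iterable[Dict[str, Any]], flag_field: str) -> List[Dict[str, Any]]:
    """Keep records whose validation flag is exactly True.

    `flag_field` is a hypothetical column name; records missing the
    field are dropped rather than treated as validated.
    """
    return [r for r in records if r.get(flag_field) is True]


# Toy records standing in for real rows (made-up values, hypothetical fields).
sample = [
    {"problem": "P1", "o1_flag": True},
    {"problem": "P2", "o1_flag": False},
    {"problem": "P3"},
]
validated = keep_validated(sample, "o1_flag")
```

With the real data, the same filter could be applied to the rows returned by `load_scp116k()` once the correct flag column is known.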

Unfortunately, for technical reasons, we have not yet provided R1 distillation data. We will address this as soon as possible in subsequent iterations.

Open R1 org

That's very cool! Thanks for creating this dataset :)

Please also have a look at this dataset: https://agi.safe.ai/
