The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more!
The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, intended for Direct Preference Optimization (DPO). It's curated from sources like jondurbin, argilla, grimulkan, and others.
Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements!
Locutusque/OpenCerebrum-dpo
Locutusque/OpenCerebrum-SFT
Locutusque/OpenCerebrum-1.0-7b-SFT
Locutusque/OpenCerebrum-1.0-7b-DPO
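
If you want to try the two datasets quickly, here is a minimal sketch using the Hugging Face `datasets` library. The helper names (`resolve_repo`, `load_opencerebrum`) are made up for illustration, and the `train` split name is an assumption, so check each dataset card if it fails. Streaming is used so the ~1.2M-example SFT set is not downloaded in full just to take a look.

```python
# Repo IDs taken from the list above; the dict keys are just illustrative labels.
DATASET_REPOS = {
    "sft": "Locutusque/OpenCerebrum-SFT",
    "dpo": "Locutusque/OpenCerebrum-dpo",
}


def resolve_repo(kind: str) -> str:
    """Map a short label ('sft' or 'dpo') to its Hugging Face repo ID."""
    return DATASET_REPOS[kind.lower()]


def load_opencerebrum(kind: str, split: str = "train", streaming: bool = True):
    """Load one of the OpenCerebrum datasets.

    Assumes a split called 'train' exists (not confirmed in the post).
    Requires `pip install datasets`.
    """
    repo_id = resolve_repo(kind)  # raises KeyError for unknown labels
    from datasets import load_dataset  # imported lazily so the helper above works without it

    return load_dataset(repo_id, split=split, streaming=streaming)


if __name__ == "__main__":
    # Peek at the first record of the DPO set. The column names are not
    # listed in the post, so print the keys rather than assuming them.
    first = next(iter(load_opencerebrum("dpo")))
    print(list(first.keys()))
```

The two `OpenCerebrum-1.0-7b-*` entries in the list are a separate pair of repos and are not touched by this sketch.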