Skywork-Reward-Data-Collection Collection Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated Oct 12 • 11
MagpieLM Collection Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated Sep 22 • 15
Direct Preference Optimization Datasets Collection Datasets suitable for Direct Preference Optimization based on their colum names • 1597 items • Updated Jul 10 • 2
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated Jun 26 • 12
Datasets built with ⚗️ distilabel Collection This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 8 items • Updated 14 days ago • 11
Translated (En->Ko) dataset Collection Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Nov 23 • 3
Synthetic (text) Dataset Generation Collection Papers about synthetic dataset generation • 9 items • Updated Jun 21 • 8
synthetic-data-generation-demos Collection A collection of demos for various approaches to synthetic data generation • 4 items • Updated Jun 25 • 13
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 218
Korean Datasets I've released so far. Collection 지금까지 업로드한 한국어 데이터셋 콜렉션입니다. • 8 items • Updated May 24 • 16