Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning Paper • 2502.06533 • Published 24 days ago • 18 • 2
Revisiting Hierarchical Text Classification: Inference and Metrics Paper • 2410.01305 • Published Oct 2, 2024
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 59
lecraquito/evaluation_addition_14digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Oct 5, 2024 • 1k • 53
lecraquito/evaluation_addition_13digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Oct 4, 2024 • 1k • 57
lecraquito/evaluation_addition_12digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Oct 1, 2024 • 1k • 50
lecraquito/evaluation_addition_12digits_1000examples_n_plus_n.jsonl Viewer • Updated Oct 1, 2024 • 1k • 60
lecraquito/evaluation_addition_10digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Sep 30, 2024 • 1k • 83
lecraquito/evaluation_addition_10digits_1000examples_n_plus_n.jsonl Viewer • Updated Sep 30, 2024 • 1k • 74
lecraquito/evaluation_addition_11digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Sep 30, 2024 • 1k • 68
lecraquito/evaluation_addition_11digits_1000examples_n_plus_n.jsonl Viewer • Updated Sep 30, 2024 • 1k • 48
lecraquito/evaluation_addition_9digits_1000examples_n_plus_less_n.jsonl Viewer • Updated Sep 30, 2024 • 1k • 65