NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 13
Article Preference Tuning LLMs with Direct Preference Optimization Methods Jan 18, 2024 • 48
Visual Representation Learning with Stochastic Frame Prediction Paper • 2406.07398 • Published Jun 11, 2024 • 1
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 292
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 6
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 190