raft_study

raftrsf's activity

hendrydong

authored a paper 20 days ago

BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Paper • 2502.03860 • Published 21 days ago • 23

hendrydong

authored a paper 24 days ago

Reward-Guided Speculative Decoding for Efficient LLM Reasoning

Paper • 2501.19324 • Published 27 days ago • 37

hendrydong

authored a paper 2 months ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38

hendrydong

authored a paper 5 months ago

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7, 2024 • 13

hendrydong

authored a paper 7 months ago

ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30, 2024 • 32

weqweasdas

updated a model 9 months ago

raftrsf/sfr_raft_iter5_2epoch

Text Generation • Updated Jun 17, 2024 • 8

weqweasdas

updated 2 datasets 9 months ago

raftrsf/sfr_concise_iter5_top1

Viewer • Updated Jun 14, 2024 • 20k • 51

raftrsf/sfr_concise_iter5_k32_with_rewards

Viewer • Updated Jun 14, 2024 • 20k • 64

weqweasdas

updated 2 models 9 months ago

raftrsf/sfr_raft_iter4_2epoch

Text Generation • Updated Jun 13, 2024 • 10

raftrsf/sfr_raft_iter4

Text Generation • Updated Jun 13, 2024 • 10

weqweasdas

updated 2 datasets 9 months ago

raftrsf/sfr_concise_iter4_top1

Viewer • Updated Jun 12, 2024 • 20k • 57

raftrsf/sfr_concise_iter4_k32_with_rewards

Viewer • Updated Jun 12, 2024 • 20k • 87

weqweasdas

updated a model 10 months ago

raftrsf/pair_pref

Text Generation • Updated May 18, 2024 • 10

weqweasdas

updated a dataset 10 months ago

raftrsf/ipo_eval_data_baseline.json

Viewer • Updated May 18, 2024 • 7.62k • 60

weqweasdas

authored a paper 10 months ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 68

hendrydong

authored 3 papers 10 months ago

Reverse Diffusion Monte Carlo

Paper • 2307.02037 • Published Jul 5, 2023 • 1

Spurious Feature Diversification Improves Out-of-distribution Generalization

Paper • 2309.17230 • Published Sep 29, 2023

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

Paper • 2312.11456 • Published Dec 18, 2023 • 1

weqweasdas

authored a paper 10 months ago

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

Paper • 2312.11456 • Published Dec 18, 2023 • 1

hendrydong

authored a paper 10 months ago

Local Augmentation for Graph Neural Networks

Paper • 2109.03856 • Published Sep 8, 2021