Raja Biswas

rbiswasfc

AI & ML interests

NLP, Generative AI

Recent Activity

upvoted an article 3 days ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

upvoted an article 3 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

liked a model 3 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

rbiswasfc's activity

upvoted 2 articles 3 days ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

18 days ago

• 43

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022

• 173

upvoted 2 collections 7 days ago

SimpleRL

The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 7 days ago • 4

CodeI/O

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 13 days ago • 6

upvoted a paper 9 days ago

Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 84

upvoted an article 10 days ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11, 2024

• 117

upvoted a collection 10 days ago

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 15 days ago • 75

upvoted an article 12 days ago

Article

1 Billion Classifications

13 days ago

• 39

upvoted 4 papers 14 days ago

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published 20 days ago • 42

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published 18 days ago • 117

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 15 days ago • 59

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 15 days ago • 136

upvoted 2 collections 14 days ago

OpenR1-Math

Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 11 days ago • 6

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 12 items • Updated 6 days ago • 83

upvoted a paper 14 days ago

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 17 days ago • 33

upvoted an article 15 days ago

Article

Open R1: Update #2

By

and 6 others •

15 days ago

• 185

upvoted a paper 16 days ago

On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published 21 days ago • 17

upvoted an article 16 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

29 days ago

• 773

upvoted 2 papers 16 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 20 days ago • 53

LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published 20 days ago • 56