dfuhoiysOHSVFh82934gfjklb

huba-buba

AI & ML interests

None yet

Recent Activity

liked a dataset about 3 hours ago

STEM-AI-mtl/Electrical-engineering

liked a model about 5 hours ago

prithivMLmods/SmolLM2-360M-Grpo-r999

liked a dataset about 8 hours ago

osunlp/UGround-V1-Data


Organizations

None yet

huba-buba's activity

upvoted a collection 4 days ago

QwQ

Collection

Qwen with Questions • 6 items • Updated 3 days ago • 76

upvoted a paper 7 days ago

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published 17 days ago • 44

upvoted 2 articles 8 days ago

SigLIP 2: A better multilingual vision language encoder

Article • 17 days ago • 126

SmolVLM2: Bringing Video Understanding to Every Device

Article • 18 days ago • 197

upvoted a paper 11 days ago

WebGames: Challenging General-Purpose Web-Browsing AI Agents

Paper • 2502.18356 • Published 12 days ago • 11

upvoted a paper 15 days ago

AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

Paper • 2502.14669 • Published 17 days ago • 11

upvoted an article 16 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 191

upvoted a paper 17 days ago

Thinking Preference Optimization

Paper • 2502.13173 • Published 20 days ago • 17

upvoted an article 21 days ago

Proximal Policy Optimization (PPO)

Article • Aug 5, 2022 • 25

upvoted 3 papers 24 days ago

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published 24 days ago • 32

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

Paper • 2502.08127 • Published 26 days ago • 50

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published 26 days ago • 29

upvoted an article 24 days ago

Open R1: Update #2

Article • and 6 others • 27 days ago • 197

upvoted a paper 24 days ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published 26 days ago • 46

upvoted 2 papers 26 days ago

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published Feb 6 • 23

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 27 days ago • 142

upvoted a paper 27 days ago

Agency Is Frame-Dependent

Paper • 2502.04403 • Published Feb 6 • 22

upvoted a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 199

upvoted 2 articles about 1 month ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.14k

Open-R1: Update #1

Article • and 7 others • Feb 2 • 293