Kaiyan Zhang

iseesaw

AI & ML interests

None yet

Recent Activity

liked a model 4 days ago

Qwen/QwQ-32B

upvoted a collection 6 days ago

Qwen2.5-Coder

liked a Space 12 days ago

huggingface/ai-deadlines

iseesaw's activity

liked a model 4 days ago

Qwen/QwQ-32B

Text Generation • Updated 2 days ago • 103k • 1.66k

upvoted a collection 6 days ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 292

liked a Space 12 days ago

AI Deadlines ⚡

Space • Schedule tasks efficiently using AI-generated deadlines • 310

liked a dataset 18 days ago

facebook/natural_reasoning

Viewer • Updated 16 days ago • 1.15M • 8.56k • 348

upvoted a paper 20 days ago

Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published 23 days ago • 16

commented on a paper 20 days ago

Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published 23 days ago • 16

liked a Space 21 days ago

GAIA Leaderboard 🦾

Space • Submit models for evaluation and view leaderboard • 315

upvoted an article 21 days ago

Our Transformers Code Agent beats the GAIA benchmark!

Article • Published Jul 1, 2024 • 72

liked a dataset 22 days ago

cais/hle

Viewer • Updated 23 days ago • 2.7k • 7.26k • 270

authored 2 papers 26 days ago

Process Reinforcement through Implicit Rewards

Paper • 2502.01456 • Published Feb 3 • 55

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 21

upvoted 3 articles 26 days ago

Open-source DeepResearch – Freeing our search agents

Article • Published Feb 4 • 1.14k

What is test-time compute and how to scale it?

Article • Published about 1 month ago • 53

Open R1: Update #2

Article • Published 27 days ago • 197

liked 2 datasets 26 days ago

open-r1/OpenR1-Math-220k

Viewer • Updated 19 days ago • 450k • 45.4k • 476

AI-MO/NuminaMath-1.5

Viewer • Updated 27 days ago • 896k • 4.04k • 116

authored a paper 26 days ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 27 days ago • 142

upvoted a paper 26 days ago

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 27 days ago • 142

upvoted a paper about 1 month ago

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Paper • 2501.18362 • Published Jan 30 • 21

upvoted an article 2 months ago

Process Reinforcement through Implicit Rewards

Article • Published Jan 3 • 24