Bhimraj Yadav PRO

bhimrazy

https://bhimraj.com.np

AI & ML interests

Computer Vision, Healthcare, Generative AI and NLP

bhimrazy's activity

upvoted a paper 2 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published 9 days ago • 94

upvoted a paper 14 days ago

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Paper • 2501.13826 • Published 17 days ago • 22

upvoted 5 papers 15 days ago

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published 20 days ago • 90

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 19 days ago • 81

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 27 days ago • 89

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published 27 days ago • 49

Agent Laboratory: Using LLM Agents as Research Assistants

Paper • 2501.04227 • Published Jan 8 • 84

upvoted a paper 16 days ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published 18 days ago • 79

upvoted a paper 17 days ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 18 days ago • 305

upvoted a paper 20 days ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published 30 days ago • 43

upvoted a paper 25 days ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 26 days ago • 273

upvoted 3 papers 27 days ago

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Paper • 2411.13093 • Published Nov 20, 2024 • 1

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 30 days ago • 60

VideoRAG: Retrieval-Augmented Generation over Video Corpus

Paper • 2501.05874 • Published about 1 month ago • 67

upvoted 6 papers 28 days ago

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published Jan 3 • 42

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published Jan 3 • 31

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 48

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 68

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published about 1 month ago • 92

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published Jan 9 • 49