TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding • Paper 2502.19400
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning • Paper 2502.19634
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think • Paper 2502.20172
UniTok: A Unified Tokenizer for Visual Generation and Understanding • Paper 2502.20321
Mobius: Text to Seamless Looping Video Generation via Latent Shift • Paper 2502.20307
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute • Paper 2502.20126
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference • Paper 2502.18411
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs • Paper 2502.19413
How far can we go with ImageNet for Text-to-Image generation? • Paper 2502.21318
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers • Paper 2502.20545
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • Paper 2503.01743
Introducing Visual Perception Token into Multimodal Large Language Model • Paper 2502.17425
Slamming: Training a Speech Language Model on One GPU in a Day • Paper 2502.15814
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing • Paper 2502.17258
GCC: Generative Color Constancy via Diffusing a Color Checker • Paper 2502.17435
Tell me why: Visual foundation models as self-explainable classifiers • Paper 2502.19577