- Large Motion Video Autoencoding with Cross-modal Video VAE • arXiv:2412.17805 • Published 2 days ago
- Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • arXiv:2412.05271 • Published 19 days ago
- Apollo: An Exploration of Video Understanding in Large Multimodal Models • arXiv:2412.10360 • Published 12 days ago
- STIV: Scalable Text and Image Conditioned Video Generation • arXiv:2412.07730 • Published 15 days ago
- PaliGemma 2: A Family of Versatile VLMs for Transfer • arXiv:2412.03555 • Published 21 days ago
- Open-Sora Plan: Open-Source Large Video Generation Model • arXiv:2412.00131 • Published 27 days ago
- Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations • arXiv:2410.10792 • Published Oct 14
- SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory • arXiv:2411.11922 • Published Nov 18
- OminiControl: Minimal and Universal Control for Diffusion Transformer • arXiv:2411.15098 • Published Nov 22
- TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • arXiv:2411.15124 • Published Nov 22
- Multimodal Autoregressive Pre-training of Large Vision Encoders • arXiv:2411.14402 • Published Nov 21
- BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions • arXiv:2411.07461 • Published Nov 12
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion • arXiv:2411.04928 • Published Nov 7
- SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers • arXiv:2401.08740 • Published Jan 16
- LLaVA-Video Collection (previously known as LLaVA-NeXT-Video) • Models focused on video understanding • 6 items • Updated Oct 5