Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

updated a collection 2 days ago

Fundamental Research

upvoted a paper 2 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

liked a Space 5 days ago

nanotron/ultrascale-playbook

Norm's activity

updated a collection 2 days ago

Fundamental Research

8 items • Updated 2 days ago • 1

upvoted a paper 2 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 6 days ago • 115

liked a Space 5 days ago

The Ultra-Scale Playbook

The ultimate guide to training LLM on large GPU Clusters

upvoted 2 papers 6 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 7 days ago • 145

Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published 10 days ago • 50

upvoted a collection 8 days ago

Deepseek Papers

Deepseek papers collection • 18 items • Updated 8 days ago • 155

updated a collection 13 days ago

Multimodal Language Model

What matters besides the data recipe when training a multimodal language model? • 30 items • Updated 13 days ago • 1

upvoted a paper 13 days ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published 20 days ago • 26

updated a collection 14 days ago

Image / Video Gen

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 35 items • Updated 14 days ago • 8

upvoted a paper 14 days ago

Magic 1-For-1: Generating One Minute Video Clips within One Minute

Paper • 2502.07701 • Published 15 days ago • 32

liked a model 15 days ago

Alpha-VLLM/Lumina-Next-SFT-diffusers

Text-to-Image • Updated Jul 8, 2024 • 5.32k • 26

updated a collection 16 days ago

Open Datasets

Thank you for sharing your datasets. I've fed them to my model, and they have benefited it. • 17 items • Updated 16 days ago

liked a dataset 16 days ago

omni-research/Tarsier2-Recap-585K

Preview • Updated Jan 24 • 59.7k • 11

updated a collection 21 days ago

Image / Video Gen

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 35 items • Updated 14 days ago • 8

upvoted a paper 21 days ago

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published 22 days ago • 57

liked a model 25 days ago

Alpha-VLLM/Lumina-Image-2.0

Text-to-Image • Updated 19 days ago • 14.4k • 267

liked a model 30 days ago

Qwen/Qwen2.5-VL-72B-Instruct

Image-Text-to-Text • Updated 11 days ago • 244k • 330

updated a collection about 1 month ago

Language Model

4 items • Updated Jan 24 • 1

upvoted a paper about 1 month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 332

liked a model about 1 month ago

HuggingFaceTB/SmolVLM-256M-Instruct

Image-Text-to-Text • Updated 23 days ago • 40.3k • 160