Xin Li's picture

Xin Li PRO

lixin4ever

·

https://lixin4ever.github.io/

lixin4ever

AI & ML interests

Natural Language Processing, Machine Learning

Recent Activity

upvoted a paper 1 day ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

liked a dataset 2 days ago

DAMO-NLP-SG/VL3-Syn7M

authored a paper 2 days ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

View all activity

Organizations

lixin4ever's activity

upvoted a paper 1 day ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published 18 days ago • 90

upvoted a collection 9 days ago

🖼️ 2025 MLLMs

6 items • Updated 14 days ago • 1

upvoted a collection 17 days ago

VideoLLaMA3

Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 2 days ago • 11

upvoted a paper 17 days ago

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published 18 days ago • 79

upvoted a paper 18 days ago

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 19 days ago • 81

upvoted 3 papers about 1 month ago

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 90

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41

upvoted 2 collections 2 months ago

PixMo

A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated Jan 6 • 56

Inf-CL

The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated 16 days ago • 3

upvoted a collection 3 months ago

OpenCoder Datasets

OpenCoder datasets! • 6 items • Updated Nov 15, 2024 • 39

upvoted a paper 3 months ago

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published Oct 30, 2024 • 20

upvoted 6 papers 4 months ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 89

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Paper • 2410.12490 • Published Oct 16, 2024 • 8

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 92

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published Oct 16, 2024 • 31

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 169

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 38

upvoted 2 papers 5 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 140

A Controlled Study on Long Context Extension and Generalization in LLMs

Paper • 2409.12181 • Published Sep 18, 2024 • 44