Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

liked a Space 1 day ago

fffiloni/Sa2VA-simple-demo

reacted to merve's post with 👍 3 days ago

ByteDance just dropped Sa2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, released under the MIT license 💗 https://huggingface.co/collections/ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093

> The models handle vision-language understanding and visual referrals (referring segmentation) for both images and videos ⏯️

> The models come in 1B, 4B and 8B sizes. They use InternVL2.5 as the base architecture and Qwen2, Qwen2.5 or InternLM2 as the language model, depending on the checkpoint

> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image and video), then concatenates their outputs to feed into the LLM 💬 The output segmentation tokens are passed to SAM2 to match text (captions or semantic classes) to masks ⤵️

> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, and they write descriptions at different levels to keep them consistent.

reacted to merve's post with 🔥 3 days ago

LXT's activity

commented on a paper 4 days ago

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published 5 days ago • 36

commented 3 papers about 1 month ago

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45

EMOv2: Pushing 5M Vision Model Frontier

Paper • 2412.06674 • Published Dec 9, 2024 • 13

commented on a paper 3 months ago

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10, 2024 • 50

commented 3 papers 6 months ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 52

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Paper • 2406.20085 • Published Jun 28, 2024 • 11

commented 2 papers 7 months ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 52

New activity in Dense-World/OMG-LLaVA 7 months ago

Upload omg_llava_7b_xxl_pretrain_1024image_8gpus.pth

#1 opened 7 months ago by

New activity in LXT/OMG_Seg 12 months ago

Apply for community grant: Academic project (gpu)

#2 opened 12 months ago by

Update main.py

#4 opened 12 months ago by

add spaces lib

#3 opened 12 months ago by

Apply for community grant: Personal project (gpu)

#1 opened 12 months ago by
