Zesen Cheng

ClownRat

AI & ML interests

Multi-modal foundation models; segmentation, detection, and tracking

Recent Activity

authored a paper 2 days ago

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

upvoted an article 4 days ago

Mixture of Experts Explained

upvoted an article 4 days ago

SigLIP 2: A better multilingual vision language encoder

Collections 1

Papers 14

arxiv:2502.13923

arxiv:2501.13106

arxiv:2501.00599

arxiv:2411.08147

Models 5

ClownRat/VideoLLaMA2.1-7B-16F

Text Generation • Updated Jan 6 • 10

ClownRat/resnet-50-torchvision

Updated Dec 25, 2024 • 13

ClownRat/mask2former-resnet-50-coco-instance

Updated Dec 25, 2024 • 74

ClownRat/resnet-101-torchvision

Updated Dec 23, 2024 • 10

ClownRat/mask2former-resnet-101-coco-instance

Updated Dec 17, 2024 • 50

Datasets 2

ClownRat/YoutubeVIS-2019

Updated about 1 month ago • 40

ClownRat/COCO2017-Instance

Viewer • Updated Dec 11, 2024 • 123k • 87 • 1