Zesen Cheng
ClownRat
AI & ML interests
Multi-modal foundation models; segmentation, detection, and tracking
Recent Activity
authored a paper 2 days ago
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
upvoted an article 4 days ago
Mixture of Experts Explained
upvoted an article 4 days ago
SigLIP 2: A better multilingual vision language encoder