DEVA: Tracking Anything with Decoupled Video Segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee

University of Illinois Urbana-Champaign and Adobe

ICCV 2023

[arXiv (coming soon)] [PDF] [Project Page]

Highlights

  1. Provide long-term, open-vocabulary video segmentation with text prompts out of the box.
  2. Fairly easy to integrate your own image model! Wouldn't you or your reviewers be interested in seeing examples where your image model also works well on videos :smirk:? No finetuning is needed!

Abstract

We develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we propose a (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several tasks, most notably in large-vocabulary video panoptic segmentation and open-world video segmentation.
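
To make the decoupled design concrete, here is a toy sketch of the control flow described in the abstract: a propagation step carries existing segments forward on every frame, an image-level model is queried only occasionally, and the two sets of hypotheses are fused. This is an illustration under simplifying assumptions, not the DEVA implementation. The names `run_decoupled`, `fuse`, `image_model`, `propagate`, `detect_every`, and `iou_thresh` are hypothetical, and the greedy IoU matching stands in for the paper's (semi-)online fusion, which also uses hypotheses from several frames.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0


def fuse(propagated, detected, next_id, iou_thresh=0.5):
    """Merge propagated tracks (dict: id -> mask) with image-level detections.

    Assumption: greedy best-IoU matching, a simplification chosen for this
    sketch. A matched detection refreshes its track; an unmatched detection
    starts a new track with a fresh id.
    """
    fused = dict(propagated)
    for det in detected:
        best_id, best_iou = None, 0.0
        for tid, mask in propagated.items():
            score = iou(mask, det)
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_id is not None and best_iou >= iou_thresh:
            fused[best_id] = det
        else:
            fused[next_id] = det
            next_id += 1
    return fused, next_id


def run_decoupled(frames, image_model, propagate, detect_every=5):
    """Propagate on every frame; query the image model every `detect_every` frames.

    `image_model(frame)` returns a list of boolean masks (task-specific part).
    `propagate(mask, frame)` returns the mask moved to `frame` (task-agnostic part).
    Both are hypothetical callables supplied by the caller.
    """
    tracks, next_id, outputs = {}, 0, []
    for t, frame in enumerate(frames):
        tracks = {tid: propagate(mask, frame) for tid, mask in tracks.items()}
        if t % detect_every == 0:
            tracks, next_id = fuse(tracks, image_model(frame), next_id)
        outputs.append(dict(tracks))
    return outputs
```

Because the image model appears only as a callable, swapping in a different task-specific model leaves the propagation and fusion code untouched, which is the point of the decoupling.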
