arxiv:2308.13494

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

Published on Aug 25, 2023 · Submitted by akhaliq on Aug 28, 2023

Abstract

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are often applied repeatedly across frames or temporal chunks. In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing. We describe a method for identifying and re-processing only those tokens that have changed significantly over time. Our proposed family of models, Eventful Transformers, can be converted from existing Transformers (often without any re-training) and give adaptive control over the compute cost at runtime. We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100). Our approach leads to significant computational savings (on the order of 2-4x) with only minor reductions in accuracy.
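To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of gating tokens on temporal change: cached outputs are reused for tokens whose inputs have drifted little since they were last processed, and only the remaining tokens are passed through the expensive computation. The `TokenGate` class, the `threshold` parameter, and the stand-in `expensive_fn` are illustrative names, and the simple norm-based threshold is an assumption rather than the paper's exact gating policy.

```python
# Hypothetical sketch of token gating between video frames (not the paper's code).
# Outputs are cached per token; only tokens whose inputs changed noticeably
# since they were last processed are recomputed.
import torch


class TokenGate:
    """Caches per-token inputs/outputs and recomputes only changed tokens."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold  # assumed change-magnitude threshold
        self.ref_tokens = None      # token inputs as of their last update
        self.ref_outputs = None     # corresponding cached outputs

    def __call__(self, tokens: torch.Tensor, expensive_fn) -> torch.Tensor:
        """tokens: (num_tokens, dim); expensive_fn: a token-wise transform."""
        if self.ref_tokens is None:
            # First frame: process everything and populate the cache.
            self.ref_tokens = tokens.clone()
            self.ref_outputs = expensive_fn(tokens)
            return self.ref_outputs

        # Measure how much each token has drifted since it was last processed.
        change = (tokens - self.ref_tokens).norm(dim=-1)
        updated = change > self.threshold  # boolean mask of "eventful" tokens

        if updated.any():
            # Recompute only the changed tokens and refresh their cache entries.
            self.ref_tokens[updated] = tokens[updated]
            self.ref_outputs[updated] = expensive_fn(tokens[updated])
        return self.ref_outputs


# Toy usage: a linear layer stands in for an expensive Transformer sub-block.
if __name__ == "__main__":
    torch.manual_seed(0)
    layer = torch.nn.Linear(64, 64)
    gate = TokenGate(threshold=0.5)

    with torch.no_grad():
        frame = torch.randn(196, 64)      # e.g. 14x14 patch tokens
        out0 = gate(frame, layer)         # frame 1: all tokens processed

        next_frame = frame.clone()
        next_frame[:10] += 1.0            # only 10 tokens change noticeably
        out1 = gate(next_frame, layer)    # frame 2: ~10 tokens recomputed
```

In the paper, such gates give adaptive control over compute cost at runtime; the fixed threshold above is a simplification used only to illustrate the reuse-versus-recompute decision.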

Community

What kind of resources (GPUs, etc.) are needed for minimal training, just for learning purposes? Can I see some instructions?


Code here: https://github.com/WISION-Lab/eventful-transformer/

For the most part, our method doesn't require any re-training. You can generally just use pre-trained weights (links on GitHub).

Fine-tuning the temporal component in Section 5.2 took less than 2 days on a single RTX 3090 (if I remember correctly).
