TinyLLaVA-Video


For training data, we combine subsets of two datasets: LLaVA-Video-178K and Valley.

| Stage | Source | #Sample |
|---|---|---|
| Pretrain | LLaVA-Video-178K + Valley | 397k |
| Finetune | LLaVA-Video-178K | 491k |

Pretrain Data

We use four subsets of LLaVA-Video-178K (0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, and 30_60_s_youtube_v0_1), supplemented with filtered data from Video-LLaVA.

We provide cleaned annotation data; the video data can be downloaded from LLaVA-Video-178K and Video-LLaVA.

Finetune Data

We use four subsets of LLaVA-Video-178K: 0_30_s_academic_v0_1, 30_60_s_academic_v0_1, 0_30_s_youtube_v0_1, and 30_60_s_youtube_v0_1.

We provide cleaned annotation data; the video data can be downloaded from LLaVA-Video-178K.
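A minimal sketch for a quick sanity check of the cleaned annotation files, assuming they are plain JSON arrays (the per-entry schema is not documented here, so the code only counts entries; the file path is an example based on the layout in this card):

```python
import json

def count_annotations(path):
    """Load a cleaned annotation file and return the number of entries."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"Expected a JSON list in {path}, got {type(data).__name__}")
    return len(data)

if __name__ == "__main__":
    # Example path; adjust to where you placed the dataset.
    n = count_annotations("dataset/text_files/cleaned_video_openqa.json")
    print(f"{n} open-QA annotations loaded")
```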

Organize Data

Organize the video files and annotation files as follows under path/to/your/dataset:

```
dataset
├── academic_source
├── liwei_youtube_videos
├── valley
└── text_files
    ├── cleaned_video_caption.json
    └── cleaned_video_openqa.json
```
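The layout above can be verified before training with a small script; this is a sketch assuming exactly the directory and file names shown (replace `path/to/your/dataset` with your own root):

```python
import os

# Expected top-level entries, taken from the directory tree above.
EXPECTED_DIRS = ["academic_source", "liwei_youtube_videos", "valley"]
EXPECTED_FILES = [
    os.path.join("text_files", "cleaned_video_caption.json"),
    os.path.join("text_files", "cleaned_video_openqa.json"),
]

def check_dataset_layout(root):
    """Return a list of expected entries that are missing under `root`."""
    missing = []
    for d in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(root, d)):
            missing.append(d)
    for f in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(root, f)):
            missing.append(f)
    return missing

if __name__ == "__main__":
    problems = check_dataset_layout("path/to/your/dataset")
    if problems:
        print("Missing entries:", problems)
    else:
        print("Dataset layout looks complete.")
```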

Note: If there is any infringement, please contact us for removal.
