TimeChat-7B-ActivityNet-VTune Model

Model details

We trained TimeChat using VTune, an instruction-tuning method designed specifically to account for consistency in temporal comprehension.

For tuning, we used 10K training videos from ActivityNet-Captions with 205K automatically generated annotations.

Evaluation

We evaluated the model on ActivityNet-CON and ActivityNet-Captions.

  • ActivityNet-CON

    Metric      Value
    Ground      37.4
    R-Ground    28.3 (75.6)
    S-Ground    10.6 (28.3)
    H-Verify    19.6 (52.3)
    C-Verify    19.5 (51.5)

  • ActivityNet-Captions

    Metric         Value
    R@1 IoU=0.3    57.74
    R@1 IoU=0.5    41.05
    R@1 IoU=0.7    23.72
    mIoU           40.89
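
The ActivityNet-Captions numbers above are standard temporal grounding metrics. As a rough illustration, the sketch below shows how R@1 at an IoU threshold and mIoU are commonly computed, assuming one predicted (start, end) segment per query. The function names, the `>=` threshold comparison, and the sample segments are illustrative assumptions, not taken from the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(
    preds: Sequence[Segment],
    gts: Sequence[Segment],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Dict[str, float]:
    """Return R@1 at each IoU threshold and mIoU, all as percentages."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and the same length")
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    # R@1: share of queries whose top-1 prediction reaches the IoU threshold.
    results = {
        f"R@1 IoU={t}": 100.0 * sum(iou >= t for iou in ious) / n
        for t in thresholds
    }
    # mIoU: mean IoU over all queries.
    results["mIoU"] = 100.0 * sum(ious) / n
    return results


if __name__ == "__main__":
    # Hypothetical predictions and ground-truth segments, in seconds.
    example_preds = [(0.0, 10.0), (5.0, 15.0), (0.0, 4.0)]
    example_gts = [(0.0, 10.0), (10.0, 20.0), (6.0, 10.0)]
    for name, value in grounding_metrics(example_preds, example_gts).items():
        print(f"{name}: {value:.2f}")
```

The reported values may have been produced with different conventions (for example, how ties at the threshold or empty predictions are handled), so treat this only as a reference for what the metric names mean.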

For more information, see the paper (arXiv:2411.12951) and the code repository.

Citation

If you find our research and code useful, please consider starring our repository and citing our paper:

@article{jung2024consistency,
  title={On the Consistency of Video Large Language Models in Temporal Comprehension},
  author={Jung, Minjoon and Xiao, Junbin and Zhang, Byoung-Tak and Yao, Angela},
  journal={arXiv preprint arXiv:2411.12951},
  year={2024}
}