license: mit
datasets:
- ShuhuaiRen/TimeIT
language:
- en

TimeChat-7B-Charades-VTune Model

Model details

We trained TimeChat with VTune, an instruction-tuning method designed specifically to account for consistency in temporal comprehension.

For tuning, we used 5K training videos from Charades-STA with 99K automatically generated annotations.

Evaluation

We evaluated the model on Charades-CON and Charades-STA.

  • Charades-CON

    Metric    Value
    Ground    76.2
    R-Ground  69.2 (90.8)
    S-Ground  36.2 (47.5)
    H-Verify  44.8 (58.8)
    C-Verify  42.4 (55.7)

  • Charades-STA

    Metric       Value
    R@1 IoU=0.3  72.74
    R@1 IoU=0.5  58.47
    R@1 IoU=0.7  34.70
    mIoU         50.65
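
For readers unfamiliar with the Charades-STA metrics, the following is a minimal sketch of how R@1 at an IoU threshold and mIoU are commonly computed for temporal grounding. It is not our evaluation script: the function names, the `>=` threshold convention, the scaling to percentages, and the toy spans are all assumptions made for illustration.

```python
from typing import List, Tuple

# A temporal span in seconds: (start, end). Hypothetical type alias for this sketch.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], threshold: float) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold.

    Assumption: a prediction counts as a hit when IoU >= threshold.
    """
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


def mean_iou(preds: List[Span], gts: List[Span]) -> float:
    """Average IoU over all queries, expressed as a percentage."""
    total = sum(temporal_iou(p, g) for p, g in zip(preds, gts))
    return 100.0 * total / len(gts)


# Toy data (made up for illustration, not taken from Charades-STA).
preds = [(2.0, 8.0), (0.0, 5.0), (10.0, 12.0)]
gts = [(2.0, 10.0), (4.0, 9.0), (0.0, 3.0)]

for t in (0.3, 0.5, 0.7):
    print(f"R@1 IoU={t}: {recall_at_1(preds, gts, t):.2f}")
print(f"mIoU: {mean_iou(preds, gts):.2f}")
```

In the toy data only the first prediction overlaps its ground truth strongly (IoU 0.75), so it is the only hit at every threshold, while mIoU also credits the partial overlap of the second prediction.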

For more information, see the paper (arXiv:2411.12951) and the code repository.

Citation

If you find our research and code useful, please consider starring our repository and citing our paper:

@article{jung2024consistency,
  title={On the Consistency of Video Large Language Models in Temporal Comprehension},
  author={Jung, Minjoon and Xiao, Junbin and Zhang, Byoung-Tak and Yao, Angela},
  journal={arXiv preprint arXiv:2411.12951},
  year={2024}
}