---
license: mit
---

## Model Summary

Video-CCAM-14B-v1.1 is a lightweight Video-MLLM developed by TencentQQ Multimedia Research Team.

## Usage

Run inference with Hugging Face transformers on NVIDIA GPUs. The requirements below were tested on Python 3.9 and 3.10.

```
pip install -U pip torch transformers peft decord pysubs2 imageio
```

## Inference

```
import os
import torch

from PIL import Image
from transformers import AutoModel

from eval import load_decord

os.environ['TOKENIZERS_PARALLELISM'] = 'false'

videoccam = AutoModel.from_pretrained(
    '',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    _attn_implementation='flash_attention_2',
    # llm_name_or_path='',
    # vision_encoder_name_or_path=''
)

messages = [
    [
        {
            'role': 'user',
            'content': '\nDescribe this image in detail.'
        }
    ], [
        {
            'role': 'user',
            'content': '