Input Video length constraints
#6
by
NikhilJoson
- opened
Is there any limits to the length of the input video that can be provided to SmolVLM2 2.2B?
What is the max length, of a video, that it can handle?
there is a limit of 64 frames.
SmolVLM2 will sample frames at 1FPS with a max of 64 frames.
If the video is longer than 64 seconds, then it will sample evenly spaced frames
@mfarre 1FPS is a bit low for my desired usecase. Is there an option to increase the sample rate? Would it be an alternative to manually sample the frames and use the multi-image inference?
@j0yk1ll
You can adjust the fps by:
messages = [
{
"role": "user",
"content": [
{"type": "video", "path": "path_to_video.mp4", "target_fps": fps},
{"type": "text", "text": "Describe this video in detail"}
]
},
]
inputs = processor.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)
generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
generated_texts = processor.batch_decode(
generated_ids,
skip_special_tokens=True,
)
print(generated_texts[0])
Best,
Orr