
HuggingFaceTB/SmolVLM-Instruct • Image-Text-to-Text • Updated • 94.2k • 404
Generate speech from text using selected language and speaker
Generate text responses using images and text prompts
A community project to create an image preferences dataset.
Generate clickable coordinates on a screenshot