DAMO-NLP-SG/VL3-SigLIP-NaViT
Language Technology Lab at Alibaba DAMO Academy
Image Feature Extraction
Transformers
Safetensors
English
videollama3_vision_encoder
feature-extraction
visual-encoder
multi-modal-large-language-model
custom_code
arxiv:2501.13106
arxiv:2406.07476
arxiv:2306.02858
License: apache-2.0
Commit History (branch: main)
Update README.md (ed2c154, verified), Cyril666 committed about 2 hours ago
Update README.md (50d747a, verified), Cyril666 committed about 2 hours ago
Upload model (0e04069, verified), ClownRat committed 3 days ago
Upload processor (592e852, verified), ClownRat committed 3 days ago
initial commit (3eb707c, verified), ClownRat committed 3 days ago