---
license: apache-2.0
tags:
- vision
pipeline_tag: zero-shot-image-classification
---

# SigLIP 2 So400m

[SigLIP 2](https://huggingface.co/papers/2502.14786) extends the pretraining objective of
[SigLIP](https://huggingface.co/papers/2303.15343) with prior, independently developed techniques
into a unified recipe for improved semantic understanding, localization, and dense features.

## Intended uses

You can use the raw model for tasks like zero-shot image classification and
image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
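
As a minimal sketch, zero-shot classification can be run with the Transformers pipeline. The checkpoint id below is an assumption for illustration; substitute the id of this repository.

```python
from transformers import pipeline

# NOTE: placeholder checkpoint id (assumption); replace with the id of this repository.
checkpoint = "google/siglip2-so400m-patch14-384"
classifier = pipeline(task="zero-shot-image-classification", model=checkpoint)

# Any image URL or local path works; this COCO image shows two cats on a couch.
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
candidate_labels = ["2 cats", "2 dogs", "a plane", "a remote"]

outputs = classifier(image_url, candidate_labels=candidate_labels)
print(outputs)  # list of {"score": ..., "label": ...} dicts, highest score first
```

For image-text retrieval or encoder-style usage, image and text embeddings can likewise be obtained from the model itself (e.g. via `AutoModel` and its `get_image_features`/`get_text_features` methods); treat the exact method names as an assumption and check the model class documentation.
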
## Training procedure

SigLIP 2 adds some clever training objectives on top of SigLIP (the base sigmoid loss they extend is recalled below the list):

1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability

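For reference, the sigmoid loss from the original SigLIP paper, which these objectives build on, can be written as

$$
\mathcal{L}_{\mathrm{SigLIP}} = -\frac{1}{|\mathcal{B}|}\sum_{i=1}^{|\mathcal{B}|}\sum_{j=1}^{|\mathcal{B}|}\log\frac{1}{1+e^{\,z_{ij}\left(-t\,\mathbf{x}_i\cdot\mathbf{y}_j-b\right)}}
$$

where \\(\mathbf{x}_i\\) and \\(\mathbf{y}_j\\) are the normalized image and text embeddings, \\(z_{ij}\\) is 1 for a matching image-text pair and -1 otherwise, and \\(t\\) and \\(b\\) are a learnable temperature and bias. The decoder and global-local/masked prediction losses listed above are added on top of this pairwise objective.
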
### Training data

SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).

### Compute

The model was trained on up to 2048 TPU-v5e chips.

## Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

![Evaluation Table](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/sg2-blog/eval_table.png)

### BibTeX entry and citation info

```bibtex
@misc{tschannen2025siglip2multilingualvisionlanguage,
      title={SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features},
      author={Michael Tschannen and Alexey Gritsenko and Xiao Wang and Muhammad Ferjad Naeem and Ibrahim Alabdulmohsin and Nikhil Parthasarathy and Talfan Evans and Lucas Beyer and Ye Xia and Basil Mustafa and Olivier Hénaff and Jeremiah Harmsen and Andreas Steiner and Xiaohua Zhai},
      year={2025},
      eprint={2502.14786},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.14786},
}
```