bwang0911 committed on
Commit 6417422
1 Parent(s): ba49c92

Update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED

---
tags:
- feature-extraction
- sentence-similarity
- mteb
language: en
inference: false
license: apache-2.0
---
<!-- TODO: add evaluation results here -->
<br><br>

<p align="center">
<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
</p>

<p align="center">
<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>

## Quick Start

The easiest way to start using `jina-clip-v1` is via Jina AI's [Embedding API](https://jina.ai/embeddings/).
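
For programmatic access, the snippet below is a minimal sketch of calling the Embedding API from Python with `requests`. The endpoint URL, the request payload, and the availability of `jina-clip-v1` through this API are assumptions based on Jina AI's general embeddings API conventions, not statements from this model card; check the Embedding API documentation for the authoritative interface. For running the model locally, see the Usage section below.

```python
import requests

# Hypothetical sketch: endpoint and payload follow Jina AI's OpenAI-style
# embeddings API; verify the model name and multimodal input support in the docs.
JINA_API_KEY = "<your-api-key>"  # obtain a key from https://jina.ai/embeddings/

response = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={
        "Authorization": f"Bearer {JINA_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "jina-clip-v1",
        "input": ["How is the weather today?"],
    },
)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(len(embedding))  # dimensionality of the returned text embedding
```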

## Intended Usage & Model Info

### `jina-clip-v1` Overview

`jina-clip-v1` is an English, monolingual **multimodal (text-image) embedding model**.

Traditional text embedding models, such as [jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en),
excel in text-to-text retrieval but lack cross-modal retrieval capabilities.
Conversely, CLIP-like models, such as [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32),
align image embeddings with text embeddings but underperform in text-to-text retrieval due to their training methodology and context length limitations.

`jina-clip-v1` is an innovative **multimodal embedding model**.
Its text component achieves comparable performance to `jina-embeddings-v2-base-en` in text-to-text retrieval,
while the overall model delivers state-of-the-art performance in cross-modal retrieval tasks.
This makes it an ideal choice for multimodal retrieval-augmented generation (M-RAG) applications,
allowing for both text-to-text and text-to-image searches with a single model.
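
To make the single-model search pattern concrete, the sketch below ranks a toy mixed corpus of texts and one image against a text query with cosine similarity. It reuses the `encode_text`/`encode_image` interface shown in the Usage section below; the example passages, the image file name, and the `trust_remote_code=True` flag are illustrative assumptions.

```python
import numpy as np
from transformers import AutoModel

# One model embeds both texts and images into a shared vector space.
# trust_remote_code=True is assumed because the model ships custom code on the Hub.
model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)

# Toy mixed corpus: two text passages and one (hypothetical) image file.
text_docs = ['Heavy rain is expected this afternoon.', 'The stock market closed higher today.']
image_docs = ['raindrop.png']

# Stack all document embeddings and L2-normalize them.
doc_vectors = np.vstack([model.encode_text(text_docs), model.encode_image(image_docs)])
doc_vectors = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)

# Embed and normalize the query; cosine similarity then reduces to a dot product.
query = model.encode_text(['How is the weather today?'])[0]
query = query / np.linalg.norm(query)

scores = doc_vectors @ query
for doc, score in zip(text_docs + image_docs, scores):
    print(f'{score:.3f}  {doc}')
```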

## Data & Parameters

The Jina CLIP V1 [technical report]() is coming soon.

## Usage

You can use Jina CLIP directly via the `transformers` package (install it first with `pip install transformers`).

```python
from numpy.linalg import norm
from transformers import AutoModel

# Cosine similarity between two embedding vectors
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# trust_remote_code=True is needed because the model ships custom modeling code on the Hub
model = AutoModel.from_pretrained('jinaai/jina-clip-v1', trust_remote_code=True)

text_embeddings = model.encode_text(['How is the weather today?', 'What is the current weather like today?'])
image_embeddings = model.encode_image(['raindrop.png'])  # image file to embed

print(cos_sim(text_embeddings[0], text_embeddings[1]))    # text embedding similarity
print(cos_sim(text_embeddings[0], image_embeddings[0]))   # text-image cross-modal similarity
```

## Contact

Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

## Citation

If you find Jina CLIP useful in your research, please cite the following paper:

```console
TBD
```