---
base_model:
- Snowflake/snowflake-arctic-embed-l-v2.0
pipeline_tag: sentence-similarity
tags:
- xlm-roberta
- mteb
- arctic
- snowflake-arctic-embed
- text-embeddings-inference
library_name: sentence-transformers
language:
- af
- ar
- az
- be
- bg
- bn
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ky
- lo
- lt
- lv
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- pa
- pl
- pt
- qu
- ro
- ru
- si
- sk
- sl
- so
- sq
- sr
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- yo
- zh
---

GGUF quants of [Snowflake/snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) created using [llama.cpp](https://github.com/ggerganov/llama.cpp).

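As a minimal sketch of how one of these GGUF files might be used for retrieval from Python, the snippet below assumes the `llama-cpp-python` bindings and a locally downloaded quant. The file name in the final comment is hypothetical and the helper names are illustrative. It is also an assumption that the GGUF metadata selects the same CLS pooling as the original model, so it is worth comparing scores against the reference values in the original model card. The `query: ` prefix (queries only) follows the usage section of that card.

```python
from typing import List

import numpy as np

QUERY_PREFIX = "query: "  # applied to queries only, never to documents


def add_query_prefix(queries: List[str]) -> List[str]:
    """Prepend the retrieval prefix expected by the model to each query."""
    return [QUERY_PREFIX + query for query in queries]


def cosine_scores(query_vecs, doc_vecs) -> np.ndarray:
    """Return a (num_queries, num_docs) matrix of cosine similarities."""
    q = np.asarray(query_vecs, dtype=np.float32)
    d = np.asarray(doc_vecs, dtype=np.float32)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return q @ d.T


def load_embedder(model_path: str):
    """Load a GGUF file in embedding mode; needs llama-cpp-python and a local file."""
    from llama_cpp import Llama  # lazy import: pip install llama-cpp-python

    return Llama(model_path=model_path, embedding=True, verbose=False)


def rank_documents(llm, queries: List[str], documents: List[str]) -> np.ndarray:
    """Score every document against every query (higher is more similar)."""
    query_vecs = llm.embed(add_query_prefix(queries))
    doc_vecs = llm.embed(documents)
    return cosine_scores(query_vecs, doc_vecs)


# Example call (hypothetical file name; substitute the quant you downloaded):
# llm = load_embedder("snowflake-arctic-embed-l-v2.0-q8_0.gguf")
# scores = rank_documents(llm, ["what is snowflake?"], ["The Data Cloud!", "Mexico City of Course!"])
```
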
Original model card:

***

<h1 align="center">Snowflake's Arctic-embed-l-v2.0</h1>
<h4 align="center">
   <p>
       <a href="#news">News</a> |
       <a href="#models">Models</a> |
       <a href="#usage">Usage</a> |
       <a href="#evaluation">Evaluation</a> |
       <a href="#contact">Contact</a> |
       <a href="#faq">FAQ</a> |
       <a href="#license">License</a> |
       <a href="#acknowledgement">Acknowledgement</a>
   </p>
</h4>

## News

- 12/11/2024: Release of the [Technical Report](https://arxiv.org/abs/2412.04506).
- 12/04/2024: Release of [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) and [snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0), our newest models, built with multilingual workloads in mind.

## Models

Snowflake arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency.
Arctic Embed 2.0 introduces a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.

Key Features:

1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.

2. Inference efficiency: With 303M non-embedding parameters, inference is fast and efficient at any scale.

3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.

4. Drop-in replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which allows direct drop-in replacement for inference across libraries, kernels, inference engines, etc.

5. Long context support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which can support a context window of up to 8192 tokens via the use of RoPE.

### Quality Benchmarks

Unlike most other open-source models, Arctic-embed-l-v2.0 excels across both English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF) benchmarks.
You no longer need to maintain separate models to get high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the dataset being discussed.

| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | **55.6** | 55.8 | **52.9** | **54.3** |
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | **56.8** | 40.8 | 41.3 |
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |

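For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query. It assumes the linear-gain formulation (gain equals the relevance label, discount is log2 of rank + 1); the evaluation harness behind the table may differ in details such as the gain function or tie handling. The function names and the toy relevance labels are illustrative, not taken from the benchmarks. The benchmark score is then the average of this per-query value over all queries in a dataset.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float],
              all_relevances: Sequence[float],
              k: int = 10) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_relevances: labels of the returned documents, in rank order.
    all_relevances: labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Toy query with two relevant documents, retrieved at ranks 2 and 4.
ranked = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
judged = [1, 1]
print(f"NDCG@10 = {ndcg_at_k(ranked, judged):.4f}")
```
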
Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Use vector truncation via MRL to decrease vector size by 4x with less than 3% degradation in quality.
Combine MRL-truncated vectors with vector compression (Int4) to power retrieval in 128 bytes per document.

| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (Focused) | Relative Performance | CLEF (Full) | Relative Performance |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
| snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |

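The 128-byte figure follows from simple arithmetic: 256 dimensions at 4 bits each is 1024 bits, or 128 bytes. The sketch below illustrates that pipeline with NumPy under stated assumptions: a random matrix stands in for real 1024-dimensional embeddings, and the quantizer is a plain uniform, symmetric 4-bit scheme with a per-vector scale, chosen for illustration and not necessarily the scheme used to produce the numbers above. All function names are hypothetical.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int = 256) -> np.ndarray:
    """MRL-style truncation: keep the first `dim` values, then rescale to unit length."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)


def quantize_int4(embeddings: np.ndarray):
    """Uniform symmetric 4-bit quantization, packing two codes into each byte."""
    scale = np.abs(embeddings).max(axis=1, keepdims=True) / 7.0  # values map to [-7, 7]
    codes = np.clip(np.round(embeddings / scale), -7, 7).astype(np.int8)
    unsigned = (codes + 8).astype(np.uint8)  # shift to 1..15 so each code fits a nibble
    packed = (unsigned[:, 0::2] << 4) | unsigned[:, 1::2]
    return packed, scale


def dequantize_int4(packed: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Unpack the nibbles and map the codes back to approximate float values."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.float32)
    codes[:, 0::2] = (packed >> 4).astype(np.float32) - 8.0
    codes[:, 1::2] = (packed & 0x0F).astype(np.float32) - 8.0
    return codes * scale


rng = np.random.default_rng(0)
full = rng.standard_normal((2, 1024)).astype(np.float32)  # stand-in for real embeddings
small = truncate_and_normalize(full, dim=256)
packed, scale = quantize_int4(small)
print(packed.shape, packed[0].nbytes)  # (2, 128) 128
```

Note that this sketch stores the per-vector scale separately from the 128 bytes of packed codes. Because cosine similarity is unaffected by a positive per-vector scale, ranking by cosine over the dequantized codes does not require it.
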
## Usage

### Using Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load the model
model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
model = SentenceTransformer(model_name)

# Define the queries and documents
queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Compute embeddings: use `prompt_name="query"` to encode queries!
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute cosine similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Output the results
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```

### Using Hugging Face Transformers

You can use the transformers package to run Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token embedding to represent each text, and add the query prefix below to queries only (not to documents).

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
model.eval()

query_prefix = 'query: '
queries = ['what is snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)

documents = ['The Data Cloud!', 'Mexico City of Course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)

# Compute token embeddings
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# normalize embeddings
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    # Output passages & scores
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```

This should produce the following scores:

```
Query: what is snowflake?
tensor(0.2715) The Data Cloud!
tensor(0.0661) Mexico City of Course!
Query: Where can I get the best tacos?
tensor(0.2797) Mexico City of Course!
tensor(0.1250) The Data Cloud!
```

### Using Hugging Face Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:

```bash
npm i @huggingface/transformers
```

You can then use the model for retrieval, as follows:

```js
import { pipeline, dot } from '@huggingface/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-l-v2.0', {
    dtype: 'q8',
});

// Generate sentence embeddings (the query carries the `query: ` prefix, the documents do not)
const sentences = [
    'query: what is snowflake?',
    'The Data Cloud!',
    'Mexico City of Course!',
];
const output = await extractor(sentences, { normalize: true, pooling: 'cls' });

// Compute similarity scores
const [source_embeddings, ...document_embeddings] = output.tolist();
const similarities = document_embeddings.map(x => dot(source_embeddings, x));
console.log(similarities); // [0.24783534471401417, 0.05313122704326892]
```

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Daniel Campos ([email protected]).

## License

Arctic is licensed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). The released models can be used for commercial purposes free of charge.