	---
	base_model:
	- Snowflake/snowflake-arctic-embed-l-v2.0
	pipeline_tag: sentence-similarity
	tags:
	- xlm-roberta
	- mteb
	- arctic
	- snowflake-arctic-embed
	- text-embeddings-inference
	library_name: sentence-transformers
	language:
	- af
	- ar
	- az
	- be
	- bg
	- bn
	- ca
	- ceb
	- cs
	- cy
	- da
	- de
	- el
	- en
	- es
	- et
	- eu
	- fa
	- fi
	- fr
	- gl
	- gu
	- he
	- hi
	- hr
	- ht
	- hu
	- hy
	- id
	- is
	- it
	- ja
	- jv
	- ka
	- kk
	- km
	- kn
	- ko
	- ky
	- lo
	- lt
	- lv
	- mk
	- ml
	- mn
	- mr
	- ms
	- my
	- ne
	- nl
	- pa
	- pl
	- pt
	- qu
	- ro
	- ru
	- si
	- sk
	- sl
	- so
	- sq
	- sr
	- sv
	- sw
	- ta
	- te
	- th
	- tl
	- tr
	- uk
	- ur
	- vi
	- yo
	- zh
	---

	GGUF quants of [Snowflake/snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0), created using [llama.cpp](https://github.com/ggerganov/llama.cpp).

	Original model card:
	***

	<h1 align="center">Snowflake's Arctic-embed-l-v2.0</h1>
	<h4 align="center">
	<p>
	<a href="#news">News</a> |
	<a href="#models">Models</a> |
	<a href="#usage">Usage</a> |
	<a href="#evaluation">Evaluation</a> |
	<a href="#contact">Contact</a> |
	<a href="#faq">FAQ</a> |
	<a href="#license">License</a> |
	<a href="#acknowledgement">Acknowledgement</a>
	</p>
	</h4>

	<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=18f5b1a3-da66-4f25-92d3-21da829509c3" />

	## News
	- 12/11/2024: Release of [Technical Report](https://arxiv.org/abs/2412.04506)
	- 12/04/2024: Release of [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) and [snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0), our newest models, built with multilingual workloads in mind.

	## Models
	Snowflake arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency.
	Arctic Embed 2.0 introduces a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
	Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.

	Key Features:

	1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.

	2. Inference efficiency: With only 303M non-embedding parameters, inference is fast and efficient at any scale.

	3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.

	4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), so it works as a direct drop-in replacement at inference time with existing libraries, kernels, and inference engines.

	5. Long Context Support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which can support a context window of up to 8192 tokens via the use of RoPE.


	### Quality Benchmarks
	Unlike most other open-source models, Arctic-embed-l-v2.0 excels at both English retrieval (measured via MTEB Retrieval) and multilingual retrieval (measured via MIRACL and CLEF).
	You no longer need to maintain separate models to get high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the dataset being discussed.

	| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
	|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| snowflake-arctic-l-v2.0 | 568M | 303M | 1024 | 55.6 | 55.8 | 52.9 | 54.3 |
	| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
	| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
	| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
	| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | 56.8 | 40.8 | 41.3 |
	| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
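For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes the common linear-gain formulation (gain divided by `log2(rank + 1)`) and computes the ideal ranking from all judged documents. The function names and the toy `qrels` dictionary are illustrative, not taken from the benchmark tooling behind the numbers above.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k gains (rank 1 is discounted by log2(2))."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping document id -> graded relevance (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Toy example: two relevant documents, one ranked first and one ranked third.
qrels = {"d1": 1, "d2": 1}
print(round(ndcg_at_k(["d1", "d3", "d2"], qrels), 4))  # 0.9197
```

A benchmark score is the mean of this per-query value over all queries in a dataset, then averaged over the datasets in the suite.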

	Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Use vector truncation via MRL to decrease vector size by 4x with less than 3% degradation in quality.
	Combine MRL-truncated vectors with vector compression (Int4) to power retrieval in 128 bytes per document.

	| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
	|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
	| snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
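The 128-byte figure follows from simple arithmetic: 256 dimensions at 4 bits each is 1024 bits, or 128 bytes. The sketch below illustrates that pipeline under stated assumptions: it truncates to the first 256 components, re-normalizes, and applies a plain uniform 4-bit scalar quantizer with a fixed clipping range. The quantizer, the `CLIP` value, and the helper names are hypothetical choices for illustration. The scheme Snowflake actually used is not described here, and a random vector stands in for a real embedding.

```python
import math
import random

DIMS = 256  # MRL truncation target (1024 -> 256 is the 4x reduction)
CLIP = 0.2  # hypothetical clipping range for the 4-bit quantizer; tune on real embeddings

def truncate_and_normalize(vec, dims=DIMS):
    """Keep the first `dims` components, then L2-normalize the result."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def quantize_int4(vec, clip=CLIP):
    """Clip each value to [-clip, clip], map it to a code in 0..15, and pack two codes per byte."""
    codes = []
    for x in vec:
        x = max(-clip, min(clip, x))
        codes.append(round((x + clip) / (2 * clip) * 15))
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def dequantize_int4(packed, clip=CLIP):
    """Unpack the 4-bit codes and map them back to approximate float values."""
    out = []
    for byte in packed:
        for code in (byte >> 4, byte & 0x0F):
            out.append(code / 15 * 2 * clip - clip)
    return out

# Stand-in for a real 1024-dimensional embedding (random values, illustration only).
random.seed(0)
full = [random.gauss(0, 1) for _ in range(1024)]

small = truncate_and_normalize(full)
packed = quantize_int4(small)
print(len(small), len(packed))  # 256 128
```

With real embeddings, scores computed on the dequantized vectors should be checked against full-precision scores on your own data before relying on the compressed index.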

	## Usage

	### Using Sentence Transformers

	```python
	from sentence_transformers import SentenceTransformer

	# Load the model
	model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
	model = SentenceTransformer(model_name)

	# Define the queries and documents
	queries = ['what is snowflake?', 'Where can I get the best tacos?']
	documents = ['The Data Cloud!', 'Mexico City of Course!']

	# Compute embeddings: use `prompt_name="query"` to encode queries!
	query_embeddings = model.encode(queries, prompt_name="query")
	document_embeddings = model.encode(documents)

	# Compute cosine similarity scores
	scores = model.similarity(query_embeddings, document_embeddings)

	# Output the results
	for query, query_scores in zip(queries, scores):
	    doc_score_pairs = list(zip(documents, query_scores))
	    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
	    print("Query:", query)
	    for document, score in doc_score_pairs:
	        print(score, document)

	```



	### Using Huggingface Transformers


	You can use the transformers package to run Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token embedding to represent each text, and add the query prefix below to queries only (not to documents).

	```python
	import torch
	from transformers import AutoModel, AutoTokenizer

	model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
	model.eval()

	query_prefix = 'query: '
	queries = ['what is snowflake?', 'Where can I get the best tacos?']
	queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
	query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)

	documents = ['The Data Cloud!', 'Mexico City of Course!']
	document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)

	# Compute token embeddings and keep the CLS (first) token for each text
	with torch.no_grad():
	    query_embeddings = model(**query_tokens)[0][:, 0]
	    document_embeddings = model(**document_tokens)[0][:, 0]


	# Normalize embeddings so the dot product equals cosine similarity
	query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
	document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

	scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
	for query, query_scores in zip(queries, scores):
	    doc_score_pairs = list(zip(documents, query_scores))
	    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
	    # Output passages and scores
	    print("Query:", query)
	    for document, score in doc_score_pairs:
	        print(score, document)
	```


	This should produce the following scores:

	```
	Query: what is snowflake?
	tensor(0.2715) The Data Cloud!
	tensor(0.0661) Mexico City of Course!
	Query: Where can I get the best tacos?
	tensor(0.2797) Mexico City of Course!
	tensor(0.1250) The Data Cloud!
	```

	### Using Huggingface Transformers.js

	If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
	```bash
	npm i @huggingface/transformers
	```

	You can then use the model for retrieval, as follows:

	```js
	import { pipeline, dot } from '@huggingface/transformers';

	// Create feature extraction pipeline
	const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-m-v2.0', {
	    dtype: 'q8',
	});

	// Generate sentence embeddings
	const sentences = [
	    'query: what is snowflake?',
	    'The Data Cloud!',
	    'Mexico City of Course!',
	];
	const output = await extractor(sentences, { normalize: true, pooling: 'cls' });

	// Compute similarity scores
	const [source_embeddings, ...document_embeddings ] = output.tolist();
	const similarities = document_embeddings.map(x => dot(source_embeddings, x));
	console.log(similarities); // [0.24783534471401417, 0.05313122704326892]
	```


	## Contact


	Feel free to open an issue or pull request if you have any questions or suggestions about this project.
	You can also email Daniel Campos ([email protected]).


	## License
	Arctic is licensed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). The released models can be used for commercial purposes free of charge.