antoinelouis/colbertv2-camembert-L4-mmarcoFR

@@ -7,75 +7,102 @@ datasets:
 metrics:
 - recall
 tags:
-- sentence-similarity
 - colbert
 base_model: antoinelouis/camembert-L4
 library_name: RAGatouille
 inference: false
 ---
-# 🇫🇷 colbertv2-camembert-L4-mmarcoFR
 This is a lightweight [ColBERTv2](https://doi.org/10.48550/arXiv.2112.01488) model for **French** that can be used for semantic search. It encodes queries and passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators.
 ## Usage
-Here are some examples for using the model with [colbert-ai](https://github.com/stanford-futuredata/ColBERT) or [RAGatouille](https://github.com/bclavie/RAGatouille).
-### Using ColBERT-AI
 First, you will need to install the following libraries:
 ```bash
-pip install git+https://github.com/stanford-futuredata/ColBERT.git torch faiss-gpu==1.7.2
 ```
 Then, you can use the model like this:
 ```python
-from colbert import Indexer, Searcher
-from colbert.infra import Run, RunConfig
-n_gpu: int = 1 # Set your number of available GPUs
-experiment: str = "colbert" # Name of the folder where the logs and created indices will be stored
 index_name: str = "my_index" # The name of your index, i.e. the name of your vector database
 documents: list = ["Ceci est un premier document.", "Voici un second document.", "etc."] # Corpus
-# Step 1: Indexing. This step encodes all passages into matrices, stores them on disk, and builds data structures for efficient search.
-with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
-    indexer = Indexer(checkpoint="antoinelouis/colbertv2-camembert-L4-mmarcoFR")
-    indexer.index(name=index_name, collection=documents)
-# Step 2: Searching. Given the model and index, you can issue queries over the collection to retrieve the top-k passages for each query.
-with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
-    searcher = Searcher(index=index_name) # You don't need to specify checkpoint again, the model name is stored in the index.
-    results = searcher.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
-    # results: tuple of tuples of length k containing ((passage_id, passage_rank, passage_score), ...)
 ```
-### Using RAGatouille
 First, you will need to install the following libraries:
 ```bash
-pip install -U ragatouille
 ```
 Then, you can use the model like this:
 ```python
-from ragatouille import RAGPretrainedModel
 index_name: str = "my_index" # The name of your index, i.e. the name of your vector database
 documents: list = ["Ceci est un premier document.", "Voici un second document.", "etc."] # Corpus
-# Step 1: Indexing.
-RAG = RAGPretrainedModel.from_pretrained("antoinelouis/colbertv2-camembert-L4-mmarcoFR")
-RAG.index(name=index_name, collection=documents)
-# Step 2: Searching.
-RAG = RAGPretrainedModel.from_index(index_name) # if not already loaded
-RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
 ```
 ***

 metrics:
 - recall
 tags:
 - colbert
+- passage-retrieval
 base_model: antoinelouis/camembert-L4
 library_name: RAGatouille
 inference: false
+model-index:
+- name: colbertv2-camembert-L4-mmarcoFR
+  results:
+    - task:
+        type: sentence-similarity
+        name: Passage Retrieval
+      dataset:
+        type: unicamp-dl/mmarco
+        name: mMARCO-fr
+        config: french
+        split: validation
+      metrics:
+        - type: recall_at_1000
+          name: Recall@1000
+          value: 91.9
+        - type: recall_at_500
+          name: Recall@500
+          value: 90.3
+        - type: recall_at_100
+          name: Recall@100
+          value: 81.9
+        - type: recall_at_10
+          name: Recall@10
+          value: 56.7
+        - type: mrr_at_10
+          name: MRR@10
+          value: 32.3
 ---
+# colbertv2-camembert-L4-mmarcoFR
 This is a lightweight [ColBERTv2](https://doi.org/10.48550/arXiv.2112.01488) model for **French** that can be used for semantic search. It encodes queries and passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators.
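The MaxSim scoring rule mentioned above can be sketched in a few lines: every query token embedding is compared with every passage token embedding, each query token keeps only its best match, and these maxima are summed into one relevance score. The sketch below is a minimal pure-Python illustration under stated assumptions: the 2-d vectors are toy stand-ins for real encoder outputs, embeddings are assumed L2-normalized so that dot products are cosine similarities, and the names `maxsim_score` and `rank_passages` are hypothetical helpers, not functions from RAGatouille or colbert-ai.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: list[Vector], passage_tokens: list[Vector]) -> float:
    """Late-interaction (MaxSim) score: for each query token, keep its best
    similarity against any passage token, then sum over query tokens."""
    return sum(max(dot(q, p) for p in passage_tokens) for q in query_tokens)


def rank_passages(query_tokens: list[Vector], passages: list[list[Vector]]) -> list[int]:
    """Return passage indices sorted by decreasing MaxSim score (hypothetical helper)."""
    scores = [maxsim_score(query_tokens, p) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Toy 2-d "token embeddings" standing in for real model outputs.
query = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
passage_a = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]
passage_b = [normalize([1.0, 1.0])]

print(maxsim_score(query, passage_a))  # each query token finds an exact match: 2.0
print(rank_passages(query, [passage_b, passage_a]))  # [1, 0]
```

This exhaustive loop only illustrates the scoring rule; colbert-ai does not score every passage this way, but first generates candidates from its compressed index before computing MaxSim scores.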
 ## Usage
+Here are some examples of using the model with [RAGatouille](https://github.com/bclavie/RAGatouille) or [colbert-ai](https://github.com/stanford-futuredata/ColBERT).
+### Using RAGatouille
 First, you will need to install the following libraries:
 ```bash
+pip install -U ragatouille
 ```
 Then, you can use the model like this:
 ```python
+from ragatouille import RAGPretrainedModel
 index_name: str = "my_index" # The name of your index, i.e. the name of your vector database
 documents: list = ["Ceci est un premier document.", "Voici un second document.", "etc."] # Corpus
+# Step 1: Indexing.
+RAG = RAGPretrainedModel.from_pretrained("antoinelouis/colbertv2-camembert-L4-mmarcoFR")
+index_path = RAG.index(index_name=index_name, collection=documents) # Returns the path of the index on disk.
+# Step 2: Searching.
+RAG = RAGPretrainedModel.from_index(index_path) # if not already loaded
+RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
 ```
+### Using ColBERT-AI
 First, you will need to install the following libraries:
 ```bash
+pip install git+https://github.com/stanford-futuredata/ColBERT.git torch faiss-gpu==1.7.2
 ```
 Then, you can use the model like this:
 ```python
+from colbert import Indexer, Searcher
+from colbert.infra import Run, RunConfig
+n_gpu: int = 1 # Set your number of available GPUs
+experiment: str = "colbert" # Name of the folder where the logs and created indices will be stored
 index_name: str = "my_index" # The name of your index, i.e. the name of your vector database
 documents: list = ["Ceci est un premier document.", "Voici un second document.", "etc."] # Corpus
+# Step 1: Indexing. This step encodes all passages into matrices, stores them on disk, and builds data structures for efficient search.
+with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
+    indexer = Indexer(checkpoint="antoinelouis/colbertv2-camembert-L4-mmarcoFR")
+    indexer.index(name=index_name, collection=documents)
+# Step 2: Searching. Given the model and index, you can issue queries over the collection to retrieve the top-k passages for each query.
+with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
+    searcher = Searcher(index=index_name) # No need to specify the checkpoint again: the model name is stored in the index.
+    results = searcher.search("Comment effectuer une recherche avec ColBERT ?", k=10)
+    # results: tuple of three lists of length k, i.e. ([passage_ids], [passage_ranks], [passage_scores])
 ```
 ***
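The `model-index` metadata reports Recall@k and MRR@10 on the mMARCO-fr validation split. To make those numbers concrete, here is a minimal per-query sketch of both metrics, assuming `ranked_ids` is the retriever's output in rank order and `relevant_ids` is the set of gold passages for that query. The helper names and the toy runs are hypothetical, and this is not the evaluation script behind the reported scores. As is usual for MS MARCO-style evaluation, reported values such as MRR@10 = 32.3 are averages of these per-query values over all queries, expressed as percentages.

```python
def recall_at_k(ranked_ids: list[int], relevant_ids: set[int], k: int) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank_at_k(ranked_ids: list[int], relevant_ids: set[int], k: int = 10) -> float:
    """1/rank of the first relevant passage within the top k, or 0 if none is found."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Hypothetical retrieval runs: query id -> (ranked passage ids, relevant passage ids).
runs = {
    "q1": ([5, 3, 9, 1], {3}),  # relevant passage found at rank 2
    "q2": ([7, 8, 2], {4}),     # relevant passage not retrieved
}

mrr_at_10 = mean([reciprocal_rank_at_k(r, rel, 10) for r, rel in runs.values()])
recall_at_10 = mean([recall_at_k(r, rel, 10) for r, rel in runs.values()])
print(f"MRR@10 = {100 * mrr_at_10:.1f}, Recall@10 = {100 * recall_at_10:.1f}")  # MRR@10 = 25.0, Recall@10 = 50.0
```

Recall@k only asks whether the relevant passages made it into the top k, while MRR@10 also rewards placing the first relevant passage near the top, which is why the reported MRR@10 is much lower than Recall@10.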