dadashzadeh/2023_10_en_keywords_Cryptocurrency

Tags: BM25S, English, bm25, retrieval, lexical

dadashzadeh committed on Sep 22

Commit 9bd81ec (1 parent: ed74736)

Update README.md

Files changed (1)

README.md +29 -18


@@ -41,7 +41,7 @@ import bm25s
 from bm25s.hf import BM25HF
 # Load the index
-retriever = BM25HF.load_from_hub("{username}/{repo_name}}")
+retriever = BM25HF.load_from_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency")
 # You can retrieve now
 query = "a cat is a feline"
@@ -57,17 +57,28 @@ import bm25s
 from bm25s.hf import BM25HF
 corpus = [
-    "a cat is a feline and likes to purr",
-    "a dog is the human's best friend and loves to play",
-    "a bird is a beautiful animal that can fly",
-    "a fish is a creature that lives in water and swims",
+    "northwest bank",
+    "misfits market",
+    "merrick bank login",
+    "marketing",
+    "market place",
+    "jetblue customer service",
+    "internal revenue service",
+    "how to make money online",
+    "gordon food service",
+    "futures market",
+    "frontier airlines customer service",
+    "food banks near me",
+    "first convenience bank",
+    "eastern bank",
+    "dollar bank",
 ]
 retriever = BM25HF(corpus=corpus)
 retriever.index(bm25s.tokenize(corpus))
 token = None  # You can get a token from the Hugging Face website
-retriever.save_to_hub("{username}/{repo_name}", token=token)
+retriever.save_to_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency", token=token)
 ```
 ## Advanced usage
@@ -76,16 +87,16 @@ You can leverage more advanced features of the BM25S library during `load_from_h
 ```python
 # Load corpus and index in memory-map (mmap=True) to reduce memory
-retriever = BM25HF.load_from_hub("{username}/{repo_name}", load_corpus=True, mmap=True)
+retriever = BM25HF.load_from_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency", load_corpus=True, mmap=True)
 # Load a different branch/revision
-retriever = BM25HF.load_from_hub("{username}/{repo_name}", revision="main")
+retriever = BM25HF.load_from_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency", revision="main")
 # Change directory where the local files should be downloaded
-retriever = BM25HF.load_from_hub("{username}/{repo_name}", local_dir="/path/to/dir")
+retriever = BM25HF.load_from_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency", local_dir="/path/to/dir")
 # Load private repositories with a token:
-retriever = BM25HF.load_from_hub("{username}/{repo_name}", token=token)
+retriever = BM25HF.load_from_hub("dadashzadeh/2023_10_en_keywords_Cryptocurrency", token=token)
 ```
 ## Stats
@@ -94,9 +105,9 @@ This dataset was created using the following data:
 | Statistic | Value |
 | --- | --- |
-| Number of documents | {num_docs} |
-| Number of tokens | {num_tokens} |
-| Average tokens per document | {avg_tokens_per_doc} |
+| Number of documents | 602959 |
+| Number of tokens | 2414020 |
+| Average tokens per document | 4.0 |
 ## Parameters
@@ -104,11 +115,11 @@ The index was created with the following parameters:
 | Parameter | Value |
 | --- | --- |
-| k1 | `{k1}` |
-| b | `{b}` |
-| delta | `{delta}` |
-| method | `{method}` |
-| idf method | `{idf_method}` |
+| k1 | `1.5` |
+| b | `0.75` |
+| delta | `0.5` |
+| method | `lucene` |
+| idf method | `lucene` |
 ## Citation
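
To make the parameter values filled in by this commit more concrete, here is a minimal pure-Python sketch of a Lucene-style BM25 score using `k1 = 1.5` and `b = 0.75` from the Parameters table. It is an illustration under stated assumptions, not the BM25S implementation: the whitespace tokenizer and the function name `lucene_bm25_scores` are hypothetical, the five-keyword corpus is a subset of the example corpus in the diff, and `delta` is left out because this sketch assumes the Lucene variant does not use it.

```python
import math
from collections import Counter

K1 = 1.5   # k1 from the Parameters table
B = 0.75   # b from the Parameters table


def tokenize(text):
    # Hypothetical stand-in tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def lucene_bm25_scores(query, corpus, k1=K1, b=B):
    """Score every document in `corpus` against `query` (assumed Lucene-style BM25)."""
    docs = [tokenize(doc) for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization grows with document length relative to the average.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


corpus = [
    "northwest bank",
    "misfits market",
    "merrick bank login",
    "futures market",
    "dollar bank",
]
scores = lucene_bm25_scores("bank login", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best], scores[best])
```

With keyword-length documents (about 4 tokens on average according to the Stats table), term frequencies are almost always 1, so the ranking in a sketch like this is driven mostly by the IDF term and the length normalization.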