w11wo committed
Commit 62f5d78 · verified · 1 Parent(s): bb2173a

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -18,9 +18,11 @@ datasets:
  - indonesian-nlp/lfqa_id
  - jakartaresearch/indoqa
  - jakartaresearch/id-paraphrase-detection
+ language:
+ - ind
  ---
 
- # LazarusNLP/all-multilingual-e5-small
+ # LazarusNLP/all-indo-e5-small-v2
 
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
@@ -40,7 +42,7 @@ Then you can use the model like this:
  from sentence_transformers import SentenceTransformer
  sentences = ["This is an example sentence", "Each sentence is converted"]
 
- model = SentenceTransformer('LazarusNLP/all-multilingual-e5-small')
+ model = SentenceTransformer('LazarusNLP/all-indo-e5-small-v2')
  embeddings = model.encode(sentences)
  print(embeddings)
  ```
@@ -66,8 +68,8 @@ def mean_pooling(model_output, attention_mask):
  sentences = ['This is an example sentence', 'Each sentence is converted']
 
  # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('LazarusNLP/all-multilingual-e5-small')
- model = AutoModel.from_pretrained('LazarusNLP/all-multilingual-e5-small')
+ tokenizer = AutoTokenizer.from_pretrained('LazarusNLP/all-indo-e5-small-v2')
+ model = AutoModel.from_pretrained('LazarusNLP/all-indo-e5-small-v2')
 
  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -89,7 +91,7 @@ print(sentence_embeddings)
 
  <!--- Describe how your model was evaluated -->
 
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=LazarusNLP/all-multilingual-e5-small)
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=LazarusNLP/all-indo-e5-small-v2)
 
 
  ## Training
@@ -99,7 +101,7 @@ The model was trained with the parameters:
 
  `MultiDatasetDataLoader.MultiDatasetDataLoader` of length 968 with parameters:
  ```
- {'batch_size': 'unknown'}
+ {'batch_size_pairs': 384, 'batch_size_triplets': 256}
  ```
 
  **Loss**: