quanghuy123 committed on
Commit fbbcd0a · verified · 1 parent: a8e3e9e

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:50000
+ - dataset_size:80000
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
  base_model: bkai-foundation-models/vietnamese-bi-encoder
@@ -428,7 +428,7 @@ model-index:
  LEGAL-EMBEDDING is a Vietnamese text embedding focused on RAG and production efficiency:
 
  📚 **Trained Dataset**:
- The model was trained on an in-house dataset consisting of approximately **50,000 examples** of legal questions and their related contexts.
+ The model was trained on an in-house dataset consisting of approximately **80,000 examples** of legal questions and their related contexts.
 
  🪆 **Efficiency**:
  Trained with a **Matryoshka loss**, allowing embeddings to be truncated with minimal performance loss. This ensures that smaller embeddings are faster to compare, making the model efficient for real-world production use.
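The README's Efficiency note says that Matryoshka-trained embeddings can be truncated. The sketch below shows what that means at comparison time. It keeps the leading dimensions of each vector, re-normalizes them, and compares the results with a dot product, whose cost grows linearly with the number of dimensions kept. This is a minimal sketch, assuming the model's outputs are plain lists of floats. The helper names and toy vectors are hypothetical and are not part of the model or any library. The README does not state the model's full embedding size or which truncated sizes were used during training.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components (the Matryoshka prefix) and L2-normalize them."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the full embedding size")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in prefix]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors that are already L2-normalized (a dot product)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vectors standing in for real model outputs.
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.4, -0.1, 0.2]
passage = [0.8, 0.2, 0.25, -0.1, 0.3, -0.2, 0.1, 0.0]

# Compare the same pair at several truncation sizes; smaller sizes mean fewer multiplications.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))
```

Re-normalizing after truncation matters because the prefix of a unit vector is generally no longer unit length. Without that step, the dot product would stop being a cosine similarity.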