Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:80000
 - loss:MatryoshkaLoss
 - loss:MultipleNegativesRankingLoss
 base_model: bkai-foundation-models/vietnamese-bi-encoder
@@ -428,7 +428,7 @@ model-index:
 LEGAL-EMBEDDING is a Vietnamese text embedding focused on RAG and production efficiency:
 
 📚 **Trained Dataset**:
-The model was trained on an in-house dataset consisting of approximately **
+The model was trained on an in-house dataset consisting of approximately **80,000 examples** of legal questions and their related contexts.
 
 🪆 **Efficiency**:
 Trained with a **Matryoshka loss**, allowing embeddings to be truncated with minimal performance loss. This ensures that smaller embeddings are faster to compare, making the model efficient for real-world production use.
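For context on the Matryoshka/truncation claim in the updated text, here is a minimal sketch of how truncated embeddings might be used with the `sentence-transformers` library. The repo id `your-namespace/legal-embedding` and the 256-dimension cut-off are placeholders rather than values from this card, and the `truncate_dim` argument assumes sentence-transformers >= 2.7.

```python
# Sketch only: query/passage retrieval with truncated (Matryoshka) embeddings.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Placeholder repo id -- substitute the actual model id for this card.
model_id = "your-namespace/legal-embedding"

# truncate_dim keeps only the leading 256 dimensions of each embedding;
# Matryoshka training concentrates most of the signal in those dimensions.
model = SentenceTransformer(model_id, truncate_dim=256)

query = "Thời hạn nộp thuế thu nhập cá nhân là khi nào?"
passages = [
    "Người nộp thuế phải nộp hồ sơ khai thuế đúng thời hạn quy định.",
    "Hợp đồng lao động phải được giao kết bằng văn bản.",
]

# Embeddings come back already truncated to 256 dimensions.
query_emb = model.encode(query)
passage_embs = model.encode(passages)

# Cosine similarity on the smaller vectors: less memory, faster scoring.
scores = cos_sim(query_emb, passage_embs)
print(scores)
```

Because the leading dimensions carry most of the information, truncating trades a small amount of retrieval accuracy for smaller indexes and faster comparison, which is the production-efficiency point the card is making.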