Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
+- dataset_size:80000
 - loss:MatryoshkaLoss
 - loss:MultipleNegativesRankingLoss
 base_model: bkai-foundation-models/vietnamese-bi-encoder
@@ -428,7 +428,7 @@ model-index:
 LEGAL-EMBEDDING is a Vietnamese text embedding focused on RAG and production efficiency:
 
 📚 **Trained Dataset**:
-The model was trained on an in-house dataset consisting of approximately **
+The model was trained on an in-house dataset consisting of approximately **80,000 examples** of legal questions and their related contexts.
 
 🪆 **Efficiency**:
 Trained with a **Matryoshka loss**, allowing embeddings to be truncated with minimal performance loss. This ensures that smaller embeddings are faster to compare, making the model efficient for real-world production use.
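For context on the Matryoshka/truncation claim in the updated text, here is a minimal sketch of how truncated embeddings might be used with the `sentence-transformers` library. The repo id `your-namespace/legal-embedding` and the 256-dimension cut-off are placeholders rather than values from this card, and the `truncate_dim` argument assumes sentence-transformers >= 2.7.

```python
# Sketch only: query/passage retrieval with truncated (Matryoshka) embeddings.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Placeholder repo id -- substitute the actual model id for this card.
model_id = "your-namespace/legal-embedding"

# truncate_dim keeps only the leading 256 dimensions of each embedding;
# Matryoshka training concentrates most of the signal in those dimensions.
model = SentenceTransformer(model_id, truncate_dim=256)

query = "Thời hạn nộp thuế thu nhập cá nhân là khi nào?"
passages = [
    "Người nộp thuế phải nộp hồ sơ khai thuế đúng thời hạn quy định.",
    "Hợp đồng lao động phải được giao kết bằng văn bản.",
]

# Embeddings come back already truncated to 256 dimensions.
query_emb = model.encode(query)
passage_embs = model.encode(passages)

# Cosine similarity on the smaller vectors: less memory, faster scoring.
scores = cos_sim(query_emb, passage_embs)
print(scores)
```

Because the leading dimensions carry most of the information, truncating trades a small amount of retrieval accuracy for smaller indexes and faster comparison, which is the production-efficiency point the card is making.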