myrkur committed on
Commit
9450719
1 Parent(s): 4dc8e24

Update README.md

Files changed (1)
  1. README.md +53 -64
README.md CHANGED
@@ -48,7 +48,7 @@ license: apache-2.0
 
  # SentenceTransformer based on HooshvareLab/bert-base-parsbert-uncased
 
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [HooshvareLab/bert-base-parsbert-uncased](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
  ## Model Details
 
@@ -99,68 +99,57 @@ similarities = model.similarity(embeddings, embeddings)
  print(similarities.shape)
  # [3, 3]
  ```
 
 
- ### Training Logs
- | Epoch | Step | Training Loss | loss |
- |:----------:|:-------:|:-------------:|:----------:|
- | 0.0265 | 20 | 0.7506 | - |
- | 0.0530 | 40 | 0.6701 | - |
- | 0.0530 | 20 | 0.5843 | - |
- | 0.1060 | 40 | 0.4591 | - |
- | 0.1591 | 60 | 0.3316 | - |
- | 0.2121 | 80 | 0.2856 | - |
- | 0.2651 | 100 | 0.2599 | - |
- | 0.3181 | 120 | 0.2478 | - |
- | 0.3712 | 140 | 0.214 | - |
- | 0.4242 | 160 | 0.1996 | - |
- | 0.4772 | 180 | 0.1929 | - |
- | 0.5302 | 200 | 0.193 | 0.1766 |
- | 0.5833 | 220 | 0.1798 | - |
- | 0.6363 | 240 | 0.1794 | - |
- | 0.6893 | 260 | 0.1735 | - |
- | 0.7423 | 280 | 0.1713 | - |
- | 0.7954 | 300 | 0.1547 | - |
- | 0.8484 | 320 | 0.1545 | - |
- | 0.9014 | 340 | 0.1577 | - |
- | 0.9544 | 360 | 0.1575 | - |
- | 1.0075 | 380 | 0.1431 | - |
- | 1.0605 | 400 | 0.1498 | 0.1489 |
- | 1.1135 | 420 | 0.1327 | - |
- | 1.1665 | 440 | 0.1223 | - |
- | 1.2196 | 460 | 0.1154 | - |
- | 1.2726 | 480 | 0.1059 | - |
- | 1.3256 | 500 | 0.1068 | - |
- | 1.3786 | 520 | 0.0959 | - |
- | 1.4316 | 540 | 0.0884 | - |
- | 1.4847 | 560 | 0.0896 | - |
- | 1.5377 | 580 | 0.0899 | - |
- | **1.5907** | **600** | **0.0814** | **0.1445** |
- | 1.6437 | 620 | 0.0877 | - |
- | 1.6968 | 640 | 0.0816 | - |
- | 1.7498 | 660 | 0.0846 | - |
- | 1.8028 | 680 | 0.0783 | - |
- | 1.8558 | 700 | 0.0787 | - |
- | 1.9089 | 720 | 0.0874 | - |
- | 1.9619 | 740 | 0.0883 | - |
-
- * The bold row denotes the saved checkpoint.
-
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
 
 
  # SentenceTransformer based on HooshvareLab/bert-base-parsbert-uncased
 
+ This [sentence-transformers](https://www.SBERT.net) model is finetuned from [HooshvareLab/bert-base-parsbert-uncased](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased) with a focus on enhancing Retrieval-Augmented Generation (RAG) systems. It maps sentences and paragraphs to a 768-dimensional dense vector space, making it highly effective for retrieving contextually relevant information to generate accurate and coherent responses in various applications such as QA systems, chatbots, and content generation.
 
  ## Model Details
 
 
  print(similarities.shape)
  # [3, 3]
  ```
+ ### Usage in Retrieval-Augmented Generation (RAG) Systems
+ Retrieval-Augmented Generation (RAG) systems leverage a combination of retrieval and generation techniques to enhance the quality and accuracy of generated responses. This model can be effectively used to retrieve relevant information from a large corpus, which can then be used to generate more informed and contextually accurate responses. Here's how you can integrate this model into a RAG system:
 
+ Install Necessary Libraries:
+ Ensure you have the required libraries:
 
+ ```bash
+ pip install -U sentence-transformers transformers
+ ```
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ import torch
+
+ # Load the model
+ model = SentenceTransformer("myrkur/sentence-transformer-parsbert-fa")
+
+ # Example corpus
+ corpus = [
+     'پرتغالی، در وطن اصلی خود، پرتغال، تقریباً توسط ۱۰ میلیون نفر جمعیت صحبت می‌شود...',
+     'اشکانیان حدود دو قرن بر ایران حکومت کردند...',
+     'عباس جدیدی، کشتی‌گیر سابق ایرانی است...',
+     # ... (more documents)
+ ]
+
+ # Encode the corpus
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+ ```
+
+ Retrieve Relevant Information:
+ Given a user query, retrieve the most relevant documents from the corpus:
+
+ ```python
+ # User query
+ query = "عباس جدیدی که بود؟"
+ query_embedding = model.encode(query, convert_to_tensor=True)
+
+ # Retrieve the top-k most similar documents
+ top_k = 5
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)
+ hits = hits[0]
+
+ # Print the retrieved documents
+ for hit in hits:
+     print(f"Score: {hit['score']:.4f}")
+     print(corpus[hit['corpus_id']])
+ ```
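The retrieval snippet stops at printing the hits. As a minimal sketch of how those hits might be handed to a generator, the helper below assembles a prompt from the top-ranked passages. It is not part of this commit or of sentence-transformers: `build_rag_prompt`, the prompt layout, and the `max_docs` parameter are hypothetical, and the corpus strings and scores are made-up stand-ins so the code runs without downloading the model. The only assumption taken from the library is the per-query hit format of `util.semantic_search`, a list of dicts with `corpus_id` and `score` keys.

```python
def build_rag_prompt(query, corpus, hits, max_docs=3):
    """Assemble a prompt from retrieved passages (hypothetical helper)."""
    # hits: list of {"corpus_id": int, "score": float}, one list per query,
    # in the shape util.semantic_search returns.
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)[:max_docs]
    # Number the passages so a generator could refer back to them.
    context = "\n".join(
        f"[{i + 1}] {corpus[h['corpus_id']]}" for i, h in enumerate(ranked)
    )
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Stand-in data (illustrative only, not real model output).
corpus = ["doc about Portuguese", "doc about the Parthians", "doc about Abbas Jadidi"]
hits = [
    {"corpus_id": 2, "score": 0.81},
    {"corpus_id": 1, "score": 0.22},
    {"corpus_id": 0, "score": 0.10},
]

prompt = build_rag_prompt("Who was Abbas Jadidi?", corpus, hits, max_docs=2)
print(prompt)
```

With the real model, `corpus` and `hits` from the snippets above could be passed in unchanged, and the resulting string sent to whichever text-generation model the RAG system uses.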
+ ## Conclusion
+ This sentence-transformer model is a powerful tool for various NLP applications, particularly in retrieval-augmented generation systems, enabling more accurate and contextually relevant information retrieval and generation.
+
+ ## Contact
+ For questions or further information, please contact:
+
+ - Amir Masoud Ahmadi: [[email protected]](mailto:[email protected])