ceyhunemreozturk committed on
A simple code block that shows the usage of the model was added.
README.md
CHANGED
@@ -13,6 +13,21 @@ We introduce BERTurk-Legal which is a transformer-based language model to retrie
The test dataset can be accessed at the following link: https://github.com/koc-lab/yargitay_retrieval_dataset

The model can be loaded and used to create document embeddings as follows. The resulting embeddings can then be used for retrieval.
```
from transformers import AutoModelForSequenceClassification, AutoTokenizer

bert_model = "KocLab-Bilkent/BERTurk-Legal"

# Expose the hidden states of every layer so the token representations can be read out
model = AutoModelForSequenceClassification.from_pretrained(bert_model, output_hidden_states=True)
tokenizer = AutoTokenizer.from_pretrained(bert_model)

# A dummy text ("sample text" in Turkish) is provided as input;
# return_tensors="pt" returns PyTorch tensors, which the model expects
tokens = tokenizer("Örnek metin", return_tensors="pt")

output = model(**tokens)
# Last-layer hidden states, shape (batch_size, sequence_length, hidden_size)
docEmbeddings = output.hidden_states[-1]
```
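The step from these token-level embeddings to retrieval is not spelled out in the README. One common approach, shown here only as a sketch and not necessarily the authors' method, is to mean-pool the last-layer token vectors into one vector per document (using the tokenizer's `attention_mask` to skip padding) and rank documents by cosine similarity to a query vector. The function names `mean_pool` and `rank_by_cosine` are hypothetical, NumPy is assumed, and real hidden states would first be converted with `docEmbeddings.detach().numpy()`.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average each document's token vectors, ignoring padding positions.

    token_embeddings: (batch_size, seq_len, hidden_size), e.g. last-layer hidden states
    attention_mask:   (batch_size, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def rank_by_cosine(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices sorted from most to least similar to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)


# Toy stand-ins for model outputs (hypothetical values): 2 documents, 3 tokens, hidden size 2
toy_tokens = np.array([[[1.0, 0.0], [1.0, 0.0], [9.0, 9.0]],
                       [[0.0, 1.0], [0.0, 3.0], [0.0, 0.0]]])
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])  # last token of document 0 is padding
doc_vecs = mean_pool(toy_tokens, toy_mask)
ranking = rank_by_cosine(np.array([0.1, 1.0]), doc_vecs)
# Document 1 points in nearly the same direction as the query, so it ranks first
```

Mean pooling is only one choice; taking the first (`[CLS]`) token vector is another common option, and which works better for legal retrieval would need to be checked against the test dataset.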

## Citation
If you use the model, please cite the following conference paper.
```
|