Commit 6227ce2 by DmitryPogrebnoy
1 parent: 038888d

Update README.md

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -1,3 +1,58 @@
  ---
  license: apache-2.0
+ pipeline_tag: fill-mask
  ---
+
+ # Model MedMDebertaV3
+
+ # Model Description
+
+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base/tree/main).
+ The code for the fine-tuning process can be
+ found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/spellchecker/ml_ranging/models/med_mdeberta/fine_tune_mdebert_colab.ipynb).
+ The model is fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian.
+ The collected dataset can be
+ found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/data/anamnesis/processed/all_anamnesis.csv).
+
+ This model was created as part of a master's project to develop a method for correcting typos
+ in medical histories, using BERT models to rank candidate corrections.
+ The project is open source and can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker).
+
+ # How to Get Started With the Model
+
+ You can use the model directly with a pipeline for masked language modeling:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> # Use a separate name so the imported `pipeline` function is not shadowed
+ >>> fill_mask = pipeline('fill-mask', model='DmitryPogrebnoy/MedMDebertaV3')
+ >>> # Input, in English: "The patient has [MASK] pain in the sternum."
+ >>> fill_mask("У пациента [MASK] боль в грудине.")
+ [{'score': 0.05280596762895584,
+ 'token': 4595,
+ 'token_str': 'суд',
+ 'sequence': 'У пациента суд боль в грудине.'},
+ {'score': 0.050577640533447266,
+ 'token': 19157,
+ 'token_str': 'времени',
+ 'sequence': 'У пациента времени боль в грудине.'},
+ {'score': 0.02754475176334381,
+ 'token': 19174,
+ 'token_str': 'препарат',
+ 'sequence': 'У пациента препарат боль в грудине.'},
+ {'score': 0.027341477572917938,
+ 'token': 125009,
+ 'token_str': 'рошен',
+ 'sequence': 'У пациентарошен боль в грудине.'},
+ {'score': 0.022251157090067863,
+ 'token': 19441,
+ 'token_str': 'енный',
+ 'sequence': 'У пациентаенный боль в грудине.'}]
+ ```
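
The README says the model is meant to rank correction candidates. A minimal sketch of how that could look with the fill-mask pipeline is given below. It relies on the pipeline's `targets` and `top_k` arguments to score only the supplied words. The helper names `load_fill_mask` and `rank_candidates` are hypothetical, and this is not the ranking code used in the MedSpellChecker project.

```python
def load_fill_mask(model_name="DmitryPogrebnoy/MedMDebertaV3"):
    """Build a fill-mask pipeline (hypothetical helper; downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the ranking helper has no hard dependency
    return pipeline("fill-mask", model=model_name)


def rank_candidates(fill_mask, masked_text, candidates):
    """Score each candidate word for the [MASK] slot and return (word, score) pairs, best first.

    `fill_mask` is any callable that behaves like a transformers fill-mask pipeline
    for a text containing exactly one [MASK] token.
    """
    # `targets` restricts scoring to the given words; `top_k` keeps all of them in the result.
    results = fill_mask(masked_text, targets=candidates, top_k=len(candidates))
    pairs = [(r["token_str"], r["score"]) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Usage (assumed example candidates, not taken from the project):
# fill_mask = load_fill_mask()
# rank_candidates(fill_mask, "У пациента [MASK] боль в грудине.", ["сильная", "острая"])
```

One caveat: a candidate that is not a single token in the model's vocabulary is tokenized by the pipeline and only its first token is scored, so the returned `token_str` may differ from the word that was passed in. Multi-token candidates would need a different scoring scheme.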
+
+ Alternatively, load the model and tokenizer directly for custom inference:
+
+ ```python
+ >>> from transformers import AutoTokenizer, AutoModelForMaskedLM
+ >>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedMDebertaV3")
+ >>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedMDebertaV3")
+ ```
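
Once the model and tokenizer are loaded, one possible next step is to reproduce the pipeline's top predictions by hand. The sketch below assumes PyTorch is installed and that the input contains at least one `[MASK]` token, of which only the first is used. The function names `softmax_top_k` and `predict_mask` are illustrative and do not come from the README.

```python
import math


def softmax_top_k(logits, k=5):
    """Return the k (index, probability) pairs with the highest softmax probability."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    best = sorted(range(len(logits)), key=lambda i: exps[i], reverse=True)[:k]
    return [(i, exps[i] / total) for i in best]


def predict_mask(text, model, tokenizer, k=5):
    """Return the k most likely (token string, probability) fillers for the first [MASK]."""
    import torch  # imported lazily; assumes a PyTorch installation

    inputs = tokenizer(text, return_tensors="pt")
    input_ids = inputs["input_ids"][0].tolist()
    mask_index = input_ids.index(tokenizer.mask_token_id)  # raises ValueError if there is no mask

    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits  # shape: (batch, sequence length, vocabulary size)

    vocab_logits = logits[0, mask_index].tolist()
    return [(tokenizer.decode([token_id]), prob)
            for token_id, prob in softmax_top_k(vocab_logits, k)]


# Usage, with `model` and `tokenizer` loaded as shown above:
# predict_mask("У пациента [MASK] боль в грудине.", model, tokenizer)
```

The softmax is done in plain Python to keep the helper easy to test in isolation. For batch processing it would be more efficient to call `softmax` and `topk` on the tensor directly.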