	---
	license: mit
	datasets:
	- bigbio/chemdner
	- ncbi_disease
	- jnlpba
	- bigbio/n2c2_2018_track2
	- bigbio/bc5cdr
	widget:
	- text: Drug<SEP>He was given aspirin and paracetamol.
	language:
	- en
	metrics:
	- precision
	- recall
	- f1
	pipeline_tag: token-classification
	tags:
	- token-classification
	- biology
	- medical
	- zero-shot
	- few-shot
	library_name: transformers
	---
	# Zero and few shot NER for biomedical texts

	## Model description

	This model was created during a research collaboration between Bayer Pharma and the Serbian Institute for Artificial Intelligence Research and Development.
	The model is trained on more than 25 biomedical NER classes. It can also perform zero-shot inference, and it can be fine-tuned for new classes with just a few examples (few-shot learning).
	For more details about our methods, see the paper ["A transformer-based method for zero and few-shot biomedical named entity recognition"](https://arxiv.org/abs/2305.04928).

	The model takes two strings as input. String1 is the NER label to search for in the second string, and it must be a phrase that names the entity. String2 is a short text in which String1 is searched for semantically.
	The model outputs a list of zeros and ones that mark the occurrence of the named entity, one value per token of String2 (tokens as produced by the transformer tokenizer).
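
To make the input layout concrete, the sketch below mimics how a BERT-style tokenizer packs a sentence pair as `[CLS] String1 [SEP] String2 [SEP]`, with segment ids marking which positions belong to the text. It is only an illustration: a whitespace split stands in for the model's actual subword tokenizer, and `pack_pair` is a hypothetical helper, not part of the model's code.

```python
def pack_pair(label, text):
    """Lay out a (label, text) pair the way a BERT-style tokenizer does.

    Assumption: whitespace splitting stands in for the real subword tokenizer.
    """
    label_tokens = label.split()
    text_tokens = text.split()
    tokens = ["[CLS]"] + label_tokens + ["[SEP]"] + text_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the label and the first [SEP];
    # segment 1 covers the text and the final [SEP].
    segment_ids = [0] * (len(label_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_pair("Drug", "He was given aspirin and paracetamol.")
for token, segment in zip(tokens, segment_ids):
    print(segment, token)
```

A token-classification head produces one prediction per position of this packed sequence, so only the positions in segment 1 are relevant when reading off entities in the text.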

	## Example of usage
	```python
	import torch
	from transformers import AutoTokenizer, BertForTokenClassification

	model_name = 'ProdicusII/ZeroShotBioNER'  # model path on the Hugging Face Hub
	tokenizer = AutoTokenizer.from_pretrained(model_name)  # tokenizer that ships with the model
	model = BertForTokenClassification.from_pretrained(model_name, num_labels=2)

	string1 = 'Drug'  # entity label to search for
	string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.'
	# Encode the label and the text together as one sentence pair
	encodings = tokenizer(string1, string2, is_split_into_words=False,
	                      padding=True, truncation=True, add_special_tokens=True,
	                      return_offsets_mapping=False, max_length=512, return_tensors='pt')

	with torch.no_grad():  # gradients are not needed for inference
	    prediction_logits = model(**encodings).logits  # shape: (1, sequence_length, 2)
	print(prediction_logits)
	```
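
The example above yields raw logits rather than the zeros and ones described in the model description. A minimal sketch of the remaining step follows: take the arg-max over the two classes for each token, keep only tokens from the second segment, and merge adjacent positive tokens into spans. It runs on hand-made inputs in plain Python, so no model download is needed. `decode_entities` is a hypothetical helper, the token split shown is invented for illustration, and treating class index 1 as the entity class is an assumption based on the description above.

```python
def decode_entities(tokens, segment_ids, logits, positive_class=1):
    """Turn per-token two-class logits into entity strings.

    Assumptions: `positive_class` is the index that marks an entity token,
    and subword continuations carry the WordPiece "##" prefix.
    """
    entities, current = [], []
    for token, segment, scores in zip(tokens, segment_ids, logits):
        label = max(range(len(scores)), key=scores.__getitem__)  # arg-max
        in_text = segment == 1 and token not in ("[SEP]", "[PAD]")
        if in_text and label == positive_class:
            piece = token[2:] if token.startswith("##") else token
            if token.startswith("##") and current:
                current[-1] += piece  # glue the subword onto the previous piece
            else:
                current.append(piece)
        elif current:
            entities.append(" ".join(current))  # close the running span
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hand-made example; the real tokenizer may split these words differently.
tokens = ["[CLS]", "drug", "[SEP]", "he", "was", "given", "as", "##pirin",
          "and", "para", "##ce", "##tamol", ".", "[SEP]"]
segment_ids = [0, 0, 0] + [1] * 11
positive = {6, 7, 9, 10, 11}  # positions we pretend the model marked as entity tokens
logits = [[-2.0, 2.0] if i in positive else [2.0, -2.0] for i in range(len(tokens))]

print(decode_entities(tokens, segment_ids, logits))  # → ['aspirin', 'paracetamol']
```

With the real model, the three inputs could be obtained as `tokenizer.convert_ids_to_tokens(encodings["input_ids"][0])`, `encodings["token_type_ids"][0].tolist()` and `model(**encodings).logits[0].tolist()`.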

	## Available classes

	The model was trained on the datasets and entity classes listed below, so any of these classes can be used as the label in the first segment (the first string). Note that multiword strings have been merged.


	* NCBI
	  * Specific Disease
	  * Composite Mention
	  * Modifier
	  * Disease Class
	* BIORED
	  * Sequence Variant
	  * Gene Or Gene Product
	  * Disease Or Phenotypic Feature
	  * Chemical Entity
	  * Cell Line
	  * Organism Taxon
	* CDR
	  * Disease
	  * Chemical
	* CHEMDNER
	  * Chemical
	  * Chemical Family
	* JNLPBA
	  * Protein
	  * DNA
	  * Cell Type
	  * Cell Line
	  * RNA
	* n2c2
	  * Drug
	  * Frequency
	  * Strength
	  * Dosage
	  * Form
	  * Reason
	  * Route
	  * ADE
	  * Duration

	In addition, the model can be used in a zero-shot regime with classes it was not trained on, and it can be fine-tuned on a few examples of new classes.
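
Because the label is part of the input, tagging one text with several classes means running one (label, text) pair per class. The sketch below builds those pairs in plain Python. `build_queries` is a hypothetical helper, and "Symptom" is only an illustrative class that is not in the training list; how well the model handles it is not claimed here.

```python
def build_queries(labels, text):
    """Return parallel lists of labels and texts, one entry per class to query."""
    unique_labels = list(dict.fromkeys(labels))  # drop duplicates, keep order
    return unique_labels, [text] * len(unique_labels)


text = "He was given aspirin and paracetamol."
# "Drug" and "Dosage" are training classes; "Symptom" is an illustrative unseen class.
first, second = build_queries(["Drug", "Dosage", "Symptom", "Drug"], text)
for label, sentence in zip(first, second):
    print(f"{label!r} -> {sentence!r}")
```

The two lists can be passed to the tokenizer as a batch, for example `tokenizer(first, second, padding=True, truncation=True, return_tensors='pt')`, which gives one row of predictions per class.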



	## Code availability

	The code used for training and testing the model is available at https://github.com/br-ai-ns-institute/Zero-ShotNER

	## Citation

	If you use this model, or are inspired by it, please cite the following paper:

	Košprdić M., Prodanović N., Ljajić A., Bašaragin B., Milošević N., 2023. A transformer-based method for zero and few-shot biomedical named entity recognition. arXiv preprint arXiv:2305.04928. https://arxiv.org/abs/2305.04928

	or, in BibTeX:
	```
	@misc{kosprdic2023transformerbased,
	title={A transformer-based method for zero and few-shot biomedical named entity recognition},
	author={Miloš Košprdić and Nikola Prodanović and Adela Ljajić and Bojana Bašaragin and Nikola Milošević},
	year={2023},
	eprint={2305.04928},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```