|
# Twitter-roBERTa-base |
|
|
|
This is a roBERTa-base model trained on ~58M tweets, described and evaluated in the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). To evaluate this and other language models on Twitter-specific data, please refer to the [official TweetEval repository](https://github.com/cardiffnlp/tweeteval).
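
Besides the masked-LM pipeline shown below, the checkpoint can also be loaded directly to obtain tweet representations. The following is a minimal sketch, assuming the same model ID as in the MLM example; the mean-pooling step is an illustrative choice, not something prescribed by this card.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Assumed model ID, matching the MLM example below
MODEL = "cardiffnlp/twitter-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

text = "Good night 😊"
encoded = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Mean-pool the last hidden state to get a single tweet embedding
# (pooling strategy is an illustrative choice, not part of the original card)
embedding = output.last_hidden_state.mean(dim=1)
print(embedding.shape)  # e.g. torch.Size([1, 768])
```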
|
|
|
|
|
## Example: Masked Language Modeling
|
|
|
```python
from transformers import pipeline, AutoTokenizer
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
fill_mask = pipeline("fill-mask", model=MODEL, tokenizer=MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def print_candidates(candidates):
    # Print the top 5 predicted tokens with their rounded scores
    for i in range(5):
        token = tokenizer.decode(candidates[i]['token'])
        score = np.round(candidates[i]['score'], 4)
        print(f"{i+1}) {token} {score}")

texts = [
    "I am so <mask> 😊",
    "I am so <mask> 😢"
]
for text in texts:
    print(f"{'-'*30}\n{text}")
    candidates = fill_mask(text)
    print_candidates(candidates)
```
|
|
|
```
------------------------------
I am so <mask> 😊
1) happy 0.402
2) excited 0.1441
3) proud 0.143
4) grateful 0.0669
5) blessed 0.0334
------------------------------
I am so <mask> 😢
1) sad 0.2641
2) sorry 0.1605
3) tired 0.138
4) sick 0.0278
5) hungry 0.0232
```