# Twitter-roBERTa-base
This is a roBERTa-base model trained on ~58M tweets, described and evaluated in the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). To evaluate this and other language models on Twitter-specific data, please refer to the [TweetEval official repository](https://github.com/cardiffnlp/tweeteval).
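The checkpoint can also be loaded directly with the standard `transformers` auto classes; a minimal sketch, assuming the `cardiffnlp/twitter-roberta-base` identifier on the Hugging Face Hub:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
```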
## MLM Example
```python
from transformers import pipeline, AutoTokenizer
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
fill_mask = pipeline("fill-mask", model=MODEL, tokenizer=MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def print_candidates(candidates):
    # Print the top-5 predicted tokens with their rounded scores
    for i in range(5):
        token = tokenizer.decode(candidates[i]['token'])
        score = np.round(candidates[i]['score'], 4)
        print(f"{i+1}) {token} {score}")

texts = [
    "I am so <mask> 😊",
    "I am so <mask> 😒"
]
for text in texts:
    print(f"{'-'*30}\n{text}")
    candidates = fill_mask(text)
    print_candidates(candidates)
```
```
------------------------------
I am so <mask> 😊
1) happy 0.402
2) excited 0.1441
3) proud 0.143
4) grateful 0.0669
5) blessed 0.0334
------------------------------
I am so <mask> 😒
1) sad 0.2641
2) sorry 0.1605
3) tired 0.138
4) sick 0.0278
5) hungry 0.0232
```
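
Beyond mask filling, the model can serve as a feature extractor to obtain tweet embeddings. A minimal sketch, assuming the standard `transformers` `AutoModel` API; the mean pooling over the last hidden state is an illustrative choice, not prescribed by the model:

```python
from transformers import AutoModel, AutoTokenizer
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

text = "Good night 😊"  # hypothetical example tweet
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)

# Mean-pool the last hidden state over the token dimension
# to get a single fixed-size embedding for the tweet
last_hidden = output[0].detach().numpy()
tweet_embedding = np.mean(last_hidden[0], axis=0)  # shape: (768,)
```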