KennethEnevoldsen committed · Commit aae1f2d · 1 parent: 7e282ae
The max_position_embeddings value causes an error when the input exceeds 512 tokens.

When I run:
```
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")
text = "very long text "*1000
input_ids = tokenizer(text, return_tensors="pt")
input_ids["input_ids"].shape
# truncate to 514 tokens
input_ids = {k: v[:, :514] for k, v in input_ids.items()}
input_ids["input_ids"].shape
outputs = model.forward(**input_ids)
```
I get:
```
...
2208 # remove once script supports set_grad_enabled
2209 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```
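The IndexError comes from the position-embedding lookup: RoBERTa-style models offset position ids by padding_idx + 1 = 2, so a 514-token input requests position index 515, which falls outside the embedding table. A minimal diagnostic sketch, assuming the standard attribute layout of transformers' XLM-RoBERTa implementation, to compare the config value with the table size actually stored in the checkpoint:

```python
from transformers import AutoConfig, AutoModelForMaskedLM

config = AutoConfig.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

# Number of positions the config claims vs. rows actually present in the checkpoint.
print("config.max_position_embeddings:", config.max_position_embeddings)
print("position embedding rows:",
      model.roberta.embeddings.position_embeddings.weight.shape[0])

# RoBERTa-style position ids start at padding_idx + 1 = 2, so the longest
# sequence the table can serve is (rows - 2) tokens.
```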
config.json CHANGED (+1 -1)
```diff
@@ -13,7 +13,7 @@
   "initializer_range": 0.02,
   "intermediate_size": 3072,
   "layer_norm_eps": 1e-05,
-  "max_position_embeddings":
+  "max_position_embeddings": 512,
   "model_type": "xlm-roberta",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
```
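With max_position_embeddings set to 512 and the assumed RoBERTa-style offset of 2, the longest input the model can actually embed is 510 tokens. A hedged usage sketch that lets the tokenizer truncate instead of slicing manually (the 510 limit is an assumption derived from that offset, not something stated in this commit):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

text = "very long text " * 1000
# Truncate through the tokenizer; 510 = max_position_embeddings - 2 under the
# assumed RoBERTa-style position-id offset (padding_idx + 1 = 2).
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=510)
outputs = model(**inputs)
```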