KennethEnevoldsen committed
Commit aae1f2d
1 Parent(s): 7e282ae

Max position embeddings cause an error when the input exceeds 512 tokens.


When I run:
```
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

text = "very long text "*1000

input_ids = tokenizer(text, return_tensors="pt")
input_ids["input_ids"].shape
# truncate to 514 tokens (the value of max_position_embeddings)
input_ids = {k: v[:, :514] for k, v in input_ids.items()}

input_ids["input_ids"].shape

outputs = model(**input_ids)
```


I get:

```
...
2208 # remove once script supports set_grad_enabled
2209 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self
```
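As a caller-side workaround (a minimal sketch, not part of this commit), the tokenizer can be asked to truncate to the model's limit up front instead of slicing the returned tensors by hand:

```
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vesteinn/DanskBERT")
model = AutoModelForMaskedLM.from_pretrained("vesteinn/DanskBERT")

text = "very long text " * 1000

# Truncate to 512 tokens at tokenization time rather than
# slicing the tensors manually afterwards.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
```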

Files changed (1):
  1. config.json +1 -1
config.json CHANGED
```
@@ -13,7 +13,7 @@
   "initializer_range": 0.02,
   "intermediate_size": 3072,
   "layer_norm_eps": 1e-05,
-  "max_position_embeddings": 514,
+  "max_position_embeddings": 512,
   "model_type": "xlm-roberta",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
```