pierreguillou/bert-base-cased-squad-v1.1-portuguese

Hello there. Lately I've been using the Portuguese SQuAD dataset (link: https://drive.google.com/file/d/1Q0IaIlv2h2BC468MwUFmUST0EyN7gNkn/view?usp=sharing ) that this particular model uses, and I noticed that the Hub points this model to a different Portuguese SQuAD dataset (SQuAD_v1_pt).

There's a problem either in the Hub or in the model description. It can lead people to mistakenly use the wrong data and, in the worst case, to train a much worse model (compared to Pierre's).
I've trained two models, one with Pierre's data and another with the Hub data, and the results are pretty far from each other:

Pierre's SQuAD-pt dataset -> F1 = 82% and EM = 70%
Hugging Face's SQuAD-pt dataset -> F1 = 62% and EM = 51%
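
For anyone unfamiliar with the two numbers: EM (exact match) and F1 are the usual SQuAD v1.1 metrics, computed per question and averaged over the validation set. Below is a minimal sketch of how they are typically computed, modelled on the original English SQuAD evaluation script. I'm assuming the notebooks use the standard metric; the function names are mine, and the article list (a/an/the) is the English one, so it is not tuned for Portuguese (where "a" is also a common word).

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # English articles only (assumption carried over from the English SQuAD script).
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Em 1500.", "em 1500"))
    print(f1_score("Pedro Álvares Cabral", "Álvares Cabral"))
```

In the second call the prediction has one extra token, so precision is 2/3 and recall is 1, which gives an F1 of 0.8 while EM would be 0. When a question has several reference answers, the usual convention is to take the maximum score over the references.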

You can check the Colab notebooks and the models at the links below:
Training link = https://colab.research.google.com/drive/1FaUrktnvgKBQa3sI4Tfuyve6iJQceUuE
Validation link = https://colab.research.google.com/drive/1MeFWvLWxGNusOZvCwY9P3GQSbYdBIC1X?usp=sharing
Model trained with Pierre's dataset = https://drive.google.com/drive/folders/108eX1kCYe4BmkEmQLJGoPqqBzN9ktPuA?usp=sharing
Model trained with Hugging Face's dataset = https://drive.google.com/drive/folders/11T_9_zEuiDcJsvOapZF9e8BgyjYA1lqA?usp=sharing

I guess there are two ways of solving this problem:

Remove the link from this model to that dataset (SQuAD_v1_pt)
Add Pierre's dataset (which is much better than the Hub one) to the Hub
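
Whichever option is chosen, a quick structural comparison of the two dataset files makes the difference easier to see. The sketch below assumes both datasets are available locally as SQuAD v1.1-style JSON (a top-level "data" list of articles, each with "paragraphs", each with a "context" and "qas"); I have not confirmed that either file uses exactly this layout. The function names and file paths are hypothetical. It counts articles, paragraphs and questions, and also counts answers whose text does not appear in the context at the stated "answer_start" offset, which is one thing worth checking in a translated dataset.

```python
import json


def load_squad_file(path: str) -> dict:
    """Read a SQuAD-style JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_squad(dataset: dict) -> dict:
    """Count articles, paragraphs, questions and misaligned answer spans."""
    stats = {"articles": 0, "paragraphs": 0, "questions": 0, "misaligned_answers": 0}
    for article in dataset["data"]:
        stats["articles"] += 1
        for paragraph in article["paragraphs"]:
            stats["paragraphs"] += 1
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                stats["questions"] += 1
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    # The answer text should be found verbatim at its offset.
                    if context[start:end] != answer["text"]:
                        stats["misaligned_answers"] += 1
    return stats


if __name__ == "__main__":
    # Tiny in-memory example; the second answer is deliberately misaligned.
    toy = {
        "data": [{
            "title": "Portugal",
            "paragraphs": [{
                "context": "Lisboa é a capital de Portugal.",
                "qas": [{
                    "id": "1",
                    "question": "Qual é a capital de Portugal?",
                    "answers": [
                        {"text": "Lisboa", "answer_start": 0},
                        {"text": "Porto", "answer_start": 0},
                    ],
                }],
            }],
        }]
    }
    print(summarize_squad(toy))
```

To compare the real files, something like `summarize_squad(load_squad_file("pierre_squad_pt.json"))` and the same call on the Hub file's path (both paths are placeholders) would give two dictionaries that can be compared side by side.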

Model points to wrong dataset