Model points to wrong dataset
Hello there. Lately I've been using the Portuguese SQuAD dataset that this particular model uses (link: https://drive.google.com/file/d/1Q0IaIlv2h2BC468MwUFmUST0EyN7gNkn/view?usp=sharing), and I noticed that the Hub points to a different Portuguese SQuAD dataset (the one in the image below):
Either the Hub link or the model description is wrong. This can lead people to use the wrong data by mistake and, in the worst case, to train a much worse model (compared to Pierre's).
I've trained two models, one with Pierre's data and another with the Hub data, and the results are pretty far apart:
Pierre's SQuAD-pt dataset -> F1 = 82%, EM = 70%
Hugging Face's SQuAD-pt dataset -> F1 = 62%, EM = 51%
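For anyone who wants to sanity-check numbers like these, here is a minimal sketch of how SQuAD-style EM and F1 are usually computed. It is not the code from my notebooks: the function names and the toy predictions are made up for illustration, and the normalization is simplified (lowercasing, ASCII punctuation removal, whitespace collapsing; the English article stripping done by the official SQuAD v1.1 script is left out on purpose).

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and one reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and F1 as percentages, taking the best score over
    all reference answers available for each question."""
    em_total = 0.0
    f1_total = 0.0
    for pred, refs in zip(predictions, references):
        em_total += max(exact_match(pred, r) for r in refs)
        f1_total += max(f1_score(pred, r) for r in refs)
    n = len(predictions)
    return {"exact_match": 100 * em_total / n, "f1": 100 * f1_total / n}


# Toy data, invented for illustration only (not taken from either dataset).
toy_predictions = ["em 1500", "Pedro Álvares Cabral"]
toy_references = [["1500"], ["Pedro Álvares Cabral", "Cabral"]]
scores = evaluate(toy_predictions, toy_references)
print(scores)
```

Running both models through the same scoring function on the same validation split makes the comparison depend only on the training data.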
You can check the Colab notebooks and the models at the links below:
Training link = https://colab.research.google.com/drive/1FaUrktnvgKBQa3sI4Tfuyve6iJQceUuE
Validation link = https://colab.research.google.com/drive/1MeFWvLWxGNusOZvCwY9P3GQSbYdBIC1X?usp=sharing
Model trained with Pierre's dataset = https://drive.google.com/drive/folders/108eX1kCYe4BmkEmQLJGoPqqBzN9ktPuA?usp=sharing
Model trained with Hugging Face's dataset = https://drive.google.com/drive/folders/11T_9_zEuiDcJsvOapZF9e8BgyjYA1lqA?usp=sharing
I guess there are two ways of solving this problem:
- Remove the link from this model to that dataset (SQuAD_v1_pt)
- Add Pierre's dataset (which is much better than the Hub one) to the Hub
I just created a repo to upload the Brazilian Portuguese version: https://huggingface.co/datasets/ArthurBaia/SQuAD_v1.1_pt-br