metadata
language: es
tags:
- QA
- Q&A
datasets:
- BSC-TeMU/SQAC
Spanish Longformer fine-tuned on SQAC for Spanish QA 📖❓
longformer-base-4096-spanish fine-tuned on SQAC for the question answering (Q&A) downstream task.
Details of the model 🧠
longformer-base-4096-spanish is a BERT-like model initialized from a RoBERTa checkpoint (BERTIN in this case) and pre-trained for MLM on long documents (from BETO's all_wikis). It supports sequences of up to 4,096 tokens!
Details of the dataset 📚
This dataset contains 6,247 contexts and 18,817 questions with their answers, 1 to 5 for each fragment. The sources of the contexts are:
- Encyclopedic articles from Wikipedia in Spanish, used under CC-by-sa licence.
- News from Wikinews in Spanish, used under CC-by licence.
- Text from the Spanish corpus AnCora, which is a mix of different newswire and literature sources, used under CC-by licence.

This dataset can be used to build extractive QA systems.
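
In extractive QA, each answer is a span copied verbatim from its context, usually stored together with its character offset. The sketch below illustrates that idea with a made-up, SQuAD-style record. The field names (`context`, `question`, `answers`, `text`, `answer_start`) and the example text are assumptions for illustration, not an actual SQAC entry, and the helper `answer_spans` is hypothetical.

```python
# Hypothetical SQuAD-style record. The text and the field names are
# illustrative assumptions, not an actual SQAC entry.
record = {
    "context": "Don Quijote de la Mancha es una novela de Miguel de Cervantes.",
    "question": "¿Quién escribió Don Quijote de la Mancha?",
    "answers": {"text": ["Miguel de Cervantes"], "answer_start": [42]},
}


def answer_spans(rec):
    """Return (start, end) character offsets for each answer.

    Raises ValueError if an answer text does not appear in the context
    at its stated offset, which is the defining property of extractive QA.
    """
    spans = []
    answers = rec["answers"]
    for text, start in zip(answers["text"], answers["answer_start"]):
        end = start + len(text)
        if rec["context"][start:end] != text:
            raise ValueError(f"answer {text!r} not found at offset {start}")
        spans.append((start, end))
    return spans


spans = answer_spans(record)
print(spans)
```

A check like this can be useful before fine-tuning, since a record whose offset does not match its answer text cannot be mapped to start and end token positions.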