metadata

language: es
tags:
  - QA
  - Q&A
datasets:
  - BSC-TeMU/SQAC

Spanish Longformer fine-tuned on SQAC for Spanish QA 📖❓

longformer-base-4096-spanish fine-tuned on SQAC for the Q&A downstream task.

Details of the model 🧠

longformer-base-4096-spanish is a BERT-like model initialized from a RoBERTa checkpoint (BERTIN, in this case) and pre-trained with masked language modeling (MLM) on long documents (from BETO's all_wikis corpus). It supports sequences of up to 4,096 tokens!
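Documents longer than 4,096 tokens still have to be split before they reach the model. Below is a minimal, hypothetical helper that cuts a list of token ids into overlapping windows. The name `sliding_windows` and the default `stride=512` are illustrative assumptions, not settings from this model card. In a real QA setup the question and special tokens also count toward the limit, so the usable room for the context is somewhat smaller. Hugging Face fast tokenizers offer comparable behaviour through their `return_overflowing_tokens` and `stride` arguments.

```python
from typing import List, Sequence

MAX_LEN = 4096  # maximum sequence length supported by longformer-base-4096-spanish


def sliding_windows(token_ids: Sequence[int], max_len: int = MAX_LEN,
                    stride: int = 512) -> List[List[int]]:
    """Split token ids into chunks of at most `max_len` tokens.

    Consecutive chunks overlap by `stride` tokens. Hypothetical helper:
    the default stride is an illustrative assumption.
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must be in [0, max_len)")
    if len(token_ids) <= max_len:
        return [list(token_ids)]
    step = max_len - stride  # how far each new window moves forward
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):  # this window reached the end
            break
        start += step
    return windows


# Usage: a 10,000-token document becomes three windows.
chunks = sliding_windows(list(range(10_000)))
print([len(c) for c in chunks])  # → [4096, 4096, 2832]
```

Overlapping the windows means that, with these defaults, an answer no longer than the overlap that straddles a window boundary still appears whole in the following window.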

Details of the dataset 📚

This dataset contains 6,247 contexts and 18,817 questions with their answers, with 1 to 5 questions for each fragment. The sources of the contexts are:

- Encyclopedic articles from Wikipedia in Spanish, used under a CC BY-SA licence.
- News articles from Wikinews in Spanish, used under a CC BY licence.
- Text from the Spanish corpus AnCora, a mix of different newswire and literature sources, used under a CC BY licence.

This dataset can be used to build extractive QA systems.
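In extractive QA, a model of this kind typically scores every token as a possible answer start and answer end, and the answer is the highest-scoring valid span taken verbatim from the context. The sketch below shows that common decoding step in plain Python. The logits are made-up toy values, and `best_span` and its `max_answer_len=30` cap are illustrative assumptions, not part of this model card or of any library API.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices (end inclusive) maximizing
    start_logits[start] + end_logits[end], subject to
    start <= end < start + max_answer_len."""
    if len(start_logits) != len(end_logits):
        raise ValueError("logit lists must have the same length")
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        # Only consider ends at or after the start, within the length cap.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Toy example with hand-written scores (not real model output).
tokens = ["La", "capital", "de", "España", "es", "Madrid", "."]
start = [0.1, 0.2, 0.0, 0.3, 0.1, 2.5, 0.0]
end = [0.0, 0.1, 0.2, 0.4, 0.1, 2.0, 0.3]
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # → Madrid
```

Restricting `end` to positions at or after `start` rules out invalid spans even when an earlier token has a high end score.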