---
language: es
tags:
- QA
- Q&A
datasets:
- BSC-TeMU/SQAC
---
|
|
|
# Spanish Longformer fine-tuned on **SQAC** for Spanish **QA** 📖❓
|
[longformer-base-4096-spanish](https://huggingface.co/mrm8488/longformer-base-4096-spanish) fine-tuned on [SQAC](https://huggingface.co/datasets/BSC-TeMU/SQAC) for the **Q&A** downstream task.
|
|
|
## Details of the model 🧠
|
[longformer-base-4096-spanish](https://huggingface.co/mrm8488/longformer-base-4096-spanish) is a BERT-like model initialized from a RoBERTa checkpoint (**BERTIN** in this case) and pre-trained with an *MLM* objective on long documents from BETO's `all_wikis` corpus. It supports sequences of up to **4,096** tokens!
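Below is a minimal usage sketch with the 🤗 `transformers` question-answering pipeline. The repo id of this fine-tuned checkpoint is not stated in this card, so `MODEL_ID` is a placeholder assumption; replace it with the actual model id.

```python
from transformers import pipeline

# Placeholder repo id (assumption): replace with the actual id of this fine-tuned checkpoint.
MODEL_ID = "mrm8488/longformer-base-4096-spanish-finetuned-sqac"

# Extractive QA pipeline; long contexts (up to 4,096 tokens) are supported by the model.
qa = pipeline("question-answering", model=MODEL_ID, tokenizer=MODEL_ID)

context = (
    "La Biblioteca Nacional de España se encuentra en Madrid y fue fundada "
    "por Felipe V en 1712."
)
question = "¿Dónde se encuentra la Biblioteca Nacional de España?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```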
|
|
|
## Details of the dataset 📚
|
|
|
This dataset contains 6,247 contexts and 18,817 questions with their answers, between 1 and 5 per fragment.
|
The sources of the contexts are:
|
* Encyclopedic articles from [Wikipedia in Spanish](https://es.wikipedia.org/), used under the [CC-by-sa licence](https://creativecommons.org/licenses/by-sa/3.0/legalcode).
|
* News from [Wikinews in Spanish](https://es.wikinews.org/), used under the [CC-by licence](https://creativecommons.org/licenses/by/2.5/).
|
* Text from the Spanish corpus [AnCora](http://clic.ub.edu/corpus/en), a mix of different newswire and literature sources, used under the [CC-by licence](https://creativecommons.org/licenses/by/4.0/legalcode).
|
This dataset can be used to build extractive QA systems.
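
As a rough sketch, the dataset can be loaded with 🤗 `datasets` as shown below; the SQuAD-style field names (`question`, `context`, `answers`) are an assumption, so check the dataset card to confirm them.

```python
from datasets import load_dataset

# Dataset id taken from this card.
sqac = load_dataset("BSC-TeMU/SQAC")

# Inspect one example; the SQuAD-style field names below are an assumption.
sample = sqac["train"][0]
print(sample["question"])
print(sample["context"][:200])
print(sample["answers"])
```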