Prompting-based Synthetic Data Generation for Few-Shot Question Answering
Abstract
Although language models (LMs) have boosted the performance of Question Answering, they still need plenty of data. Data annotation, in contrast, is a time-consuming process. This especially applies to Question Answering, where possibly large documents have to be parsed and annotated with questions and their corresponding answers. Furthermore, Question Answering models often only work well for the domain they were trained on. Since annotation is costly, we argue that domain-agnostic knowledge from LMs, such as linguistic understanding, is sufficient to create a well-curated dataset. With this motivation, we show that using large language models can improve Question Answering performance on various datasets in the few-shot setting compared to state-of-the-art approaches. For this, we perform data generation leveraging the Prompting framework, suggesting that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme. As a result, we consistently outperform previous approaches on few-shot Question Answering.
Community
@librarian-bot recommend
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- GeMQuAD : Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning (2024)
- Enhancing Pre-Trained Generative Language Models with Question Attended Span Extraction on Machine Reading Comprehension (2024)
- Asking and Answering Questions to Extract Event-Argument Structures (2024)
- Better Synthetic Data by Retrieving and Transforming Existing Datasets (2024)
- Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper