arxiv:2110.03546

mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

Published on Oct 7, 2021

Abstract

The translation of natural language questions into SQL queries has attracted growing attention, particularly in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we therefore investigated translation to SQL when the input questions are given in Portuguese. To do so, we adapted state-of-the-art tools and resources. We modified the RAT-SQL+GAP system to rely on a multilingual BART model (we also report tests with other language models), and we produced a Portuguese translation of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train on the original and translated training datasets together, even if only a single target language is desired. The multilingual BART model fine-tuned on this double-size training dataset (English and Portuguese) achieved 83% of the baseline when making inferences on the Portuguese test dataset. This investigation can help other researchers produce Machine Learning results in languages other than English. Our multilingual-ready version of RAT-SQL+GAP and the data are open-sourced as mRAT-SQL+GAP at: https://github.com/C4AI/gap-text2sql
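The abstract's main training finding is that combining the original English Spider training set with its Portuguese translation helps even when only Portuguese is targeted. Below is a minimal Python sketch of assembling such a double-size training set, assuming Spider-style records with `db_id`, `question` and `query` fields. The `merge_bilingual_train` helper, the `lang` tag and the Portuguese example question are hypothetical illustrations, not code or data taken from the mRAT-SQL+GAP repository.

```python
import random
from typing import Dict, List

# A Spider-style training record; only a subset of the real fields is assumed here.
Example = Dict[str, str]


def merge_bilingual_train(english: List[Example],
                          portuguese: List[Example],
                          seed: int = 0) -> List[Example]:
    """Concatenate English and Portuguese examples into one shuffled training set.

    The SQL query and database schema are shared across languages; only the
    natural language "question" differs, so the merged set has twice as many
    question/SQL pairs as either input.
    """
    merged: List[Example] = []
    for lang, examples in (("en", english), ("pt", portuguese)):
        for ex in examples:
            item = dict(ex)       # copy so the caller's records are not mutated
            item["lang"] = lang   # hypothetical tag, useful for per-language analysis
            merged.append(item)
    # Seeded shuffle so both languages are interleaved reproducibly.
    random.Random(seed).shuffle(merged)
    return merged


if __name__ == "__main__":
    english = [{"db_id": "concert_singer",
                "question": "How many singers do we have?",
                "query": "SELECT count(*) FROM singer"}]
    # Illustrative translation, not taken from the translated dataset.
    portuguese = [{"db_id": "concert_singer",
                   "question": "Quantos cantores nós temos?",
                   "query": "SELECT count(*) FROM singer"}]
    for row in merge_bilingual_train(english, portuguese):
        print(row["lang"], row["question"], "->", row["query"])
```

Shuffling interleaves the two languages within training batches rather than training on one language after the other; the abstract does not say how the original work orders the combined data.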
