mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

Marcelo Archanjo José, Fabio Gagliardi Cozman

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

The translation of natural language questions to SQL queries has attracted growing attention, in particular in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we thus investigated translation to SQL when input questions are given in the Portuguese language. To do so, we properly adapted state-of-the-art tools and resources. We changed the RAT-SQL+GAP system by relying on a multilingual BART model (we report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with original and translated training datasets together, even if a single target language is desired. This multilingual BART model fine-tuned with a double-size training dataset (English and Portuguese) achieved 83% of the baseline, making inferences for the Portuguese test dataset. This investigation can help other researchers to produce results in Machine Learning in a language different from English. Our multilingual ready version of RAT-SQL+GAP and the data are available, open-sourced as mRAT-SQL+GAP at: https://github.com/C4AI/gap-text2sql.
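
The abstract's central finding is that training on the original and translated data together works better, even when only one target language is wanted. In practice this amounts to concatenating the two training sets before fine-tuning. A minimal sketch of that step follows, assuming Spider-style records with `question`, `query`, and `db_id` fields. The helper name `build_bilingual_training_set`, the `lang` tag, and the toy records are hypothetical and are not taken from the mRAT-SQL+GAP repository.

```python
from typing import Dict, List

Example = Dict[str, str]


def build_bilingual_training_set(
    english: List[Example], portuguese: List[Example]
) -> List[Example]:
    """Concatenate English and Portuguese examples into one training set.

    Each record is copied and tagged with a hypothetical `lang` field so the
    origin of every example stays traceable after merging.
    """
    merged: List[Example] = []
    for lang, examples in (("en", english), ("pt", portuguese)):
        for example in examples:
            merged.append({**example, "lang": lang})
    return merged


# Toy records for illustration only (assumed shape, not real dataset entries).
english_train = [
    {"db_id": "toy_db", "question": "How many books are there?",
     "query": "SELECT count(*) FROM book"},
]
portuguese_train = [
    {"db_id": "toy_db", "question": "Quantos livros existem?",
     "query": "SELECT count(*) FROM book"},
]

train = build_bilingual_training_set(english_train, portuguese_train)
print(len(train))  # twice the size of either single-language set
```

The sketch assumes that translation changes only the question and leaves the SQL target as it is, so each database query appears once per language in the merged set.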

Original language: English
Title of host publication: Intelligent Systems - 10th Brazilian Conference, BRACIS 2021, Proceedings, Part 2
Editors: André Britto, Karina Valdivia Delgado
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 511-525
Number of pages: 15
ISBN (print): 9783030916985
DOI
Status: Published - 2021
Externally published
Event: 10th Brazilian Conference on Intelligent Systems, BRACIS 2021 - Virtual, Online
Duration: 29 Nov 2021 to 3 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13074 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 10th Brazilian Conference on Intelligent Systems, BRACIS 2021
City: Virtual, Online
Period: 29/11/21 to 3/12/21
