#NLP #sql

SQL Generation from Natural Language: A Seq2Seq Model – Transformers Architecture

30/06/2021

Novelis technical experts have once again set a new state of the art, this time in Text-to-SQL generation. Discover our study, "SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules", published in the Journal of Computer Science.

Thanks to the Novelis Research Team for their knowledge and expertise.

Abstract

Using natural language (NL) to interact with relational databases allows users of any background to easily query and analyze large amounts of data. This requires a system that understands user questions and automatically translates them into a structured query language such as SQL. The best-performing Text-to-SQL systems use supervised learning (usually formulated as a classification problem): they either treat the task as a sketch-based slot-filling problem, or first convert the question into an intermediate logical form (ILF) and then convert that form into the corresponding SQL query. However, unsupervised modeling that translates questions directly into SQL queries has proven more difficult. In this sense, we propose a method to directly convert NL questions into SQL statements.

In this research, we propose a sequence-to-sequence (Seq2Seq) parsing model for the NL-to-SQL task, powered by the Transformer architecture and exploring two language models (LMs): the Text-to-Text Transfer Transformer (T5) and the multilingual pre-trained Text-to-Text Transformer (mT5). In addition, we adopt a transformation-based learning algorithm to update the aggregation predictions based on association rules. The resulting model achieves a new state of the art on the WikiSQL dataset for weakly supervised SQL generation.
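
To make the idea of correcting aggregation predictions with association rules more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure from the paper: the function names `mine_rules` and `update_aggregation`, the single-token rule shape, the thresholds, and the toy questions are all hypothetical. It mines rules of the form "token appears in the question, so use this aggregation" by support and confidence, then uses them to override a model's predicted aggregation.

```python
from collections import Counter


def mine_rules(examples, min_support=2, min_confidence=0.8):
    """Mine single-token rules of the form: token in question -> aggregation.

    examples is a list of (question, gold_aggregation) pairs, where the empty
    string means "no aggregation". Returns {token: (aggregation, confidence)}.
    """
    token_count = Counter()  # number of questions containing the token
    pair_count = Counter()   # same, restricted to a given aggregation
    for question, agg in examples:
        for token in set(question.lower().split()):
            token_count[token] += 1
            pair_count[(token, agg)] += 1

    rules = {}
    for (token, agg), support in pair_count.items():
        confidence = support / token_count[token]
        if support >= min_support and confidence >= min_confidence:
            best = rules.get(token)
            if best is None or confidence > best[1]:
                rules[token] = (agg, confidence)
    return rules


def update_aggregation(question, predicted_agg, rules):
    """Override the predicted aggregation when a mined rule fires."""
    fired = [rules[t] for t in set(question.lower().split()) if t in rules]
    if not fired:
        return predicted_agg
    # Keep the most confident rule; ties are broken by aggregation name.
    return max(fired, key=lambda rule: (rule[1], rule[0]))[0]


# Toy training pairs (hypothetical, for illustration only).
examples = [
    ("how many players are from spain", "COUNT"),
    ("how many games did they win", "COUNT"),
    ("what is the average score", "AVG"),
    ("what is the average age of players", "AVG"),
    ("what is the name of the captain", ""),
    ("who is the coach", ""),
]
rules = mine_rules(examples)
print(sorted(rules))
print(update_aggregation("how many teams played", "", rules))
```

With these toy pairs, only "how", "many" and "average" pass both thresholds, so a question containing "how many" has a missing aggregation replaced by COUNT, while questions that trigger no rule keep the model's original prediction.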

About the study

“In this study, we treat the Text-to-SQL task with WikiSQL (Zhong et al., 2017). This DataSet is the first large-scale dataset for Text-to-SQL, with about 80K human-annotated pairs of Natural Language question and SQL query. WikiSQL is very challenging because tables and questions are very diverse. This DataSet contains about 24K different tables.

There are two leaderboards for the WikiSQL challenge: Weakly supervised (without using logical form during training) and supervised (with logical form during training). On the supervised challenge, there are two results: Those with Execution Guided (EG) inference and those without EG inference.”
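
The "logical form" mentioned in the quote can be illustrated with a short Python sketch. It assumes the annotation layout used in the public WikiSQL release, where each query stores a selected column index (`sel`), an aggregation index (`agg`) and a list of `[column, operator, value]` conditions (`conds`). The function names, the table name `data` and the sample headers are hypothetical and only serve the example.

```python
# Operator vocabularies as listed in the public WikiSQL release (assumed here).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def format_value(value):
    """Render numbers as-is and quote everything else as a SQL string."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def logical_form_to_sql(query, headers, table_name="data"):
    """Turn a WikiSQL-style logical form into a SQL string."""
    column = f'"{headers[query["sel"]]}"'
    agg = AGG_OPS[query["agg"]]
    select = f"{agg}({column})" if agg else column
    sql = f"SELECT {select} FROM {table_name}"
    conditions = [
        f'"{headers[col]}" {COND_OPS[op]} {format_value(value)}'
        for col, op, value in query["conds"]
    ]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical table headers and annotation, for illustration only.
headers = ["Player", "Country", "Goals"]
query = {"sel": 0, "agg": 3, "conds": [[1, 0, "Spain"], [2, 1, 10]]}
print(logical_form_to_sql(query, headers))
# SELECT COUNT("Player") FROM data WHERE "Country" = 'Spain' AND "Goals" > 10
```

A supervised system can be trained on the `sel`, `agg` and `conds` fields directly, whereas a weakly supervised one, as described in the quote, does not use this structured annotation during training.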

Read the full article

Journal of Computer Science – Volume 17 No. 5, 2021, 480-489 (10 pages)

Journal of Computer Science aims to publish research articles on the theoretical basis of information and computing, and practical technologies for implementation and application in computer systems.
