#NLP #sql

SQL Generation from Natural Language: A Seq2Seq Model – Transformers Architecture



Jun 30, 2021

Novelis technical experts have once again achieved a new state of the art. Discover our study, “SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules”, published in the Journal of Computer Science.

Thanks to the Novelis Research Team for their knowledge and expertise.


Using natural language (NL) to interact with relational databases allows users of any background to easily query and analyze large amounts of data. This requires a system that understands user questions and automatically translates them into a structured query language (such as SQL). The best-performing Text-to-SQL systems use supervised learning (usually formulated as a classification problem). They either treat the task as a sketch-based slot-filling problem, or first convert the question into an intermediate logical form (ILF) and then convert that form into the corresponding SQL query. However, unsupervised modeling that translates questions directly into SQL queries has proven to be more difficult. We therefore propose a method that converts NL questions directly into SQL statements.

In this research, we propose a sequence-to-sequence (Seq2Seq) parsing model for the NL-to-SQL task, powered by the Transformers architecture and exploring two language models (LMs): the Text-to-Text Transfer Transformer (T5) and the multilingual pre-trained text-to-text Transformer (mT5). In addition, we use a transformation-based learning algorithm to update aggregation predictions based on association rules. The resulting model achieves a new state of the art on the WikiSQL dataset for weakly supervised SQL generation.
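The two ideas in this paragraph can be illustrated with a minimal Python sketch. It is not the implementation from the study. It assumes a hypothetical input serialization (the question and the table's column names joined into one string, the kind of source text a text-to-text model such as T5 could consume) and a small hand-written rule table standing in for the learned association rules that revise the aggregation operator of a generated query. The names build_input, AGG_RULES and revise_aggregation, the prompt prefix, and the rules themselves are all illustrative assumptions.

```python
import re

def build_input(question, columns):
    """Flatten a question and a table header into one source string.

    Assumption: this prompt format is invented for illustration; the
    serialization used in the study is not described in this post.
    """
    return "translate to SQL: " + question.strip() + " | columns: " + ", ".join(columns)

# Hypothetical rules (trigger phrase -> aggregation operator), checked in order.
# In the study the rules are derived from data; these are hand-written.
AGG_RULES = [
    ("how many", "COUNT"),
    ("average", "AVG"),
    ("total", "SUM"),
]

# Matches "SELECT AGG(col) FROM ..." or "SELECT col FROM ...".
SELECT_RE = re.compile(
    r"^SELECT\s+(?:(?P<agg>MAX|MIN|COUNT|SUM|AVG)\((?P<inner>.+?)\)|(?P<col>.+?))"
    r"\s+FROM\s+(?P<rest>.+)$",
    re.IGNORECASE,
)

def revise_aggregation(question, sql):
    """Rewrite the aggregation of a predicted query if a rule fires."""
    m = SELECT_RE.match(sql.strip())
    if m is None:
        return sql  # leave anything we cannot parse untouched
    column = m.group("inner") or m.group("col")
    q = question.lower()
    for phrase, agg in AGG_RULES:
        if phrase in q:
            return f"SELECT {agg}({column}) FROM {m.group('rest')}"
    return sql  # no rule fired: keep the model's prediction

question = "How many players are on the Lions team?"
predicted = "SELECT player FROM table WHERE team = 'Lions'"
print(build_input(question, ["player", "team"]))
print(revise_aggregation(question, predicted))
```

In this toy run the predicted query has no aggregation, the "how many" rule fires, and the selected column is wrapped in COUNT. Keeping the correction as a separate post-processing step means the Seq2Seq model itself does not need to change.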

About the study

“In this study, we treat the Text-to-SQL task with WikiSQL (Zhong et al., 2017). This DataSet is the first large-scale dataset for Text-to-SQL, with about 80K human-annotated pairs of Natural Language question and SQL query. WikiSQL is very challenging because tables and questions are very diverse. This DataSet contains about 24K different tables.

There are two leaderboards for the WikiSQL challenge: Weakly supervised (without using logical form during training) and supervised (with logical form during training). On the supervised challenge, there are two results: Those with Execution Guided (EG) inference and those without EG inference.”
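As a rough illustration of what execution-guided inference means, the Python sketch below uses the database itself to filter candidate queries. It keeps the first candidate that executes without error and returns at least one row. This is a simplified reading of the general idea, not the procedure used by any particular leaderboard entry. The function name first_executable, the players table, and the candidate queries are assumptions made up for the example.

```python
import sqlite3

def first_executable(candidates, connection):
    """Return the first candidate that runs and yields at least one row, else None."""
    for sql in candidates:
        try:
            rows = connection.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # e.g. syntax error or unknown column: skip this candidate
        if rows:
            return sql
    return None

# Hypothetical toy table standing in for one WikiSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (player TEXT, team TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Lions", 24), ("Bob", "Lions", 31), ("Cy", "Bears", 28)],
)

# Candidates in the order a model might rank them (invented for illustration).
candidates = [
    "SELECT player FROM players WHERE squad = 'Lions'",  # unknown column: error
    "SELECT player FROM players WHERE team = 'Tigers'",  # runs, but empty result
    "SELECT player FROM players WHERE team = 'Lions'",   # runs and returns rows
]
best = first_executable(candidates, conn)
print(best)
```

Here the first two candidates are rejected and the third is kept. The non-empty-result test is a heuristic: aggregation queries such as COUNT always return a row, so a fuller version would need a different check for them.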

Read the full article

Journal of Computer Science – Volume 17 No. 5, 2021, 480-489 (10 pages)

Journal of Computer Science aims to publish research articles on the theoretical basis of information and computing, and practical technologies for implementation and application in computer systems.
