SQL Generation from Natural Language: A Seq2Seq Model – Transformers Architecture

Novelis technical experts have once again achieved a new state-of-the-art result. Discover our study SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules, published in the Journal of Computer Science.

Thanks to the Novelis Research Team for their knowledge and expertise.


Using natural language (NL) to interact with relational databases allows users of any background to easily query and analyze large amounts of data. This requires a system that understands users' questions and automatically translates them into a structured query language such as SQL. The best-performing Text-to-SQL systems use supervised learning (usually framed as a classification problem), treating the task either as sketch-based slot filling or by first converting the question into an intermediate logical form (ILF) and then converting that into the corresponding SQL query. However, unsupervised modeling that translates questions directly into SQL queries has proven more difficult. In this sense, we propose a method that converts NL questions directly into SQL statements.

In this research, we propose a sequence-to-sequence (Seq2Seq) parsing model for the NL-to-SQL task, supported by a Transformer architecture that explores two language models (LMs): the Text-to-Text Transfer Transformer (T5) and the multilingual pre-trained text-to-text transformer (mT5). In addition, we use a transformation-based learning algorithm to update aggregation predictions based on association rules. The resulting model achieves a new state of the art on the WikiSQL dataset for weakly supervised SQL generation.
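The association-rule correction step can be pictured with a minimal sketch. The rules below are invented for illustration (the paper mines its rules from training data, and the function name is hypothetical): when a question matches a pattern strongly associated with an aggregation operator, the rule overrides the model's predicted aggregation.

```python
import re

# Hypothetical association rules: question pattern -> SQL aggregation operator.
# The actual rules in the study are learned from the data; these hand-written
# patterns only illustrate the mechanism.
AGG_RULES = [
    (re.compile(r"\bhow many\b", re.I), "COUNT"),
    (re.compile(r"\b(total|sum of)\b", re.I), "SUM"),
    (re.compile(r"\b(average|mean)\b", re.I), "AVG"),
    (re.compile(r"\b(highest|largest|maximum)\b", re.I), "MAX"),
    (re.compile(r"\b(lowest|smallest|minimum)\b", re.I), "MIN"),
]

def correct_aggregation(question: str, predicted_agg: str) -> str:
    """Override the model's aggregation prediction when a rule fires."""
    for pattern, agg in AGG_RULES:
        if pattern.search(question):
            return agg
    return predicted_agg  # no rule fired: keep the model's prediction

print(correct_aggregation("How many players are from Spain?", ""))  # COUNT
print(correct_aggregation("Which club does Messi play for?", "MAX"))  # MAX (unchanged)
```

In the transformation-based setup, such rules act as post-hoc corrections on top of the Seq2Seq model's output rather than replacing it.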

About the study

“In this study, we treat the Text-to-SQL task with WikiSQL (Zhong et al., 2017). This dataset is the first large-scale dataset for Text-to-SQL, with about 80K human-annotated pairs of natural language question and SQL query. WikiSQL is very challenging because its tables and questions are very diverse. The dataset contains about 24K different tables.

There are two leaderboards for the WikiSQL challenge: weakly supervised (without using the logical form during training) and supervised (using the logical form during training). On the supervised challenge, there are two kinds of results: those with Execution-Guided (EG) inference and those without.”

Read the full article

Journal of Computer Science – Volume 17 No. 5, 2021, 480-489 (10 pages)

Journal of Computer Science aims to publish research articles on the theoretical basis of information and computing, and practical technologies for implementation and application in computer systems.

Novelis ranks 2nd in international NLP Research Challenge

One more step towards the democratization of Artificial Intelligence and NLP (Natural Language Processing): the SPIDER Challenge

Paris, March 25, 2021 – Novelis, an innovative consulting and technology company, is currently taking part in two international research challenges aiming to automatically generate SQL queries thanks to natural language. Following the recent publication of its work, Novelis is positioned alongside Artificial Intelligence leaders, such as Microsoft, Salesforce, Google and others.

The worldwide volume of data processed daily has never been so big. These data are mostly gathered in so-called relational databases, which require mastering SQL (Structured Query Language) to store or manipulate them. Novelis’ project aims to democratize access to these data by automatically generating technically complex queries from human language, a field known as Natural Language Processing (NLP).

Novelis in major international challenges SPIDER and WikiSQL

Led by Yale University, the Spider Challenge brings together a large-scale, complex, cross-domain semantic dataset and SQL queries. The goal is to transform natural English text into executable SQL queries, a problem also called the “Text-to-SQL task”. The challenge consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 domains. Following the publication of its work and at the time of publication of this article, Novelis is ranked 2nd in the world, alongside Salesforce, only 2.9 points behind the leader (Tel-Aviv University & Allen Institute for AI). It is important to note that this type of challenge is evolving and results may change. Find out more and discover the results: Spider: Yale Semantic Parsing and Text-to-SQL Challenge (yale-lily.github.io)

The objective of the WikiSQL Challenge is the same as for Spider, but with different constraints and contexts. Here, the participants deal with only one table at a time, using models with unsupervised learning (where the machine works on its own) or with supervised learning (where the machine relies on hints from which it generates predictions). Leading companies in Artificial Intelligence and NLP are taking part in this challenge along with renowned universities: Microsoft, Google, Alibaba, Salesforce, the Universities of California, Berkeley, Fudan… For this event, Novelis has developed a hybrid learning model that ranks 7th out of 31 scientific projects. Follow the link for more information and complete results: GitHub – salesforce/WikiSQL: A large annotated semantic parsing corpus for developing natural language interfaces.

Innovation and R&D: A strategic priority for Novelis’ development

Since its beginning, Novelis has been investing massively (30% of its turnover) in Research and Development. According to Mehdi Nafe, CEO of Novelis: “Beyond the impact on fundamental research, our objective is to change the software design model to achieve operational excellence, change the relationship we have with technologies, and have a sustainable impact on innovation processes within society. In the last years, the major progress of data science, AI and, more recently, NLP, represents a huge potential in terms of business process optimization and use. The creation of an R&D Lab is one of Novelis’ founding acts. For a technology company, engaging in research is a key element. It is essential for better serving our customers.”

NL2Code: A Corpus and Semantic Parser for Natural Language to Code

Discover our conference paper NL2Code: A Corpus and Semantic Parser for Natural Language to Code, presented at the International Conference on Smart Information & Communication Technologies and part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684).

Thanks to the Novelis Research Team for their knowledge and expertise.


In this work, we propose a new semantic parsing method and dataset that allow automatic generation of source code from specifications and descriptions written in natural language (NL2Code). Our long-term goal is to allow any user to create applications based on specifications that describe the requirements of the complete system. It involves researching, designing, and implementing intelligent systems that automatically generate computer projects answering user needs (skeleton, configuration, initialization scripts, etc.) expressed in natural language. We take a first step in this area by providing a new dataset specific to our company, Novelis, and by implementing a method that enables machines to understand user needs expressed in natural language in specific domains.

About the study

“The dream of using French or any other natural language to generate code in a specific programming language has existed for almost as long as the task of programming itself. Although significantly less precise than a formal language, natural language as a programming medium would be universally accessible and would support the automation of an application. However, the diversity and ambiguity of texts, the compositional nature of code, and the layered abstractions in software make it difficult to generate code from functional specifications (natural language). The use of artificial intelligence offers interesting potential for supporting new tools in almost all areas of software engineering and program analysis. This work presents a new dataset and semantic parsing method for a novel and ambitious domain: program synthesis.

Our long-term goal is to enable any user to generate complete frontend/backend web applications based on Java/JEE technology that respect an n-tier (multilayer) architecture. To that end, we take a first step in this direction by providing a dataset (corpus), proposed by the company Novelis, built from question/answer pairs about the Java language drawn from various topics on the “Stack Overflow” website, together with a new semantic parsing method.”
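A corpus entry of this kind can be pictured as a natural-language intent paired with a Java snippet. The field names and values below are invented for illustration; they are not taken from the Novelis corpus itself:

```python
import json

# Hypothetical shape of one (intent, snippet) pair mined from
# Stack Overflow-style Java Q&A; the schema is illustrative only.
example_entry = {
    "intent": "read all lines of a text file into a list of strings",
    "snippet": 'List<String> lines = Files.readAllLines(Paths.get("data.txt"));',
    "source": "stackoverflow",
    "language": "java",
}

print(json.dumps(example_entry, indent=2))
```

A semantic parser trained on such pairs learns to map the free-form intent to the structured snippet, which is the Text-to-Code analogue of the Text-to-SQL task described above.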

Read the full article

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684) 

SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.

SQL Generation from Natural Language Using Supervised Learning and Recurrent Neural Networks

Discover our conference paper SQL Generation from Natural Language Using Supervised Learning and Recurrent Neural Networks, presented at the International Conference on Artificial Intelligence & Industrial Applications and part of the Lecture Notes in Networks and Systems book series (LNNS, volume 144).

Thanks to the Novelis Research Team for their knowledge and expertise.


Databases store today’s vast amounts of data and information. To access these data, users need to master SQL or an equivalent query language. A system that can convert natural language into equivalent SQL queries would therefore make the data far more accessible. In this sense, building a natural language interface to relational databases is an important and challenging problem in natural language processing (NLP) and an extensive research area, one that has recently regained momentum thanks to the introduction of large-scale datasets. In this article, we propose a method based on word embeddings and recurrent neural networks (RNNs), specifically on long short-term memory (LSTM) and gated recurrent unit (GRU) cells. We also describe the dataset used to train and test our model, based on WikiSQL, and finally report our progress in accuracy.

About the study

“A vast amount of today’s information is stored in relational databases, which provide the foundation of applications such as medical records [1], financial markets [2], and customer relations management [3]. However, accessing relational databases requires an understanding of query languages such as SQL, which, while powerful, is difficult to master for non-technical users. Even for an expert, writing SQL queries can be challenging, as it requires knowing the exact schema of the database and the roles of the various entities in the query. Hence, research has recently emerged on systems that map natural language to SQL queries, and a long-standing goal has been to allow users to interact with the database through natural language [4,5]. We refer to this task as Text-to-SQL.

In this work, we present our approach based on classification [6] and recurrent neural networks [7], specifically on LSTM [8] and GRU [9] cells. The idea is inspired by the SQLNet approach [10]; in particular, we employ a sketch to generate a SQL query from natural language. The sketch aligns naturally with the syntactical structure of a SQL query; neural networks are then used to predict the content of each slot in the sketch. Our approach can be viewed as a neural network alternative to traditional sketch-based program synthesis approaches [11,12].”
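The sketch-based generation described in the excerpt can be illustrated in a few lines. In a SQLNet-style sketch for WikiSQL, each $-slot is filled by a dedicated neural predictor; here the slot values are hand-written stand-ins for those predictions, and the assembly function is a hypothetical simplification, shown only to make the mechanism concrete:

```python
# SQLNet-style sketch for WikiSQL queries:
#   SELECT $AGG $COLUMN WHERE $COLUMN $OP $VALUE (AND ...)
# In the real system each slot is predicted by a neural network;
# here they are hard-coded to show how the final query is assembled.

def assemble_query(agg, select_col, conditions):
    """Fill the sketch slots and return the SQL string."""
    select = f"SELECT {agg}({select_col})" if agg else f"SELECT {select_col}"
    if not conditions:
        return select
    where = " AND ".join(f"{col} {op} {val!r}" for col, op, val in conditions)
    return f"{select} WHERE {where}"

# Example slot predictions for "How many engine types did Val Musetti use?"
query = assemble_query("COUNT", "Engine", [("Driver", "=", "Val Musetti")])
print(query)  # SELECT COUNT(Engine) WHERE Driver = 'Val Musetti'
```

Because the sketch fixes the query's syntactic skeleton, the networks never have to generate SQL keywords themselves, only the content of each slot.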

Read the full article

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 144) 

SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.
