SQL Generation from Natural Language: A Seq2Seq Model – Transformers Architecture

Novelis technical experts have once again achieved a new state of the art. Discover our study SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules, published in the Journal of Computer Science.

Thanks to the Novelis Research Team for their knowledge and expertise.

Abstract

Using natural language (NL) to interact with relational databases allows users of any background to easily query and analyze large amounts of data. This requires a system that understands user questions and automatically translates them into a structured query language (such as SQL). The best-performing Text-to-SQL systems use supervised learning (usually framed as a classification problem), treating the task either as a sketch-based slot-filling problem or by first converting the question into an intermediate logical form (ILF) and then converting it into the corresponding SQL query. However, unsupervised modeling that directly translates the question into a SQL query has proven to be more difficult. In this sense, we propose a method to directly convert NL questions into SQL statements.

In this research, we propose a sequence-to-sequence (Seq2Seq) parsing model for the NL-to-SQL task, supported by the Transformers architecture and exploring two language models (LMs): the Text-to-Text Transfer Transformer (T5) and the multilingual pre-trained text-to-text transformer (mT5). In addition, we use a transformation-based learning algorithm to update the aggregation predictions based on association rules. The resulting model achieves a new state of the art on the WikiSQL dataset for weakly supervised SQL generation.
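
To make the idea concrete, here is a minimal, hypothetical sketch of how a text-to-text transformer such as T5 can be prompted to translate a question into SQL using the Hugging Face transformers library. The checkpoint name, the prompt serialization of the question and schema, and the decoding settings are illustrative assumptions, not the paper's released configuration; in practice the model would first be fine-tuned on WikiSQL question/SQL pairs.

```python
# Minimal sketch (not the authors' released code): using a text-to-text
# transformer such as T5 to map a natural-language question to SQL.
# The checkpoint, prompt format, and generation settings are assumptions.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-base"  # the paper explores T5 and mT5; this checkpoint is an assumption
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

question = "How many singers are from France?"
schema = "singers: name, country, age"
# Serialize the question and the table schema into a single source sequence.
source = f"translate to SQL: {question} | schema: {schema}"

inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```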

About the study

“In this study, we treat the Text-to-SQL task with WikiSQL (Zhong et al., 2017). This dataset is the first large-scale dataset for Text-to-SQL, with about 80K human-annotated pairs of natural language questions and SQL queries. WikiSQL is very challenging because the tables and questions are very diverse. The dataset contains about 24K different tables.

There are two leaderboards for the WikiSQL challenge: weakly supervised (without using the logical form during training) and supervised (with the logical form during training). On the supervised challenge, there are two sets of results: those with execution-guided (EG) inference and those without EG inference.”
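
For readers who want to inspect the benchmark, the sketch below shows one common way to load WikiSQL through the Hugging Face datasets library. The loader name and field names follow the Hub copy of the dataset and are assumptions (older library versions may require a script-based loader or trust_remote_code); the authoritative distribution remains the original WikiSQL repository.

```python
# Illustrative sketch only: inspecting WikiSQL via the Hugging Face Hub copy.
# Field names below reflect that copy and are assumptions, not the original
# release format (https://github.com/salesforce/WikiSQL).
from datasets import load_dataset

wikisql = load_dataset("wikisql")          # train / validation / test splits
example = wikisql["train"][0]
print(example["question"])                 # natural-language question
print(example["sql"]["human_readable"])    # annotated SQL query (logical form)
print(example["table"]["header"])          # column names of the referenced table
```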

Read the full article

Journal of Computer Science – Volume 17 No. 5, 2021, 480-489 (10 pages)

The Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation, as well as practical techniques for their implementation and application in computer systems.

Artificial Neural Networks for Text-to-SQL Task: State of the Art

Discover our conference paper Artificial Neural Networks for Text-to-SQL Task: State of the Art – International Conference on Smart Information & Communication Technologies, part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684).

Thanks to the Novelis Research Team for their knowledge and expertise.

Abstract

Databases store large amounts of data from all over the world, but to access these data, users must understand query languages such as SQL. To facilitate this task and make it possible to interact with databases everywhere, research has recently emerged on systems that understand natural language questions and automatically convert them into SQL queries. The purpose of this article is to provide a state of the art of the Text-to-SQL task, in which we present the main models and existing solutions in natural language processing. We also specify the experimental settings of each approach, their limitations, and a comparison of the best available methods.

About the study

“The Text-to-SQL task is one of the most important subtasks of semantic parsing in natural language processing (NLP). It maps natural language sentences to corresponding SQL queries. In recent years, some state-of-the-art methods with Seq2Seq encoder-decoder architectures (Sutskever, Vinyals and Le, 2014) [1] have been able to obtain more than 80% exact matching accuracy on some complex Text-to-SQL benchmarks such as ATIS (Price, 1990; Dahl et al., 1994) [2], GeoQuery (Zelle and Mooney, 1996) [3], Restaurants (Tang and Mooney, 2000; Popescu et al., 2003) [4], Scholar (Iyer et al., 2017) [5], Academic (Li and Jagadish, 2014) [6], Yelp (Yaghmazadeh et al., 2017) [7] and WikiSQL (Zhong et al., 2017) [8]. These models seem to have already solved most problems in this area. However, as Finegan-Dollak et al. (2018) [9] show, because of the problematic task definition in the traditional datasets, most of these models just learn to match semantic parsing results rather than truly learn to understand the meaning of the inputs and generalize to new programs and databases, which leads to low precision on more generic datasets such as Spider (Yu, Zhang, Yang, et al., 2018) [10].”
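
As a small illustration of the exact matching accuracy metric mentioned in the excerpt, the sketch below counts a prediction as correct only when it matches the gold query after trivial normalization. Real benchmark evaluators (for example Spider's exact set match) compare queries component by component; this simplified string-level version is an assumption used purely for illustration.

```python
# Simplified exact-match evaluation sketch: a prediction is correct only if
# it equals the gold SQL after whitespace/case normalization. Benchmark
# evaluators are more elaborate; this is an illustrative assumption.
def normalize(sql: str) -> str:
    return " ".join(sql.strip().lower().split())

def exact_match_accuracy(predictions, references):
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)

preds = ["SELECT name FROM singers WHERE country = 'France'"]
golds = ["select name from singers where country = 'France'"]
print(exact_match_accuracy(preds, golds))  # 1.0
```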

Read the full article

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684) 

SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.

SQL Generation from Natural Language Using Supervised Learning and Recurrent Neural Networks

Discover our conference paper SQL Generation from Natural Language Using Supervised Learning and Recurrent Neural Networks – International Conference on Artificial Intelligence & Industrial Applications, part of the Lecture Notes in Networks and Systems book series (LNNS, volume 144).

Thanks to the Novelis Research Team for their knowledge and expertise.

Abstract

Databases store today’s large amounts of data and information. To access these data, users need to master SQL or an equivalent interface language. Therefore, a system that can convert natural language into equivalent SQL queries would make the data far more accessible. In this sense, building a natural language interface to relational databases is an important and challenging problem in natural language processing (NLP) that has been studied extensively, and with the introduction of large-scale datasets it has recently regained momentum. In this article, we propose a method based on word embeddings and recurrent neural networks (RNNs), specifically on long short-term memory (LSTM) and gated recurrent unit (GRU) cells. We also present the dataset used to train and test our model, based on WikiSQL, and finally we report the accuracy our model achieves.

About the study

“A vast amount of today’s information is stored in relational databases, which provide the foundation of applications such as medical records [1], financial markets [2], and customer relations management [3]. However, accessing relational databases requires an understanding of query languages such as SQL, which, while powerful, is difficult to master for non-technical users. Even for an expert, writing SQL queries can be challenging, as it requires knowing the exact schema of the database and the roles of various entities in the query. Hence, research has recently appeared on systems that map natural language to SQL queries, and a long-standing goal has been to allow users to interact with the database through natural language [4,5]. We refer to this task as Text-to-SQL.

In this work, we present our approach based on classification [6] and recurrent neural networks [7], specifically on LSTM [8] and GRU [9] cells. The idea is inspired by the SQLNet approach [10]; in particular, we employ a sketch to generate a SQL query from natural language. The sketch aligns naturally with the syntactic structure of a SQL query; neural networks are then used to predict the content of each slot in the sketch. Our approach can be viewed as a neural network alternative to the traditional sketch-based program synthesis approaches [11,12].”
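
To illustrate the sketch-based slot-filling idea described in the excerpt, the snippet below shows a hypothetical GRU classifier that predicts one slot of the SQL sketch, the aggregation operator of the SELECT clause. The architecture details (embedding size, bidirectional GRU, use of the final hidden state) are simplifying assumptions for illustration and do not reproduce the paper's exact model.

```python
# Illustrative slot-filling sketch: a GRU encoder over the question followed
# by a classifier for the aggregation slot of "SELECT $AGG $COLUMN WHERE ...".
# Sizes and architecture choices are assumptions, not the paper's model.
import torch
import torch.nn as nn

AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]  # WikiSQL aggregation operators

class AggClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, len(AGG_OPS))

    def forward(self, token_ids):
        emb = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        _, h = self.gru(emb)                   # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)    # concatenate both directions
        return self.out(h)                     # (batch, num_agg_ops) logits

# Toy forward pass with random token ids standing in for an encoded question.
model = AggClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (1, 12)))
print(AGG_OPS[logits.argmax(dim=-1).item()])
```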

Read the full article

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 144) 

SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.

Text2SQLNet: Syntax Type-Aware Tree Networks for Text-to-SQL

Discover our conference paper Text2SQLNet: Syntax Type-Aware Tree Networks for Text-to-SQL – International Conference Europe Middle East & North Africa Information Systems and Technologies to Support Learning, part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 7).

Thanks to the Novelis Research Team for their knowledge and expertise.

Abstract

Building a natural language interface for relational databases is an important and challenging problem in natural language processing (NLP). It requires a system that can understand natural language questions and generate the corresponding SQL queries. In this article, we propose the idea of using type information and database content to better understand the rare entities and numbers in natural language questions, improving on SyntaxSQLNet, the state-of-the-art model for the Text-to-SQL task. We also present the overall architecture and the technologies that can be used to implement our neural network (NN) model, Text2SQLNet, which integrates our ideas, including the use of type information to better understand rare entities and numbers in natural language questions. The database content can also be used to better understand a user query when it is not well formed. Implementing this idea can further improve performance on the Text-to-SQL task.

About the study

“Relational databases store a vast amount of today’s information and provide the foundation of applications such as medical records (Hillestad et al., 2005)[1], financial markets (Beck et al., 2000)[2], and customer relations management (Ngai et al., 2009)[3]. However, accessing relational databases requires an understanding of query languages such as SQL, which, while powerful, is difficult to master. Natural language interfaces (NLI), a research area at the intersection of natural language processing and human-computer interaction, seek to provide means for humans to interact with computers through the use of natural language (Androutsopoulos et al., 1995)[4]. Natural language always contains ambiguities, and each user can express himself in his own way.”
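
The idea of combining type information with database content can be illustrated with a small, hypothetical tagging step: question tokens that match a column name, a cell value, or a number are marked before being fed to the model. The tag set and the exact-string matching below are simplifying assumptions for illustration, not the paper's method.

```python
# Illustrative sketch of type-aware annotation using database content:
# tag question tokens that match a column name, a cell value, or a number
# so rare entities and numbers are made explicit to the model.
def annotate_question(tokens, columns, cell_values):
    columns = {c.lower() for c in columns}
    cells = {v.lower() for v in cell_values}
    tags = []
    for tok in tokens:
        low = tok.lower()
        if low in columns:
            tags.append("COLUMN")
        elif low in cells:
            tags.append("VALUE")
        elif low.replace(".", "", 1).isdigit():
            tags.append("NUMBER")
        else:
            tags.append("NONE")
    return list(zip(tokens, tags))

print(annotate_question(
    ["How", "many", "singers", "are", "from", "France", "?"],
    columns=["name", "country", "age"],
    cell_values=["France", "Spain"],
))
```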

Read the full article

Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 7)

SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.