NL2Code: A Corpus and Semantic Parser for Natural Language to Code

Discover our conference paper, NL2Code: A Corpus and Semantic Parser for Natural Language to Code, from the International Conference on Smart Information & Communication Technologies, part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684).

 

Thanks to the Novelis Research Team for their knowledge and experience.

 

Abstract

In this work, we propose a new dataset and semantic parsing method that allows automatic generation of source code from specifications and descriptions written in natural language (NL2Code). Our long-term goal is to allow any user to create applications from specifications that describe the requirements of the complete system. This involves researching, designing, and implementing intelligent systems that automatically generate computer projects (skeleton, configuration, initialization scripts, etc.) in response to user needs expressed in natural language. As a first step in this area, we provide a new dataset specific to our company, Novelis, and implement a method that enables machines to understand user needs expressed in natural language in specific domains.

 

About the study

“The dream of using French or any other natural language to generate code in a specific programming language has existed for almost as long as the task of programming itself. Although significantly less precise than a formal language, natural language as a programming medium would be universally accessible and would support the automation of application development. However, the diversity and ambiguity of the texts, the compositional nature of code, and the layered abstractions in software make it difficult to generate code from functional specifications written in natural language. The use of artificial intelligence offers interesting potential for supporting new tools in almost all areas of software engineering and program analysis. This work presents a new dataset and semantic parsing method for a novel and ambitious domain: program synthesis.

Our long-term goal is to enable any user to generate complete frontend/backend web applications based on Java/JEE technology that respect an n-tier (multilayer) architecture. To that end, we take a first step in this direction by providing a dataset (corpus), proposed by the company Novelis and based on the Java-language questions and answers from various topics on the Stack Overflow website, together with a new semantic parsing method.”

 

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 684)
 
SpringerLink provides researchers with access to millions of scientific documents from journals, books, series, protocols, reference works and proceedings.