AVANTES was a two-year project that aimed to develop NLP tools and techniques for software development. Its main research question was the relationship between the semantics of program code and the meaning of the natural-language comments written in that code. The project addressed several NLP problems: categorizing code comments according to a typological taxonomy, determining the similarity of pairs of comments using methods for measuring the similarity of texts of different lengths, and semantic code search. It also focused on identifying different types of duplicate code. All of these research objectives were studied across multiple programming languages (C/C++/C#, Java, JavaScript, PHP, Python, SQL) and natural languages (English and Serbian).

The project participants were the University of Belgrade – School of Electrical Engineering, the Innovation Center of the School of Electrical Engineering in Belgrade, and the University of Belgrade – Faculty of Philology.

Technologies: NLP, machine learning, artificial intelligence