NLP Tools for Smarter Software Development
Software development produces large amounts of textual and code-based information: source code, comments, documentation, repeated code fragments and technical descriptions. Understanding the relationship between what code does and how it is described in natural language is an important challenge for modern software engineering, especially as AI-based tools are increasingly used to support developers, improve code search and automate repetitive tasks.
The project Advancing Novel Textual Similarity-based Solutions in Software Development (AVANTES) addressed this challenge by developing natural language processing (NLP) and AI-based methods for analyzing software projects. The project explored how the meaning of program code relates to comments written in natural language, and how this relationship can be used to build smarter tools for software development.
Within the project, several NLP-based solutions were developed and tested, including methods for classifying code comments, measuring similarity between comments of different lengths, supporting semantic code search and identifying different types of duplicate code. These topics are important for improving code understanding, maintenance, documentation quality and reuse of existing software components.
The project considered multiple programming languages, including C, C++, C#, Java, JavaScript, PHP, Python and SQL, as well as natural languages including English and Serbian. In this way, AVANTES contributed to the development of AI-supported software engineering tools that can operate across different technical and linguistic environments.
The project participants were the University of Belgrade – School of Electrical Engineering, the Innovation Center of the School of Electrical Engineering in Belgrade, and the University of Belgrade – Faculty of Philology.
