AskGraph: A Dependency-Aware Code Assistant Powered by Code Graphs and LLM-Generated Cypher Queries

conference paper
Large Language Models (LLMs) have transformed code assistants by enabling personalization, interactivity, and higher abstraction. However, these assistants often struggle with a common limitation: they generate responses based on a limited set of relevant code snippets retrieved from the codebase using semantic similarity search. This mechanism prevents them from viewing the code structure holistically, making it difficult to give accurate and complete answers to questions about code dependencies and structure. This paper introduces a dependency-aware code assistant that answers structural questions developers cannot easily pose to general-purpose assistants like GitHub Copilot. We achieve this by enriching the LLM with dependency facts obtained from a code graph generated by a static-analysis pipeline customized specifically for industry-scale codebases. The dependency information is queried from a Neo4j database, which stores the code graph, via Text-to-Cypher translation powered by LLMs. Cypher is a query language designed specifically for querying graph-structured data. We evaluated our solution at Philips Healthcare. Specifically, we ran a benchmark of 420 collected questions and a user study with seven industrial software engineers. By analyzing the results, we identified common mistakes made by GPT-4o in the Text-to-Cypher translation used to query code graphs, including syntax, schema, and semantic errors. This work lays the foundation for advancing Cypher query generation on industry-scale code graphs and for augmenting graph-based code analysis with LLMs.
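To make the Text-to-Cypher step and the notion of a schema error concrete, the following is a minimal sketch and not the paper's implementation. The schema (labels `Class` and `Method`, relationship types `DEPENDS_ON` and `CALLS`) and the helper names `build_prompt` and `schema_errors` are assumptions made for illustration; the actual code-graph schema and prompts are not given in this abstract. The sketch assembles a prompt that exposes the schema to an LLM, then checks a generated query for labels or relationship types that the schema does not contain.

```python
import re

# Hypothetical code-graph schema (an assumption; not taken from the paper).
SCHEMA = {
    "labels": {"Class", "Method"},
    "relationships": {"DEPENDS_ON", "CALLS"},
}


def build_prompt(question: str, schema: dict) -> str:
    """Assemble a Text-to-Cypher prompt that exposes the schema to the LLM."""
    labels = ", ".join(sorted(schema["labels"]))
    rels = ", ".join(sorted(schema["relationships"]))
    return (
        "Translate the question into a Cypher query.\n"
        f"Node labels: {labels}\n"
        f"Relationship types: {rels}\n"
        f"Question: {question}\n"
    )


def schema_errors(cypher: str, schema: dict) -> list[str]:
    """Return labels and relationship types used in the query but absent from the schema."""
    # Node labels appear as (var:Label); relationship types as [var:TYPE].
    labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    return sorted(labels - schema["labels"]) + sorted(rels - schema["relationships"])


if __name__ == "__main__":
    print(build_prompt("Which classes does OrderService depend on?", SCHEMA))
    good = "MATCH (c:Class {name: 'OrderService'})-[:DEPENDS_ON]->(d:Class) RETURN d.name"
    bad = "MATCH (c:Class)-[:USES]->(m:Module) RETURN m.name"
    print(schema_errors(good, SCHEMA))
    print(schema_errors(bad, SCHEMA))
```

A regex check of this kind only catches simple schema mismatches. Syntax and semantic errors, the other two categories named above, would need a Cypher parser or execution against the database.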
TNO Identifier
1017937