AST Matching based on Concrete Syntax Patterns: Exploration of the Specification Challenges
conference paper
Software analysis often relies on pattern matching in terms of Abstract Syntax Trees (ASTs), but AST patterns are known to be tedious to specify. Concrete syntax patterns with placeholders have been proposed as a user-friendly alternative. Several designs for this proposal have been implemented, but these typically focus on specific parsing technologies. In this paper, we explore the overarching challenges of specifying AST patterns using concrete syntax with placeholders. Drawing on our experience with industrial applications, we take the perspective of an analyst who creates concrete syntax patterns to find matches in a code base. We identify two specification challenges: (1) understanding the underlying AST structure, and (2) ambiguities caused by placeholders. For designs based on black-box parsers, we also catalogue the challenge of (3) encoding and recognizing the placeholders in concrete syntax patterns. We illustrate these challenges with examples in the Ada and C/C++ programming languages. Our results can serve as warnings to users of concrete syntax patterns, as additional requirements for parser front-ends, as attention points for language specifications, and as starting points for further research on pattern matching.
TNO Identifier
1008592
ISSN
1613-0073
Publisher
CEUR-WS
Article nr.
5
Source title
Proceedings of the 23rd Belgium-Netherlands Software Evolution Workshop (BENEVOL 2024)
Pages
35-41