Title
The Growing N-Gram Algorithm: A Novel Approach to String Clustering
Author
Grappiolo, C.
Verwielen, E.
Noorman, N.
Publication year
2019
Abstract
Connected high-tech systems allow the gathering of operational data at unprecedented volumes. A direct benefit of this is the possibility to extract usage models, that is, generic representations of how such systems are used in their field of application. Usage models are extremely important, as they can help in understanding the discrepancies between how a system was designed to be used and how it is used in practice. We interpret usage modelling as an unsupervised learning task and present a novel algorithm, hereafter called Growing N-Grams (GNG), which relies on n-grams, arguably the most popular modelling technique for natural language processing, to cluster and model a dataset of strings in two steps. We empirically compare its performance against other common techniques for string processing and clustering. The results suggest that the GNG algorithm is a viable approach to usage modelling.
Subject
String clustering
N-grams
Operational usage modelling
System verification testing
Industrial Innovation
To reference this document use:
http://resolver.tudelft.nl/uuid:700e740b-6c7f-4041-ba8a-93c3629007a9
TNO identifier
861983
Bibliographical note
ICPRAM 2019, 8th International Conference on Pattern Recognition Applications and Methods, Prague, Czech Republic, 19-21 February 2019
Document type
conference paper