Encoding Domain Knowledge of Logs during Log Analysis: Master Thesis

report
With software permeating our world, modern software systems grow both in size (e.g., lines of code, number of software artefacts) and complexity. Logs produced by software systems during execution are often the first and only source of information that software engineers use to comprehend the behaviour of these complex systems and to repair software faults. However, analysing logs is not trivial because of the immense volume of information these systems log. Although many previous studies have proposed tools and techniques to support log analysis, software developers commonly rely on text-based tools as well as self-made scripts and programs when analysing logs. The functionality of such tools is often limited and gives developers no way to incorporate their domain knowledge into the analysis; this knowledge lives mainly in the developers' minds. The goal of this thesis is to investigate how to enable software engineers to use their domain knowledge of logs more actively in the analysis process. Specifically, we investigate the presence of patterns and sequences in logs, which we refer to as structures, as well as developers' knowledge of these structures. Furthermore, as logs are commonly analysed in text-based tools that represent them as raw textual data, we aim to make log analysis more visual and interactive. To that end, we conduct an interview study at Philips Image Guided Therapy Systems (IGTS), a leading company specialising in health technology, to understand what structures are present in logs and whether software engineers use them during log analysis. We observe that software engineers often use their domain knowledge of specific structures occurring in the logs to distinguish the parts of a log that are relevant to their analysis from those that are not. Consequently, we develop functionality that enables developers to encode their domain knowledge in log analysis by defining structures and searching for them in logs visually and interactively. We evaluate our implementation with software engineers on logs produced by industrial software. We find that the participants are able to encode their knowledge of structures successfully and use our features to facilitate the investigation of software issues. However, we also identify some limitations of the features. First, their usefulness depends on the experience and knowledge of the users, so they may not be helpful for analysing unfamiliar logs. Second, they may not handle structures that contain a large number of entries well, because the results can be hard to navigate and use. Despite these limitations, our solution receives above-average usability ratings from the evaluation participants.
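The abstract does not say how a structure is represented internally. As a minimal sketch, assuming a structure is an ordered list of regular expressions matched against consecutive log entries (our assumption, not the thesis's tool), defining a structure, searching for it, and hiding its occurrences as known-irrelevant could look like this. The names `Structure`, `find_structure` and `filter_out`, as well as the sample log lines, are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical user-defined structure: one regex per consecutive log entry."""
    name: str
    patterns: list[str]

    def compiled(self) -> list[re.Pattern]:
        return [re.compile(p) for p in self.patterns]


def find_structure(lines: list[str], structure: Structure) -> list[tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of every contiguous run of
    log lines whose entries match the structure's patterns in order."""
    pats = structure.compiled()
    n = len(pats)
    matches: list[tuple[int, int]] = []
    i = 0
    while i + n <= len(lines):
        if all(p.search(lines[i + k]) for k, p in enumerate(pats)):
            matches.append((i, i + n))
            i += n  # skip past the match so occurrences do not overlap
        else:
            i += 1
    return matches


def filter_out(lines: list[str], ranges: list[tuple[int, int]]) -> list[str]:
    """Hide the lines covered by the given ranges, e.g. structures the engineer
    already knows to be irrelevant, leaving the remaining entries to inspect."""
    hidden: set[int] = set()
    for start, end in ranges:
        hidden.update(range(start, end))
    return [line for idx, line in enumerate(lines) if idx not in hidden]


# Hypothetical sample log: two routine start-up sequences around one error.
log = [
    "10:00:01 INFO  Startup begin",
    "10:00:02 INFO  Loading config",
    "10:00:03 INFO  Startup done",
    "10:00:04 ERROR Detector timeout",
    "10:00:05 INFO  Startup begin",
    "10:00:06 INFO  Loading config",
    "10:00:07 INFO  Startup done",
]
startup = Structure("startup", [r"Startup begin", r"Loading config", r"Startup done"])
found = find_structure(log, startup)
print(found)                   # [(0, 3), (4, 7)]
print(filter_out(log, found))  # ['10:00:04 ERROR Detector timeout']
```

Requiring the entries to be contiguous keeps the sketch simple; a real tool would likely also need to tolerate interleaved entries from other components between the steps of a structure.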
TNO Identifier
1001667
Publisher
TU/e
Collation
57 p.
Place of publication
Eindhoven