Assessment of automated stratigraphic interpretations of boreholes with geology-informed metrics

article
Stratigraphic interpretation of borehole data is a fundamental aspect of subsurface geological models, providing critical insights into the distribution of stratigraphic units. However, expert interpretation of all available borehole data is impractical for large-scale regional mapping involving thousands of boreholes. Automated interpretations using machine learning models can significantly increase the number of boreholes included in subsurface geological models. Nevertheless, these predictions must adhere to strict spatial and stratigraphic relationships (e.g. superposition) to ensure geological plausibility, which often requires post-processing tasks. Traditional evaluation metrics commonly used for general-domain classification tasks (e.g. accuracy, F1-score) do not necessarily reflect the geological plausibility of predictions, as they fail to account for the sequential nature and spatial relationships inherent in borehole interpretation. To address this limitation, we propose and evaluate a set of geology-informed metrics that focus on three key aspects of stratigraphic interpretation, namely the expected geographical extent of units (extent metrics), their sequential relationships (sequence metrics), and their vertical positioning along boreholes (position metrics). Using a dataset of 1394 boreholes from the Cenozoic Roer Valley Graben (southeast Netherlands), which covers ∼3000 km² and includes 15 lithostratigraphic units, we demonstrate that Random Forest and Neural Network models with similar performance on traditional metrics (e.g. accuracy, Cohen’s kappa, and F1-score) can differ significantly in their ability to produce geologically plausible predictions. For example, while many model configurations achieve ∼75%–80% agreement between expected and predicted classes, the Neural Network models better capture the sequential stratigraphic relationships expected in the study area.
Our results underscore the need for domain-specific metrics that offer a more accurate and interpretable assessment of model performance.
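As an illustration of why accuracy alone can hide implausible predictions, the following minimal Python sketch compares two predicted boreholes that score the same accuracy but differ in how well they respect superposition. It is not the formulation used in the article: the function names, the placeholder unit labels (`unit_A`, `unit_B`, `unit_C`), and the simple count of order reversals are all assumptions made for this example.

```python
from typing import List, Sequence


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of borehole intervals whose predicted unit matches the expected one."""
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)


def collapse_runs(labels: Sequence[str]) -> List[str]:
    """Merge consecutive intervals with the same label into a single unit occurrence."""
    units: List[str] = []
    for label in labels:
        if not units or units[-1] != label:
            units.append(label)
    return units


def sequence_violations(predicted: Sequence[str], order: Sequence[str]) -> int:
    """Count top-to-bottom transitions where a younger unit appears below an older one.

    `order` lists the units from youngest (top) to oldest (bottom). This is a
    hypothetical, simplified stand-in for a sequence metric.
    """
    rank = {unit: i for i, unit in enumerate(order)}
    units = collapse_runs(predicted)
    return sum(1 for upper, lower in zip(units, units[1:]) if rank[lower] < rank[upper])


# Hypothetical stratigraphic order, youngest to oldest (placeholder names).
order = ["unit_A", "unit_B", "unit_C"]

# Labels per depth interval, listed from the top of the borehole downwards.
expected = ["unit_A", "unit_A", "unit_B", "unit_B", "unit_C", "unit_C"]
pred_1 = ["unit_A", "unit_A", "unit_B", "unit_C", "unit_C", "unit_C"]  # boundary shifted
pred_2 = ["unit_A", "unit_A", "unit_B", "unit_B", "unit_C", "unit_B"]  # order reversed

for name, pred in [("pred_1", pred_1), ("pred_2", pred_2)]:
    print(name, round(accuracy(expected, pred), 3), sequence_violations(pred, order))
```

Both predictions misclassify one interval out of six, yet only `pred_2` places a younger unit beneath an older one. A simple reversal count like this ignores settings where repetition of units is geologically valid (for example across faults), so it should be read as a sketch of the idea rather than a usable metric.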
TNO Identifier
1018818
Source
Computers & Geosciences, 207, pp. 1-16.
Pages
1-16