Loop transformations leveraging hardware prefetching

conference paper
Memory-bound applications depend heavily on system bandwidth to achieve high performance. Improving temporal and/or spatial locality through loop transformations is a common way of mitigating this dependency. However, choosing the right combination of optimizations is not trivial, because most of them alter the memory access pattern of the application and, as a result, interfere with the efficiency of the hardware prefetching mechanisms present in modern architectures. We propose an optimization algorithm that analytically classifies an algorithmic description of a loop nest to decide whether it should be optimized for temporal or for spatial locality, while also taking hardware prefetching into account. We implement our technique as a tool to be used with the Halide compiler and test it on a variety of benchmarks. We find an average performance improvement of over 40% compared to previous analytical models targeting the Halide language and compiler.
© 2018 Association for Computing Machinery.
ACM SIGMICRO; ACM SIGPLAN; IEEE Computer Society
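To make the distinction between the two kinds of locality concrete, here is a minimal C++ sketch of a matrix product written three ways. It is not the algorithm or the tool described in the abstract, and it does not use Halide; the function names, the row-major `Matrix` alias, and the `tile` parameter are all assumptions chosen for illustration. The interchanged version favors spatial locality (unit-stride streams of the kind hardware prefetchers typically follow well), while the tiled version favors temporal locality (reuse of small blocks while they are still in cache).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a dense n x n matrix stored in row-major order.
using Matrix = std::vector<double>;

// Baseline i-j-k order: the innermost loop reads b with stride n,
// so consecutive iterations touch different cache lines.
void matmul_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Loop interchange (i-k-j): the innermost loop now walks b and c with
// unit stride, which improves spatial locality.
void matmul_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}

// Loop tiling: work on tile x tile blocks so that the data of a block is
// reused before it is evicted, which improves temporal locality.
void matmul_tiled(const Matrix& a, const Matrix& b, Matrix& c,
                  std::size_t n, std::size_t tile) {
    assert(tile > 0);  // a zero tile size would never advance the loops
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t kk = 0; kk < n; kk += tile) {
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t i_end = std::min(ii + tile, n);
                const std::size_t k_end = std::min(kk + tile, n);
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

All three functions compute the same product; only the order of memory accesses differs. Which variant runs fastest depends on the matrix size, the cache hierarchy, and the prefetcher of the target machine, which is the kind of trade-off the abstract says its analytical model is meant to decide.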
TNO Identifier
842631
ISBN
9781450356176
Publisher
Association for Computing Machinery, Inc
Source title
CGO 2018 - Proceedings of the 16th International Symposium on Code Generation and Optimization, 24-28 February 2018
Pages
254-264