Workload dynamics on clusters and grids
TNO Informatie- en Communicatietechnologie
This paper presents a comprehensive statistical analysis of a variety of workloads collected on production clusters and Grids. The applications are mostly computationally intensive, and each task requires a single CPU for processing data; such applications dominate the workloads on current production Grid systems. Trace data obtained on a parallel supercomputer are also included for comparison. The statistical properties of the workloads are investigated at different levels, including the Virtual Organization (VO) and user levels. An aggregation procedure and scaling analysis are applied to job arrivals, leading to the identification of several basic patterns, namely pseudo-periodicity, long-range dependence (LRD), and multifractals. It is shown that statistical measures based on interarrival times are of limited usefulness, and that count-based measures should be trusted instead when it comes to correlations. Other job characteristics, such as run time and memory consumption, are also studied. A "bag-of-tasks" behavior is empirically evidenced, strongly indicating temporal locality. The nature of such dynamics in Grid workloads is discussed. This study has important implications for workload modeling and performance prediction, and points out the need for comprehensive performance evaluation studies that take these workload characteristics into account. © 2008 Springer Science+Business Media, LLC.
Data storage equipment
Journal of Supercomputing, 47 (47), 1-20