PURPOSE: Tumour heterogeneity could be a valuable biomarker for differentiation, grading, response monitoring and outcome prediction. Many quantification techniques have been described; in clinical practice, however, these methods are rarely used. The aim of this study is to evaluate the performance of the described methods and to identify the bottlenecks for their implementation in clinical practice. METHOD AND MATERIALS: We searched OVID, EMBASE, and Cochrane CENTRAL up to 24 March 2013. Heterogeneity analysis methods were classified into four categories: non-spatial methods (NSM), spatial grey level methods (SGLM), fractal analysis (FA) methods, and filters and transforms (FandT). RESULTS: From 6908 potentially relevant publications, 183 studies were included. The number of studies has been increasing steadily since 2009. Overall, 60% of the studies used NSM, 49% used SGLM, 11% used FA, and 28% used FandT. Differential diagnosis, grading or outcome prediction was the goal in 86% of the studies, 36% of the studies were based on MRI, and 88% of the studies were conducted retrospectively. Tumours in the breast and brain together accounted for 49% of the studies. No relation was found between the discriminative power and the quantification methods used, or between the discriminative power and the imaging modality. The reported AUC ranged from 0.5 to 1 with a median of 0.89. A negative correlation was found between the AUC and the number of features estimated per tumour, which is presumably caused by overfitting in small datasets. Cross-validation was reported in only 53.4% of the classification studies. None of the publications reported the use of an external validation set to test their findings. Retrospective analyses were conducted in 60% of the studies without a clear description of the inclusion criteria. Only 12% of the studies had a prospective study design.
Almost none of the papers evaluated the incremental value of the heterogeneity biomarker on top of clinically established markers. CONCLUSION: To enable the translation of imaging biomarkers from the research stage to clinical practice, research should focus more on prospective studies, use external datasets for validation, and focus on the added value of the proposed heterogeneity biomarker on top of clinically established markers. CLINICAL RELEVANCE/APPLICATION: Heterogeneity has the potential to be a valuable biomarker.
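
ILLUSTRATION: The overfitting concern raised under RESULTS, and the call for validation on independent data, can be made concrete with a small simulation. The sketch below is not taken from any of the reviewed studies: the function names `auc` and `simulate`, the sample size of 10 cases per class, and the Gaussian noise features are all assumptions chosen for illustration. It selects the best of several pure-noise features on a small training set and then scores that same feature on independent held-out cases.

```python
import random


def auc(neg, pos):
    """Area under the ROC curve as the probability that a random positive
    case scores above a random negative case (ties count as 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_features, n_per_class=10, seed=0):
    """Hypothetical experiment: choose the best of n_features noise features
    by training AUC, then evaluate the chosen feature on held-out cases."""
    rng = random.Random(seed)
    best_apparent, best_heldout = 0.0, 0.5
    for _ in range(n_features):
        # All four groups come from the same distribution: there is no signal.
        groups = [[rng.gauss(0, 1) for _ in range(n_per_class)] for _ in range(4)]
        train_neg, train_pos, test_neg, test_pos = groups
        a = auc(train_neg, train_pos)
        apparent = max(a, 1 - a)  # orient the feature using training data only
        if apparent > best_apparent:
            best_apparent = apparent
            h = auc(test_neg, test_pos)
            best_heldout = h if a >= 0.5 else 1 - h
    return best_apparent, best_heldout


if __name__ == "__main__":
    for k in (1, 10, 100):
        apparent, heldout = simulate(k)
        print(f"{k:>3} features: apparent AUC {apparent:.2f}, held-out AUC {heldout:.2f}")
```

Because the features carry no information, any apparent AUC well above 0.5 in this sketch comes from selection on a small sample. The apparent AUC can only increase as more candidate features are screened, while the held-out AUC has no such guarantee. This is why an AUC reported without cross-validation or an external validation set is hard to interpret.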