Christian Hennig

Thinking about Cluster Analysis (03/04/08)

Cluster analysis is regarded by some as "more an art than a science" because of the variety of approaches and the lack of simple guidelines on when to apply which of them (and when not to apply cluster analysis at all). I will introduce some popular cluster analysis methods (hierarchical methods, k-means, mixture-based methods) and discuss how they treat data differently. I will then discuss what kinds of decisions are necessary to choose a suitable cluster analysis method. For example, the researcher has to decide how important features of clusters such as separateness and homogeneity (which in some situations may be contradictory) are with regard to her/his research aims. Unfortunately, in reality, as opposed to most textbook examples, often no single method is able to capture the whole cluster structure of a dataset. I will introduce some graphical and numerical methods for cluster validation, i.e., checking whether and to what extent a clustering found by any clustering method makes sense.
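The tension between separateness and homogeneity can be illustrated with a small sketch. The toy data, the two candidate clusterings, and the function names `widest_within_gap` and `narrowest_between_gap` are all assumptions made for illustration; they are not the validation indices discussed in the talk, only one simple way to put numbers on the two features.

```python
from itertools import combinations


def widest_within_gap(clusters):
    """Largest distance between two points of the same cluster.

    Smaller values mean more homogeneous clusters.
    """
    return max(
        (abs(a - b) for c in clusters for a, b in combinations(c, 2)),
        default=0,
    )


def narrowest_between_gap(clusters):
    """Smallest distance between two points of different clusters.

    Larger values mean better separated clusters.
    """
    return min(
        abs(a - b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )


# Hypothetical one-dimensional data: an elongated chain of points
# and a small group far away from it.
two_clusters = [[0, 1, 2, 3, 4, 5], [10, 11]]
three_clusters = [[0, 1, 2], [3, 4, 5], [10, 11]]

for name, clustering in [("two", two_clusters), ("three", three_clusters)]:
    print(name, widest_within_gap(clustering), narrowest_between_gap(clustering))
```

In this toy example, keeping the chain as one cluster gives good separation (gap 5) but poor homogeneity (within-cluster spread 5), whereas splitting the chain improves homogeneity (spread 2) at the cost of separation (gap 1). Which clustering is preferable depends on the research aim.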
Key references:
C. Hennig: Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics 13 (2004), 930-945.
C. Hennig: Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis 52 (2007), 258-271.

Department of Computing, Goldsmiths College, University of London, New Cross, London, SE14 6NW
