Distance Metrics Selection Validity in Cluster Analysis
Peter Grabusts

In cluster analysis, data are divided into groups according to a chosen criterion, expressed as a distance metric. Traditionally, the metric of choice has been Euclidean distance. This article studies other distance metrics used in cluster analysis: Manhattan distance, cosine distance and the Pearson correlation measure. These metrics were used in the k-means clustering algorithm to determine cluster centers, and the correctness of the resulting clusterings was evaluated. The clustering results obtained with the different metrics were found to be very similar. The article also considers criteria for evaluating clustering validity.
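To make the comparison concrete, the sketch below implements the four metrics named in the abstract and a plain k-means loop whose assignment step accepts any of them. It is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the initial centers are simply the first `k` points (for determinism), and centers are updated as coordinate-wise means for every metric. With Manhattan distance, the coordinate-wise median would be the center that minimizes within-cluster L1 cost, so a metric-specific update is a reasonable alternative. Cosine and Pearson distances are written as `1 - similarity`; Pearson correlation is computed as the cosine similarity of mean-centered vectors.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def pearson_distance(a, b):
    """1 - Pearson correlation; assumes neither vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    # Pearson r equals the cosine similarity of the mean-centered vectors.
    return cosine_distance([x - ma for x in a], [y - mb for y in b])


def kmeans(points, k, distance=euclidean, max_iter=100):
    """Hypothetical k-means with a pluggable distance in the assignment step.

    Assumptions: the first k points seed the centers, and each center is
    updated as the mean of its members regardless of the metric.
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center under `distance`.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: recompute each center; keep the old one if a cluster empties.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    for metric in (euclidean, manhattan):
        labels, _ = kmeans(data, 2, distance=metric)
        print(metric.__name__, labels)  # both print [0, 0, 0, 1, 1, 1]
```

On well-separated data like this toy set, both metrics yield the same partition, which is consistent in spirit with the abstract's observation that results were very similar; it does not reproduce the article's experiments. The cosine and Pearson distances are better suited to higher-dimensional vectors, since for two-dimensional points the Pearson correlation is always +1 or -1.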

Keywords: clustering algorithms, cluster validity, k-means, metrics

Grabusts, P. Distance Metrics Selection Validity in Cluster Analysis. IT and Management Science. Vol.49, 2011, pp.72-77. ISSN 1407-7493. Available from: doi:10.2478/v10143-011-0045-y

Publication language: English (en)