Clustering Algorithm Specifics in Class Decomposition
Applied Information and Communication Technology (AICT2013): Proceedings of the 6th International Scientific Conference 2013
Inese Poļaka

The aim of the presented study is to find different disease phenotypes of cancer (breast cancer, carcinoma, gastric cancer, melanoma, prostate cancer) and of gastrointestinal inflammatory disease using clustering algorithms. The article analyzes the performance of two approaches to clustering data for class decomposition. The first uses agglomerative hierarchical clustering and determines the number of disease subtypes by analyzing the resulting dendrogram. The second uses the k-means algorithm and determines the number of disease subtypes by analyzing cluster compactness over several runs with different numbers of clusters (cluster centers). After clustering, the clusters are analyzed to assess their specifics and their potential to represent distinct phenotypes or disease subtypes. The initial analysis of the clustering results examined the records belonging to each cluster, the cluster sizes and their specifics. The secondary analysis evaluated cluster quality using classification algorithms (C4.5, Random Forest and SVM). The hypothesis is that well-formed clusters would yield disease subtypes that classification algorithms can easily separate. The main results show that the secondary analysis gives very similar results for both clustering approaches and that classification results improve compared with the initial classification of the full data set. The results also indicate that the k-means algorithm is sensitive to noise and outliers: the initial analysis showed that it formed one main group of records and several clusters containing very few records. Although hierarchical clustering requires expert judgement to determine the number of clusters, it formed several large clusters that could point to phenotypic subtypes of the diseases.
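The abstract does not describe the implementation, so the following is only a minimal Python sketch, under our own assumptions, of k-means-based class decomposition: each class is clustered separately, compactness is measured as the within-cluster sum of squares (WCSS) for several numbers of clusters (keeping the most compact of several random restarts), and records are relabeled with subclass labels such as `A_0` and `A_1`. The WCSS measure, the function names (`kmeans`, `compactness_curve`, `decompose`) and the toy data are hypothetical and are not the author's method; the number of clusters per class is left to the analyst, who would look for an "elbow" in the compactness curve.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """One run of Lloyd's k-means; returns (labels, wcss)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random records as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares: lower means more compact clusters.
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def best_of_runs(points, k, n_runs=5):
    """Repeat k-means with different seeds and keep the most compact result."""
    return min((kmeans(points, k, seed=s) for s in range(n_runs)), key=lambda r: r[1])


def compactness_curve(points, max_k, n_runs=5):
    """WCSS for k = 1..max_k, to be inspected for an 'elbow'."""
    return [best_of_runs(points, k, n_runs)[1]
            for k in range(1, min(max_k, len(points)) + 1)]


def decompose(records, classes, k_per_class, n_runs=5):
    """Replace each class label with a '<class>_<cluster>' subclass label."""
    new_labels = [None] * len(records)
    for cls, k in k_per_class.items():
        idx = [i for i, c in enumerate(classes) if c == cls]
        sub, _ = best_of_runs([records[i] for i in idx], k, n_runs)
        for i, s in zip(idx, sub):
            new_labels[i] = f"{cls}_{s}"
    return new_labels


if __name__ == "__main__":
    # Toy data invented for illustration: class "A" has two groups, "B" has one.
    X = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5),
         (10, 0), (10, 0.1), (10.1, 0)]
    y = ["A"] * 6 + ["B"] * 3
    print(compactness_curve(X[:6], 3))  # WCSS drops sharply from k=1 to k=2
    print(decompose(X, y, {"A": 2, "B": 1}))
```

The relabeled data could then be passed to classifiers such as C4.5, Random Forest or SVM, mirroring the secondary analysis described above; restarting k-means several times reduces, but does not remove, its dependence on the initial centers.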


Keywords
bioinformatics, class decomposition, clustering
Hyperlink
http://aict.itf.llu.lv/proceedings/2013

Poļaka, I. Clustering Algorithm Specifics in Class Decomposition. In: Applied Information and Communication Technology (AICT2013): Proceedings of the 6th International Scientific Conference, Latvia, Jelgava, 25-26 April 2013. Jelgava: Latvia University of Agriculture, 2013, pp. 29-35. ISSN 2255-8586.

Publication language
English (en)