Feature Selection for Bioinformatics Data Sets – Is It Recommended?
Proceedings of the 5th International Conference on Applied Information and Communication Technologies (AICT2012), 2012
Madara Gasparoviča-Asīte, Ludmila Aleksejeva

This article studies the impact of feature selection methods on the results of bioinformatics data classification. The quality of data preparation with preprocessing techniques determines the success of classification. Bioinformatics data sets have a specific property: they contain a large number of attributes (up to tens of thousands) and a comparatively small number of records (a few hundred or fewer). Attribute selection can therefore speed up the computation of classification results without a loss of accuracy. This study examines A Fast Correlation-based Filter Solution, which is especially suited to high-dimensional data such as the data used in bioinformatics. To validate its suitability, this approach is compared with various other attribute selection methods. The article also investigates how the classification results change when feature selection is applied, compared with the results obtained on the full data sets. The experiments use real biomedical cancer data sets provided by the Latvian Biomedical Study and Research Center, which hold information about breast cancer, prostate cancer, gastric cancer, gastric intestinal disease and healthy donor samples, as well as eighteen other popular data sets widely used in the literature. The experiments apply three classification algorithms that induce rules in If-Then form: the Fuzzy Unordered Rule Induction Algorithm (FURIA), JRIP and RIDOR. To interpret the classification results more accurately, the experiments were executed on data sets that are frequently used in research worldwide. The article ends with conclusions and an answer to the question of whether attribute selection is recommended for bioinformatics data sets.
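The abstract names the Fast Correlation-based Filter without describing it. As a rough illustration only, the sketch below follows the two-step idea of the published FCBF filter of Yu and Liu: first rank features by their symmetric uncertainty (SU) with the class, SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), then drop any feature that is at least as strongly correlated with an already kept feature as with the class. It assumes discrete (already discretised) features; the function names, the threshold `delta` and the toy data are hypothetical and do not reproduce the implementation or settings used in this study.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X | Y): entropy of x within each group of equal y, weighted by p(y)."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    n = len(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: nothing to share
    gain = hx - conditional_entropy(x, y)
    return 2.0 * gain / (hx + hy)


def fcbf(features: Mapping[str, Sequence[Hashable]],
         labels: Sequence[Hashable], delta: float) -> List[str]:
    """Return the names of the features kept by an FCBF-style filter (hypothetical helper)."""
    # Relevance step: keep features whose SU with the class reaches delta,
    # ordered from most to least relevant (ties keep their input order).
    ranked = [(name, symmetric_uncertainty(col, labels))
              for name, col in features.items()]
    ranked = [(name, su) for name, su in ranked if su >= delta]
    ranked.sort(key=lambda item: item[1], reverse=True)

    # Redundancy step: the best remaining feature f_p removes every later
    # feature f_q with SU(f_p, f_q) >= SU(f_q, class).
    selected: List[str] = []
    while ranked:
        best, _ = ranked[0]
        selected.append(best)
        ranked = [(name, su_c) for name, su_c in ranked[1:]
                  if symmetric_uncertainty(features[best], features[name]) < su_c]
    return selected


# Toy data (hypothetical): the class is a + b, "a_copy" duplicates "a",
# and "noise" carries no information about the class.
FEATURES = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
LABELS = [0, 0, 1, 1, 1, 1, 2, 2]

selected = fcbf(FEATURES, LABELS, delta=0.1)
print(selected)  # ['a', 'b']
```

Here `a_copy` is removed as redundant with `a`, and `noise` fails the relevance threshold, so only `a` and `b` remain. Real gene-expression attributes are continuous and would first need discretisation, which this sketch omits.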


Keywords
classification, bioinformatics data, feature selection

Gasparoviča-Asīte, M., Aleksejeva, L. Feature Selection for Bioinformatics Data Sets – Is It Recommended? In: Proceedings of the 5th International Conference on Applied Information and Communication Technologies (AICT2012), Jelgava, Latvia, 26-27 April 2012. Jelgava: Latvia University of Agriculture, Faculty of Information Technologies, 2012, pp. 325-335. ISBN 978-9984-48-065-7.

Publication language
English (en)