The Impact of Feature Selection on the Information Held in Bioinformatics Data
2015
Madara Gasparoviča-Asīte, Inese Poļaka, Ludmila Aleksejeva

This study examines a wide range of attribute selection methods: 86 in total, covering both ranking and subset evaluation approaches. The methods are evaluated on bioinformatics data sets provided by the Latvian Biomedical Research and Study Centre. The data sets are intended for diagnostic tasks and contain values of more than 1000 proteomics features together with a diagnosis (a specific cancer or healthy) determined by a gold standard method (biopsy and histological analysis). The diagnostic task is solved with the classification algorithms FURIA, RIPPER, C4.5, CART, KNN, SVM, FB+ and GARF, applied both to the initial data sets and to various data sets with reduced dimensionality. The paper concludes by identifying the most effective attribute subset selection methods for the classification task in diagnostic proteomics data.
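The comparison described in the abstract (classify on the initial feature set, then on a reduced one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic rather than proteomics measurements, the ranking score is a simple class-mean-difference filter rather than any of the 86 evaluated methods, and a small hand-written KNN stands in for the listed classifiers. All function names are hypothetical.

```python
import random
import statistics


def make_samples(n_per_class, n_noise, rng):
    """Synthetic stand-in data: 2 informative features followed by n_noise noise features."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def rank_features(X, y):
    """Filter-style ranking: absolute class-mean difference scaled by the class spreads."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.stdev(a) + statistics.stdev(b) + 1e-12
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    # Feature indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(X_train[i], x)),
    )[:k]
    votes = sum(y_train[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X_train, y_train, X_test, y_test, columns):
    """Test-set accuracy of KNN restricted to the given feature columns."""
    train = [[row[j] for j in columns] for row in X_train]
    test = [[row[j] for j in columns] for row in X_test]
    hits = sum(knn_predict(train, y_train, x) == t for x, t in zip(test, y_test))
    return hits / len(y_test)


rng = random.Random(0)
X_train, y_train = make_samples(40, 50, rng)
X_test, y_test = make_samples(20, 50, rng)

# Rank features on the training data only, so the test set stays unseen.
selected = rank_features(X_train, y_train)[:2]

acc_full = accuracy(X_train, y_train, X_test, y_test, list(range(len(X_train[0]))))
acc_reduced = accuracy(X_train, y_train, X_test, y_test, selected)
print("selected features:", selected)
print("accuracy, all features:", acc_full)
print("accuracy, selected features:", acc_reduced)
```

The ranking is computed on the training portion only; selecting features on the full data before splitting would leak information from the test samples into the evaluation.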


Keywords
Bioinformatics, classification, data mining, diagnostics, feature selection

Gasparoviča-Asīte, M., Poļaka, I., Aleksejeva, L. The Impact of Feature Selection on the Information Held in Bioinformatics Data. Information Technology and Management Science. Vol.18, 2015, pp.115-121. ISSN 2255-9086. e-ISSN 2255-9094.

Publication language
English (en)