Online feature selection and classification with incomplete data

Kalkan H.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.22, no.6, pp.1625-1636, 2014 (SCI-Expanded)


This paper presents a classification system in which learning, feature selection, and classification for incomplete data are carried out simultaneously in an online manner. Learning is conducted on a predefined model comprising the class-dependent mean vectors and correlation coefficients, which are obtained by incrementally processing the incoming observations with missing features. A nearest neighbor classifier with a Gaussian mixture model, whose parameters are also estimated from the trained model, is used for classification. When a testing observation is received, the algorithm discards the missing attributes of the observation and ranks the available features by performing feature selection on the model that has been trained so far. The developed algorithm is tested on a benchmark dataset. The effect of missing features on online feature selection and classification is presented and discussed. The algorithm readily converges to a stable state of feature selection, with similar accuracy for the complete feature set and for incomplete feature sets with up to 50% missing data.
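The incremental learning step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `IncrementalClassMeans` and its methods are hypothetical, missing features are assumed to be encoded as NaN, and a plain nearest-class-mean rule stands in for the nearest neighbor classifier with a Gaussian mixture model. Correlation coefficients and feature ranking are omitted.

```python
import math


class IncrementalClassMeans:
    """Per-class running means that skip missing (NaN) features.

    Hypothetical illustration only; not the model from the paper.
    """

    def __init__(self, n_features):
        self.n_features = n_features
        self.means = {}   # label -> running mean of each feature
        self.counts = {}  # label -> number of observed values per feature

    def update(self, x, label):
        """Fold one observation into the statistics of its class."""
        means = self.means.setdefault(label, [0.0] * self.n_features)
        counts = self.counts.setdefault(label, [0] * self.n_features)
        for j, value in enumerate(x):
            if math.isnan(value):
                continue  # missing feature: leave this statistic untouched
            counts[j] += 1
            means[j] += (value - means[j]) / counts[j]

    def predict(self, x):
        """Return the class whose mean is closest on the available features."""
        best_label, best_dist = None, math.inf
        for label, means in self.means.items():
            # Use only features present in x and already seen for this class.
            dims = [j for j, v in enumerate(x)
                    if not math.isnan(v) and self.counts[label][j] > 0]
            if not dims:
                continue
            # Mean squared distance, so classes compared on different
            # numbers of features remain comparable.
            dist = sum((x[j] - means[j]) ** 2 for j in dims) / len(dims)
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label


nan = float("nan")
model = IncrementalClassMeans(n_features=2)
model.update([1.0, nan], "a")    # second feature missing
model.update([3.0, 2.0], "a")
model.update([10.0, 10.0], "b")
print(model.predict([nan, 9.0]))  # first feature missing at test time
```

Each feature keeps its own observation count, so a missing value in one attribute does not disturb the running mean of the others; at test time the distance is computed only over the attributes that are present.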