Evaluation of Feature Selection Algorithms based on Synthetic Data
64th ISI World Statistics Congress - Ottawa, Canada
Format: CPS Abstract
Session: CPS 51 - Statistical methodology III
Tuesday 18 July 4 p.m. - 5:25 p.m. (Canada/Eastern)
The primary goal of this paper is to propose a collection of synthetic datasets, inspired by real-life scenarios, that can serve as benchmarks for evaluating feature selection methods. Several fundamental feature selection algorithms are studied, and their performance is evaluated in a controlled experimental setting. The complexity of the generated synthetic datasets is measured, and the results are used to classify the datasets into three distinct categories. The degree of matching between the features selected by a given algorithm and the correct features is determined and linked to the complexity of the datasets. The results of this study can help practitioners standardize the evaluation of feature selection and choose the techniques most relevant to their specific area of application.
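To illustrate the kind of evaluation described above, the following minimal Python sketch builds a toy synthetic dataset whose relevant features are known by construction, applies a simple correlation-based filter, and scores the overlap between selected and correct features. Everything in it is an assumption made for illustration and is not taken from the paper: the linear data generator, the correlation filter, the function names, and the two matching scores (recall and Jaccard index) are hypothetical stand-ins for the datasets, algorithms, and matching measure actually used in the study.

```python
import math
import random


def make_dataset(n_samples=500, n_features=10, relevant=(0, 3, 7), seed=0):
    """Toy linear dataset (assumed design): only `relevant` columns drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    # Target is a sum of the relevant features plus Gaussian noise.
    y = [sum(2.0 * row[j] for j in relevant) + rng.gauss(0.0, 0.5)
         for row in X]
    return X, y


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def select_top_k(X, y, k):
    """Hypothetical filter method: keep the k features most correlated with y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y))
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def recall(selected, correct):
    """Fraction of the correct features that were selected."""
    return len(selected & correct) / len(correct)


def jaccard(selected, correct):
    """Size of the intersection divided by size of the union."""
    return len(selected & correct) / len(selected | correct)


correct = {0, 3, 7}
X, y = make_dataset(relevant=tuple(sorted(correct)))
selected = select_top_k(X, y, k=len(correct))
print("selected:", sorted(selected))
print("recall:", recall(selected, correct))
print("jaccard:", jaccard(selected, correct))
```

Because the correct features are fixed when the data are generated, the matching scores can be computed exactly, which is the main advantage of a synthetic benchmark over real data. Harder datasets (for example with nonlinear targets, redundant features, or heavier noise) would be expected to lower these scores for a simple filter like this one.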