Let’s start by defining what a feature is. A feature is an input (X) variable in your dataset, usually represented by a column. Many datasets today have 100 or more features for a data analyst to sort through! That is far too many to inspect by hand, which is where feature selection methods come in handy, especially when building a machine learning model. They let you reduce the number of features in a model while giving up little or no predictive power.
Data doesn’t come in a simple form. After applying EDA (exploratory data analysis) methods, the data is in a form we can understand, but it is still complex, because we do not yet know which features matter most, which matter little, and which do not matter at all.
For example, suppose you are dealing with a dataset that contains a vast number of features. This type of dataset is often referred to as a high-dimensional dataset. High dimensionality brings several problems: it can significantly increase the training time of your machine learning model, and it can make the model very complicated, which in turn may lead to overfitting. A high-dimensional feature set also often contains redundant features, meaning features that largely repeat information already carried by other, essential features. These redundant features contribute little to model training.
We need to dig deeper and understand the features better to extract more meaning from the data. This is where feature selection comes in: selecting the features that matter. Not all features contribute enough to the meaning of the data, and too many features won’t give you a good model.
“Sometimes, less is better!”
What are the relevant or useful features?
Imagine that you are trying to guess the price of a car.
- Relevant / Useful: engine size, model, age, company, mileage, presence of rust, …
- Irrelevant: color of the windscreen wipers, stickers on the windows, …
- Redundant: age and mileage (each is useful on its own, but they tend to rise together, so one adds little once the other is known)
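To make the idea of redundancy concrete, here is a minimal sketch that flags pairs of features whose values move together. It assumes redundancy can be approximated by a high absolute Pearson correlation, which is only one possible criterion. The car data, the feature names, the `redundant_pairs` helper, and the 0.9 threshold are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: six cars, three numeric features.
cars = {
    "age_years": [1, 3, 5, 8, 10, 12],
    "mileage_km": [12000, 40000, 61000, 98000, 125000, 150000],
    "engine_size": [1.6, 1.2, 2.0, 1.4, 2.5, 1.8],
}


def redundant_pairs(features, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


pairs = redundant_pairs(cars)
for a, b, r in pairs:
    print(f"{a} and {b} look redundant (r = {r})")
```

With this toy data, only the age and mileage pair crosses the threshold. In practice you would keep one feature of such a pair and drop the other, and the right threshold depends on your data.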
Feature Selection Methods
1. Filter Methods
Filter methods are generally used as a preprocessing step. The selection of features is independent of any machine learning algorithm. These methods rely only on the characteristics of the variables themselves, so features are filtered out of the data before learning begins. They are simple yet powerful and help to quickly remove features, which is why they are generally the first step in any feature selection pipeline.
Filter methods apply a ranking over the features. Each feature is scored with a statistical test that measures its relationship (for example, its correlation) with the outcome variable, and the score denotes how ‘useful’ that feature is likely to be for the prediction task, such as classification. Once this ranking has been computed, a feature set composed of the best N features is created.
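The score, rank, and keep-the-best-N procedure can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the absolute Pearson correlation with the outcome as the score (one of many possible statistical tests), and the data, feature names, and the helpers `rank_features` and `select_top_n` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Score each feature independently of any model, best score first."""
    scores = {
        name: abs(pearson(values, target)) for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_n(features, target, n):
    """Keep the names of the n highest-scoring features."""
    ranked = rank_features(features, target)
    return [name for name, _ in ranked[:n]]


# Hypothetical toy data: three candidate features and the car price to predict.
features = {
    "age_years": [1, 3, 5, 8, 10, 12],
    "engine_size": [1.6, 1.2, 2.0, 1.4, 2.5, 1.8],
    "wiper_color_code": [2, 0, 1, 2, 0, 1],
}
price = [21000, 17500, 16000, 11000, 10500, 7000]

selected = select_top_n(features, price, n=2)
print(selected)
```

Note that no model is trained anywhere in this sketch, which is exactly what makes it a filter method. Because each feature is scored on its own, this univariate version cannot detect redundancy between features or features that are only useful in combination.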
The Methods: In recent years, numerous methods and techniques have been proposed for univariate and multivariate filter-based feature selection.
The following table has the methods we are going to see in detail: