## Feature Selection

Well, let’s start by defining what a feature is. A feature is an X variable (a predictor) in your dataset, most often represented by a column. Many datasets nowadays have 100+ features for a data analyst to sort through! That is far too many to examine by hand, which is where feature selection methods come in handy…
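As a minimal sketch of the idea (assuming scikit-learn, which the article doesn’t name), univariate feature selection scores each column against the target and keeps only the top-scoring ones — here, cutting 100 features down to 10:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic dataset: 100 features, of which only 10 carry real signal.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)

# Score every feature with an ANOVA F-test and keep the 10 best.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)  # only the 10 highest-scoring columns remain
```

The scoring function is one of several scikit-learn offers; swapping in `mutual_info_classif`, for example, captures non-linear relationships as well.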

## Bivariate and Multivariate outliers

While plotting data, some values of one variable may lie within its expected range, yet when you plot them against another variable, they fall far from the expected values. So, after understanding the causes of these outliers, we can handle them by dropping those records, imputing the values, or…
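One common way to flag such joint outliers (a sketch with NumPy, under the assumption of roughly elliptical data) is the Mahalanobis distance, which accounts for the correlation between variables. The injected point below is unremarkable on each axis alone but breaks the joint pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two strongly correlated variables; each value is in range on its own...
X = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=200)
# ...but this point violates the joint pattern (x high while y is low).
X = np.vstack([X, [2.5, -2.5]])

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
# Squared Mahalanobis distance of every row from the center.
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# ~13.8 is the chi-squared cutoff for 2 variables at p = 0.001.
outliers = np.where(d2 > 13.8)[0]
```

A plain per-variable range check would miss this point entirely, which is exactly the bivariate situation the paragraph describes.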

## Univariate Analysis

Let’s see how we can use univariate analysis to detect outliers. Univariate analysis is the simplest form of analyzing data. “Uni” means “one”, so in other words, your data has only one variable. It doesn’t deal with causes or relationships (unlike regression), and its major purpose is to describe: it takes data, summarizes…
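A minimal sketch of univariate outlier detection (assuming NumPy; the article doesn’t prescribe a tool) is the interquartile-range rule behind box plots: anything beyond 1.5 × IQR from the quartiles is flagged.

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11, 95, 10, 12, 13])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                      # interquartile range
lower = q1 - 1.5 * iqr             # standard box-plot whisker bounds
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
```

On this sample only the value 95 falls outside the whiskers; note the rule looks at one variable in isolation, which is what makes it univariate.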

## Outliers

We have all heard the idiom ‘odd one out’, which refers to something unusual in comparison to the others in a group. Similarly, an outlier is an observation in a given dataset that lies far from the rest of the observations. That means an outlier is vastly larger or smaller than the remaining values…
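“Far from the rest” can be made precise with a z-score — how many standard deviations a value sits from the mean. A small sketch (NumPy assumed, threshold of 2 chosen for illustration):

```python
import numpy as np

data = np.array([2.1, 2.3, 2.2, 2.4, 2.2, 9.0, 2.3])

# z-score: distance from the mean in units of standard deviation.
z = (data - data.mean()) / data.std()

# Flag anything more than 2 standard deviations from the mean.
flagged = data[np.abs(z) > 2]
```

Here 9.0 is the ‘odd one out’; every other value sits well under one standard deviation from the mean.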

## Model-based methods

So far, we have seen how we can use deletion methods and imputation methods to handle missing values in a dataset. These univariate methods of missing value imputation are simplistic ways of estimating values and may not always provide an accurate picture. For example, let us say we have variables related to the…
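A model-based alternative estimates each missing value from the records most similar on the other variables. As a sketch (assuming scikit-learn’s `KNNImputer`; the data here is invented for illustration), a missing income is filled from the two people closest in age rather than from the overall column mean:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Columns: age, income. One income is missing.
X = np.array([[25.0, 50000.0],
              [27.0, 52000.0],
              [26.0, np.nan],
              [45.0, 90000.0],
              [47.0, 91000.0]])

# Fill the gap using the 2 nearest neighbours on the observed columns.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

The 26-year-old gets the average income of the 25- and 27-year-olds (51,000), whereas a plain column-mean imputation would have pulled the estimate toward the much higher incomes of the older records.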

## Handling Missing Data

Missing data refers to values that are absent from our dataset but would be meaningful for our machine learning project if they were observed. In this article, we’ll see how missing data can be anything from a missing sequence or an incomplete feature to missing files, incomplete information, or data entry errors. Most datasets in the real world contain…
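The usual first step is simply counting the gaps per column. A minimal sketch (assuming pandas, with a made-up toy frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 30, 22],
    "income": [50000, 60000, np.nan, np.nan],
    "city":   ["NY", "LA", "SF", "NY"],
})

# Per-column count of missing values.
missing_counts = df.isnull().sum()
print(missing_counts)
```

This tells you where to focus: a column missing a couple of values is a candidate for imputation, while one that is mostly empty may be better dropped.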

## Data Preprocessing in Machine Learning

Data Preprocessing is a technique used to convert raw data into clean data. In other words, when data is gathered from different sources, it is collected in a raw format that is not suitable for analysis. Therefore, certain steps are executed to convert the raw data into a clean dataset. Importance of Data Pre-processing…
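Those steps are typically chained so the same transformations apply to every batch of raw data. A minimal sketch (assuming scikit-learn’s `Pipeline`, with imputation followed by standardization as example steps):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Raw data with a gap and wildly different column scales.
raw = np.array([[1.0, 200.0],
                [2.0, np.nan],
                [3.0, 600.0]])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill gaps with column means
    ("scale", StandardScaler()),                 # then rescale to mean 0, std 1
])
clean = pipe.fit_transform(raw)
```

Bundling the steps into one object means the exact same cleaning is reapplied, in order, to any future data via `pipe.transform`.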

## Why data normalization is important for non-linear classifiers

The term “normalization” usually covers both standardization and scaling. While standardization typically rescales the data to have a mean of 0 and a standard deviation of 1, scaling focuses on changing the range of the values in the dataset. As mentioned in [1] and in many other articles, data normalization…
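The two variants side by side, as a sketch with scikit-learn (an assumed choice of library):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [9.0]])

# Standardization: mean 0, standard deviation 1.
standardized = StandardScaler().fit_transform(X)

# Scaling: squeeze the values into the range [0, 1].
scaled = MinMaxScaler().fit_transform(X)
```

Both remove differences in magnitude between features, which is what keeps one large-valued feature from dominating a distance-based, non-linear classifier.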