Imputation Methods

Imputation is a technique for replacing (or imputing) missing data in a dataset with a substitute value, so that most of the dataset's information is retained. These techniques are used because dropping the data from the dataset every time is not feasible and can lead to a reduction in the size of the…
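As a minimal sketch of what imputation looks like in practice, assuming missing entries are marked with `None` and that a simple mean fill is acceptable for the column in question:

```python
def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column of ages with two missing entries.
ages = [25, None, 31, 40, None, 28]
print(mean_impute(ages))  # [25, 31.0, 31, 40, 31.0, 28]
```

The same idea generalizes to medians, modes, or model-based fills; the point is that all six rows survive instead of being dropped.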

Handling Missing Data

Missing values in a dataset must be handled before you start any statistical analysis or build a machine learning model. Let's look at some techniques for treating missing values with the help of an example. The two tables below give different insights. The inference from the table on the left with the missing data…
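To make the contrast between the two treatments concrete, here is a small sketch comparing deletion against imputation on a hypothetical salary column (with `None` marking missing values and the median chosen arbitrarily as the fill):

```python
from statistics import median

salaries = [30000, None, 45000, None, 52000, 38000]

# Option 1: listwise deletion -- drop every row with a missing value.
kept = [s for s in salaries if s is not None]

# Option 2: imputation -- keep all rows by filling gaps with the observed median.
fill = median(kept)
imputed = [fill if s is None else s for s in salaries]

print(len(kept), len(imputed))  # 4 6
```

Deletion shrinks the dataset from six rows to four, while imputation preserves its size, which is exactly why the two tables can support different inferences.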

Types of Missing Data

Below are the different types of missing data generally found in machine learning problems: 1. MCAR (Missing Completely At Random): these values do not depend on any other feature. In this case, there may be no pattern as to why a column's data is missing. For example, survey data is missing because someone could not make…
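Since MCAR means missingness is independent of everything in the data, it can be sketched as masking entries with a fixed probability (the 30% rate below is an arbitrary illustration):

```python
import random

data = list(range(10))

# MCAR sketch: each value goes missing with the same probability,
# regardless of its own value or any other feature.
mcar = [None if random.random() < 0.3 else v for v in data]
```

Under MAR or MNAR, by contrast, the masking probability would depend on other observed features or on the masked value itself.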


Data Preprocessing in Machine Learning

Data preprocessing is a technique used to convert raw data into clean data. In other words, whenever data is gathered from different sources, it is collected in a raw format that is not suitable for analysis. Certain steps are therefore executed to convert the raw data into a clean dataset. Importance of Data Pre-processing…
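One raw-to-clean step can be sketched as follows, assuming the raw data arrives as strings with stray whitespace and ad-hoc missing-value markers (the markers `""`, `"n/a"`, `"na"`, and `"?"` are illustrative choices):

```python
def clean_numeric(cell):
    """Convert one raw string cell to an int, or None if it is unusable."""
    cell = cell.strip().lower()
    if cell in ("", "n/a", "na", "?"):
        return None
    return int(cell)

raw = [" 12 ", "7", "", "n/a", " 30"]
clean = [clean_numeric(c) for c in raw]
print(clean)  # [12, 7, None, None, 30]
```

Real pipelines chain several such steps (type coercion, deduplication, outlier handling, imputation), but each one has this same shape: raw cells in, standardized values out.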

How Normalization Affects Random Forest Algorithm

Recently, I was implementing a Random Forest regressor when I faced the classical question: should I apply data normalization? Before going into depth, let us first understand what normalization is. Normalization: the goal of normalization is to change the values of numeric columns in the dataset to a common…
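A small sketch hints at the answer: decision trees (and therefore Random Forests) split on thresholds, so they only care about the *ordering* of feature values, and min-max normalization is monotonic and preserves that ordering. The min-max formula below is standard; the toy feature values are made up:

```python
def min_max(values):
    """Min-max normalization: map values onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

x = [3.0, 10.0, 7.0, 1.0]
scaled = min_max(x)

# Tree splits compare feature values against thresholds, so any monotonic
# rescaling yields the same ordering -- and hence the same candidate splits.
order_before = sorted(range(len(x)), key=lambda i: x[i])
order_after = sorted(range(len(scaled)), key=lambda i: scaled[i])
print(order_before == order_after)  # True
```

This is a rough intuition for why tree ensembles are largely insensitive to normalization, unlike distance- or gradient-based models.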

Why data normalization is important for non-linear classifiers

The term “normalization” usually covers both standardization and scaling. Standardization typically rescales the data to have a mean of 0 and a standard deviation of 1, while scaling changes the range of the values in the dataset. As mentioned in [1] and in many other articles, data normalization…
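The two operations just defined can be sketched side by side; both formulas are standard, and the sample values are made up for illustration:

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map the range of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

x = [2.0, 4.0, 6.0, 8.0]
z = standardize(x)  # mean 0, standard deviation 1
s = rescale(x)      # values mapped onto [0, 1]
```

Standardization is the usual choice when a model assumes roughly centered inputs; min-max scaling is preferred when a bounded range matters.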