
Data Normalization

Normalization is a common data preparation technique in machine learning: it transforms numeric columns onto a common scale, i.e. it rescales multi-scaled data so that all features share the same range. In practice, some feature values can differ from others by several orders of magnitude. The features with…
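As a minimal sketch of the idea (assuming NumPy; the function name `min_max_normalize` is illustrative, not from the text), min-max normalization rescales each column to the [0, 1] range, so a feature measured in dollars no longer dominates one measured in years:

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each column of X to the [0, 1] range (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Two features on very different scales: age (years) and income (dollars).
data = np.array([[25,  40_000],
                 [35,  80_000],
                 [45, 120_000]])
scaled = min_max_normalize(data)  # every column now spans [0, 1]
```

After scaling, both columns run from 0 to 1, so distance-based or gradient-based models treat them comparably.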


II. Multicollinearity – VIF

Multicollinearity is a phenomenon specific to multiple regression that occurs when two variables that are supposed to be independent are in fact highly correlated and overlap in what they measure. In other words, each variable does not contribute entirely new information. To picture what multicollinearity is, let’s start by picturing what it…
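The standard diagnostic named in this section's title, the Variance Inflation Factor (VIF), can be sketched directly from its definition: regress feature j on all the other features and compute VIF_j = 1 / (1 − R²_j). A VIF near 1 means the feature is nearly independent of the rest; a large VIF (a common rule of thumb flags values above 5 or 10) signals multicollinearity. The sketch below assumes NumPy, and the function name `vif` is illustrative:

```python
import numpy as np

def vif(X, j):
    """Variance Inflation Factor of column j: 1 / (1 - R^2) from
    regressing column j on all other columns (plus an intercept)."""
    X = np.asarray(X, dtype=float)
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot
    return 1.0 / (1.0 - r_squared)

# Synthetic example: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)              # independent feature
X = np.column_stack([x1, x2, x3])

vif_collinear = vif(X, 0)    # very large: x1 is predictable from x2
vif_independent = vif(X, 2)  # close to 1: x3 adds new information
```

In practice libraries such as statsmodels provide this computation, but the definition itself is just the regression above.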


I. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised machine learning technique used to reduce the dimensionality of data. It compresses the feature space by identifying a subspace that captures most of the information in the full feature matrix, projecting the original feature space into a lower-dimensional one. The PCA technique is used for those datasets that…
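As a minimal sketch of the projection step (assuming NumPy; the function name `pca` and the synthetic data are illustrative), PCA can be computed by centering the data and taking the top singular vectors, which are the directions of maximum variance:

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via SVD (sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                     # top-k directions of max variance
    explained = (S ** 2) / np.sum(S ** 2)   # variance ratio per component
    return Xc @ components.T, explained[:k]

# Synthetic 3-D data that actually lies near a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))          # true 2-D structure
mixing = rng.normal(size=(2, 3))            # embed it in 3-D
X = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

Z, ratios = pca(X, 2)  # Z has shape (200, 2); ratios sum close to 1
```

Because the data is nearly planar, the first two components capture almost all of the variance, which is exactly the "subspace that captures most of the information" described above.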

Correlation Filter Methods

Besides duplicate features, a dataset can also include correlated features. “Correlation is defined as a measure of the linear relationship between two quantitative variables.” A high correlation is often a useful property: if two variables are highly correlated, we can predict one from the other. Therefore, we generally look for features that are highly correlated with…
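A simple correlation filter can be sketched as follows (assuming NumPy; the function name `correlated_columns` and the 0.9 threshold are illustrative choices, not from the text): compute the absolute pairwise correlation matrix and, for each pair above the threshold, keep the first column and flag the later one for removal:

```python
import numpy as np

def correlated_columns(X, threshold=0.9):
    """Return indices of columns to drop: for each pair with
    |correlation| > threshold, keep the earlier column, flag the later one."""
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    n = corr.shape[0]
    drop = set()
    for i in range(n):
        for j in range(i + 1, n):
            if j not in drop and corr[i, j] > threshold:
                drop.add(j)
    return sorted(drop)

# Synthetic example: x2 is a noisy multiple of x1, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = 2 * x1 + 0.01 * rng.normal(size=300)  # highly correlated with x1
x3 = rng.normal(size=300)                  # uncorrelated feature
X = np.column_stack([x1, x2, x3])

to_drop = correlated_columns(X, threshold=0.9)  # flags column 1 (x2)
```

Which member of a correlated pair to keep is a modeling choice; this sketch simply keeps the first one encountered.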