Similar Posts
Maximum likelihood estimation
Maximum Likelihood Estimation (MLE) is a technique used to estimate the parameters of a statistical model. But what are parameters? A parameter is a variable whose value can be estimated from historical data. For example, in the case of linear regression (see our article on linear regression), the model is Y = mx + b, and the parameters are m…
Boosting
The term ‘Boosting’ refers to a family of algorithms that convert weak learners into strong learners. Let’s understand this definition in detail by solving a problem: suppose that, given a data set containing images of cats and dogs, you were asked to build a model that can classify these images into two…

Correlation Filter Methods
Besides duplicate features, a dataset can also include correlated features. “Correlation is defined as a measure of the linear relationship between two quantitative variables.” A high correlation is often a useful property: if two variables are highly correlated, we can predict one from the other. Therefore, we generally look for features that are highly correlated with…
7. Bayes’ based Algorithm: Naïve and Gaussian
Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to more accurately assess the probability that they have cancer,…
4) Cross-validation to reduce Overfitting
Cross-validation (CV) is part 4 of our article on how to reduce overfitting. It is one of the techniques used to test the effectiveness of a machine learning model, and it is also a resampling procedure used to evaluate a model when we have limited data. To perform CV, we need to keep aside a sample/portion of the data which…

How Normalization Affects Random Forest Algorithm
Recently, I was implementing a Random Forest regressor when I faced the classic question: should I implement data normalization? Before going into the depth of the topic, we will try to understand what normalization is. Normalization: The goal of normalization is to change the values of numeric columns in the dataset to a common…