Outliers
We have all heard the idiom ‘odd one out’, which means something unusual in comparison to the others in a group. Similarly, an outlier is an observation in a given dataset that lies far from the rest of the observations. In other words, an outlier is vastly larger or smaller than the remaining values in the set.
“Outliers are the values that are far beyond the next nearest data points.”
Let’s take an example. Suppose we do customer profiling and find that the average annual income of customers is $0.8 million, but two customers have annual incomes of $4 million and $4.2 million. These two customers’ annual incomes are much higher than those of the rest of the population, so these two observations will be seen as outliers.
Let’s talk about different types of outliers.
1. Univariate outliers
- Univariate outliers are the data points whose values lie beyond the range of expected values based on one variable.
- Let’s see it in an example below:
The mean of the above observations is 1307, which is higher than most of the values in the table. The mean is the arithmetic average and is generally taken to represent the centre of the data. Here, however, 1307 is nowhere near the centre of the data, and the culprit is the single extreme observation, 10000. Hence, 10000 can be termed an outlier that distorts the actual structure of the data.
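To see how a single extreme value can pull the mean away from the centre, here is a minimal Python sketch. The numbers in `values` are hypothetical and chosen only for illustration; they are not the observations from the table, so the resulting mean will not be exactly 1307. The median is included as an assumed point of comparison, since it is much less sensitive to one extreme value.

```python
from statistics import mean, median

# Hypothetical observations (NOT the article's table): several small
# values plus one extreme value, 10000.
values = [40, 55, 60, 62, 70, 75, 80, 10000]
extreme = 10000

# The same data with the extreme observation removed.
trimmed = [v for v in values if v != extreme]

mean_with = mean(values)        # pulled far above most of the values
mean_without = mean(trimmed)    # close to where the bulk of the data sits
median_with = median(values)    # barely affected by the extreme value
median_without = median(trimmed)

print(f"mean with extreme value:      {mean_with:.2f}")
print(f"mean without extreme value:   {mean_without:.2f}")
print(f"median with extreme value:    {median_with}")
print(f"median without extreme value: {median_without}")
```

In this sketch the mean changes dramatically when the extreme value is dropped, while the median hardly moves. That gap between the two is one simple hint that a variable may contain an outlier.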
We can analyze it using Univariate Analysis.