# Model Selection using R-squared (R²) Measure

If you are looking for a widely used measure of how well a **regression** model fits the data, **R-squared** will be your cup of tea.

R² measures the strength of the relationship between two variables. We tend to use R² because it is easy to interpret: it is the proportion of variation in the dependent variable explained by the model, and it ranges from 0 to 1.

In a linear regression model, R-squared serves as an evaluation metric that quantifies the scatter of the data points around the fitted regression line.

An **R-squared** of zero means our **regression line** explains none of the variability of the data. An **R-squared** of 1 would mean our model explains the entire variability of the data.
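To make the interpretation concrete, here is a minimal sketch (assuming scikit-learn and NumPy are installed, with synthetic data invented for illustration) that fits a simple linear regression and reports its R-squared via the model's `score()` method:

```python
# Fit a simple linear regression on noisy synthetic data and report R-squared.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(0, 2.0, size=100)  # linear signal plus noise

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # R-squared of the fit on this data
print(f"R-squared: {r2:.3f}")
```

Because the noise is small relative to the linear signal, the reported R-squared lands close to 1; increasing the noise standard deviation pushes it toward 0.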

**Formula**: The R-squared value is calculated as:

R² = 1 − (RSS / TSS)

where **RSS** is the Residual Sum of Squares and **TSS** is the Total Sum of Squares.

The R-squared value can be defined in terms of two other error terms.

**1. Residual Sum of Squares (RSS)**: the sum, over all data points, of the squared difference between the actual value and the predicted value: RSS = Σᵢ (yᵢ − ŷᵢ)².

**2. Total Sum of Squares (TSS)**: the sum, over all data points, of the squared difference between the actual value and the average value ȳ: TSS = Σᵢ (yᵢ − ȳ)².

The formula above is the simplified version for calculating R-squared, using both the residual sum of squares and the total sum of squares. The larger your R² value, the better your regression model is likely to fit the observations.
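The two terms above can be computed directly from their definitions. Here is a short sketch (the fitted values are hypothetical, chosen for illustration) that evaluates R² = 1 − RSS/TSS by hand and checks it against scikit-learn's `r2_score`:

```python
# Compute R-squared from its definition: R² = 1 - RSS/TSS.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])  # hypothetical fitted values

rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r_squared = 1 - rss / tss

print(round(r_squared, 4))                    # → 0.9885
print(np.isclose(r_squared, r2_score(y_true, y_pred)))  # → True
```

The manual computation and the library function agree, which is a useful sanity check when implementing the metric yourself.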

Plotting fitted values against observed values gives a visual demonstration of how R-squared reflects the scatter around the regression line.

As observed in the pictures above, the R-squared value for the regression model on the left is 17%, and for the model on the right it is 83%. When R-squared is high, the data points tend to fall closer to the fitted regression line.

However, a regression model with an R² of 100% is an ideal scenario that is rarely achievable in practice. In that case the predicted values equal the observed values, so all data points fall exactly on the regression line.
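The perfect-fit case is easy to verify: when the predictions equal the observations exactly, RSS is zero, so R² = 1. A minimal sketch using scikit-learn's `r2_score` on toy data:

```python
# When predictions equal observations exactly, RSS = 0 and R-squared = 1.
import numpy as np
from sklearn.metrics import r2_score

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))  # → 1.0
```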

**Further Reads:**

[1] Refer to this in-depth article on linear regression and the R-squared (R²) evaluation measure by Ajitesh Kumar, which explains R² with a practical implementation in Python using scikit-learn.

[2] To learn more about the R² statistic, refer to this article from Analytics Vidhya.