Logistic Regression (now with the math behind it!)

Logistic Regression is a type of linear model that’s mostly used for binary classification but can also be used for multi-class classification. If the term linear model sounds familiar, that might be because Linear Regression is also a type of linear model.

To proceed with this notebook, you first have to make sure that you understand ML concepts like Linear Regression, Cost Function, and Gradient Descent, and mathematical concepts like Logarithms and Matrices. If you don’t, the links below can help you out.

  1. Linear Regression
  2. Gradient Descent

If you would like to follow along with interactive code, I have made a Kaggle notebook for this exact purpose.

Click here to try it out for yourself!


To understand Logistic Regression properly, you have to first understand a few other concepts too. Think of Logistic Regression like an onion. In the same way that you have to go through multiple layers to reach the sweet, juicy middle of an onion, you have to go through a few concepts before you can understand Logistic Regression from scratch!

(When did onions have a sweet juicy middle part?)

(I don’t know…he probably meant…some fruit?)

Onions aside, let’s first learn about the Decision Boundary.

Decision Boundary

In the simplest terms, a decision boundary is just a line that helps us identify which point belongs to which class. The image below can help you understand a decision boundary much more clearly.

[Image: Decision Boundary]

Here the blue line separates the two classes which are represented as green and red dots. Any point to the left of the decision boundary belongs to the class represented with the red dots. Any point to the right belongs to the class represented with the green dots. That’s all that a decision boundary does.

It can be calculated using the equation of a straight line. The general form of the equation of a straight line is:

[Image: Equation of a straight line]

Where:

  • a is the coefficient of x,
  • b is the coefficient of y, and
  • c is some arbitrary constant.

Using this equation we can assume that the equation of the decision boundary is given as:

[Image: Equation of a decision boundary]

Where,

  • x1 is the 1st feature variable.
  • x2 is the 2nd feature variable.

If we can calculate x2 values for given x1 values, then we can plot our decision boundary. This can be done as follows:

[Image: Equation of a decision boundary]
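As a minimal sketch of that idea: if the boundary is β0 + β1x1 + β2x2 = 0, rearranging gives x2 = −(β0 + β1x1) / β2, which we can plot for a range of x1 values. The β values below are placeholders purely for illustration; in practice they come from training (see the Gradient Descent section).

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder coefficients -- in practice these come from training
b0, b1, b2 = -4.0, 1.0, 1.0

# Rearranging b0 + b1*x1 + b2*x2 = 0 gives x2 = -(b0 + b1*x1) / b2
x1 = np.linspace(0, 10, 100)
x2 = -(b0 + b1 * x1) / b2

plt.plot(x1, x2, color="blue", label="decision boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()
```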

Now that we have a way to plot the decision boundary, you might think “Why don’t we use Linear Regression for this? It can help us plot a line based on β values.”

It is true that Linear Regression can help us plot a line based on some β values, but the Cost Function of Linear Regression minimizes the distance between the line of best fit and the actual points. This isn’t helpful in classifying points. For ideal classification, we would need to get the probability of something belonging to a certain class and assign that item a class only if the probability is above a certain threshold. From this, you can infer two things:

  1. For Logistic Regression, we’ll need a way to get the values in terms of probabilities.
  2. For Logistic Regression, we would need a new Cost Function.

Sigmoid Function

The sigmoid function is as follows:

[Image: Sigmoid Function]

It takes in any series of values and gives that series back in terms of probabilities, restricting every value to the range 0 to 1. Let’s take an example of this.

Suppose I have a list of numbers from -100 to 100, {num | num ∈ [-100, 100]}. If I pass this list through the sigmoid function, it gets turned into something like this.

[Image: Sigmoidal value of the series]
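Here is a small sketch of this experiment, assuming the standard sigmoid σ(z) = 1 / (1 + e^(−z)):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Map any real number into the (0, 1) range."""
    return 1 / (1 + np.exp(-z))

nums = np.arange(-100, 101)   # the series {-100, ..., 100}
probs = sigmoid(nums)         # every value is squashed into (0, 1)
                              # values with sigmoid(num) > 0.5 correspond to num > 0

plt.plot(nums, probs)
plt.xlabel("num")
plt.ylabel("sigmoid(num)")
plt.show()
```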

The graph above gives us the probability of a number being greater than or less than zero. If we say that each number whose sigmoidal value is greater than 0.5 is greater than 0, and each number whose sigmoidal value is less than 0.5 is less than 0, then we would have the list of all positive and negative numbers present in our input list.

We can try to predict the class of an item using β0 + β1x1 + β2x2. If we plot this line on a graph, it would look something like this.

[Image: Prediction line on the classes of the data]

This line has a problem. No matter what your class names are, one of them is treated as class 1 while the other is treated as class 0, which means our predictions should always lie in the range 0 to 1. That is something this line doesn’t do. So to fix this, we pass it through the sigmoid function, which makes the equation look something like this:

[Image: Sigmoid Function on the equation of the prediction line]

This equation can also be written in terms of matrices; a short code sketch follows the definitions below.

[Image: Sigmoid Function on the matrix form of the prediction line]

Where –

  • B is the matrix with all the regression coefficients.
  [Image: Matrix of betas]
  • X is the matrix with all the feature values, with an added column of 1s.
  [Image: Matrix of features]

The sigmoid function can help us differentiate between two classes, but only when we have the equation of the ideal line to pass into it. And how can we get the equation of the ideal line? It’s simple: by minimizing the cost function for Logistic Regression.


Cost Function

Just as Linear Regression has MSE as its cost function, Logistic Regression has a cost function too. So let’s derive it.

Likelihood Function

So…we know that Logistic Regression is used for binary classification, meaning the predictions can only be 0 or 1 (either an item belongs to a class, or it doesn’t). So suppose the probability of something belonging to class 1 is p; then the probability of it belonging to class 0 would be 1−p.

[Image: Probabilities of something belonging to a class]

We can combine these two equations into something like this.

[Image: Single equation of probabilities of something belonging to a class]

If we substitute y with 1 we get the following.

[Image: Probability of something belonging to class 1]

If we substitute y with 0 we get the following.

[Image: Probability of something belonging to class 0]

This equation is called the likelihood function, and it gives us the likelihood of one item belonging to a class. To get the likelihood of all the items in a series, we can just multiply the likelihoods of all the items together.

[Image: Likelihood function for all items]
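As a quick sketch of this step, with made-up probabilities p and labels y for four items: each per-item likelihood is p^y · (1−p)^(1−y), and the likelihood of the whole series is their product.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels for four items
p = np.array([0.9, 0.8, 0.3, 0.2])
y = np.array([1, 1, 0, 0])

# Per-item likelihood: p^y * (1 - p)^(1 - y)
per_item = p**y * (1 - p)**(1 - y)

# Likelihood of the whole series: product of the per-item likelihoods
likelihood = np.prod(per_item)
print(per_item)    # [0.9 0.8 0.7 0.8]
print(likelihood)  # ~0.4032
```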

Log-Likelihood Function

When we apply it to a whole series, the likelihood function multiplies many probabilities together, which produces extremely small numbers and makes our calculations unwieldy. So to tackle this problem, we can take the log of this function.

[Image: Log-Likelihood function]

This function takes in the values of pi and 1−pi which range from 0 to 1 (it takes in probabilities).

Let’s plot a log of numbers that fall between 0 and 1.

[Image: Log of numbers between 0 and 1]

As you can see, the log of numbers between 0 and 1 is negative, meaning the whole log-likelihood would be negative for all inputs. So we multiply it by −1 to fix this.

And one more thing: Σ(yi log pi + (1−yi) log(1−pi)), summed over all n items, gives us the sum of all the errors and not the mean. So to fix this, we can divide the whole equation by n to get the mean of all the errors.

[Image: Cost Function for Logistic Regression]

And to avoid overfitting, let’s add a penalty term to the equation, just the way we added it to the cost function for Ridge Regression.

[Image: Regularized Cost Function]

The function we have here is also called the Regularized Cost Function, and it gives us the error value for a given set of βs.
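A minimal sketch of this cost function in code. The λ/(2n) scaling and the choice not to penalize the intercept are assumptions made for this sketch, consistent with the Ridge-style penalty described above.

```python
import numpy as np

def cost(B, X, y, lam):
    """Regularized log-loss cost.

    B   : vector of betas (B[0] is the intercept)
    X   : feature matrix with a leading column of 1s
    y   : vector of 0/1 labels
    lam : regularization strength (lambda)
    """
    n = len(y)
    p = 1 / (1 + np.exp(-(X @ B)))                  # predicted probabilities
    log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = (lam / (2 * n)) * np.sum(B[1:] ** 2)  # intercept not penalized
    return log_loss + penalty
```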


Gradient Descent

Now that we have our Cost Function, all we need to do is find its minimum value to get the best predictions. And we can do this by applying partial differentiation to the function.

According to the Convergence Theorem, the ideal β value can be calculated using the equation below.

[Image: Convergence Theorem]
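For reference, the update rule this step relies on is the standard gradient-descent update, sketched below; α here denotes the learning rate (an assumption consistent with the usual statement of the rule):

```latex
% Standard gradient-descent update, with learning rate \alpha
\beta_j := \beta_j - \alpha \, \frac{\partial J}{\partial \beta_j}
```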

All we need to do is find the value of ∂J/∂βn for each β and we are good to go.

We know the Cost Function so we can get the value of ∂J/∂β0 by applying partial differentiation to it.

[Image: Derivation of ∂J/∂β0]

We know that

[Image: Derivation of ∂J/∂β0]

On adding −1 and 1 to the above equation, we get

[Image: Derivation of ∂J/∂β0]

On substituting ∂pi/∂β0 into the derivative of the cost function with respect to β0, we get

[Image: Derivation of ∂J/∂β0]

Similarly, if you differentiate J with respect to β1, you will get

[Image: Value of ∂J/∂β1]

In general, for βn you will get

[Image: Value of ∂J/∂βn]
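For reference, the standard form of this derivative (without the penalty term) is sketched below; with the L2 penalty from the Regularized Cost Function, each non-intercept βn also picks up an extra (λ/n)·βn term. Here x_{i,n} is the value of the n-th feature for the i-th item, with x_{i,0} = 1 for the intercept.

```latex
% Standard (unregularized) gradient of the log-loss cost with respect to \beta_n
\frac{\partial J}{\partial \beta_n} = \frac{1}{n}\sum_{i=1}^{n}\left(p_i - y_i\right) x_{i,n}
```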

Now that we have the Cost Function and a way to implement Gradient Descent on it, all we need to do is run a loop for some number of iterations to get the best values of all the βs for our classification problems.
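A minimal sketch of that loop, assuming the gradient form above. The function name fit, the learning rate alpha, the λ value, the iteration count, and the toy data are all illustrative choices, not fixed by the derivation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit(X, y, alpha=0.1, lam=0.0, iterations=1000):
    """Gradient descent for logistic regression.

    X : feature matrix with a leading column of 1s
    y : vector of 0/1 labels
    """
    n, m = X.shape
    B = np.zeros(m)                        # start with all betas at 0
    for _ in range(iterations):
        p = sigmoid(X @ B)                 # current predicted probabilities
        grad = (X.T @ (p - y)) / n         # dJ/dB for every beta at once
        grad[1:] += (lam / n) * B[1:]      # penalty term (intercept not penalized)
        B -= alpha * grad                  # gradient-descent update
    return B

# Usage sketch on toy data
X = np.array([[1, 2.0, 3.0],
              [1, 1.0, 0.5],
              [1, 4.0, 4.0],
              [1, 5.0, 6.0]])
y = np.array([0, 0, 1, 1])
B = fit(X, y)
print(B, sigmoid(X @ B))
```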

Once we have them, we can use them to create a line and pass it through the sigmoid function, which would look something like this:

[Image: Sigmoidal Line]

If you compare this to the prediction line from before (Image 7), you can see that it overcomes the shortcoming the previous line had: the values predicted by this line lie between 0 and 1.

Once we have the ideal β values, we can pass them into the decision boundary equation from earlier (Image 4) to get the Decision Boundary.

[Image: Decision Boundary using βs]

Sources –

  1. StatQuest on YouTube.
  2. Article on Analytics Vidhya by Anshul Saini.
  3. Article on Medium.com by Asha Ponraj.
  4. Article on KDNuggets by Clair Liu.
  5. Article on satishgunjal.com by Satish Gunjal.
  6. Article on Log Loss by Megha Setia on Analytics Vidhya.
  7. Kaggle notebook by Rishi Jain.
