The phrase “Reinforcement Learning” could sound a little intimidating at first, but when we break it down, it’s actually quite simple. Let’s start with the phrase itself. What does the word “reinforce” mean? No, don’t start googling just yet! I’ll tell you. It simply means to strengthen or support something. So Reinforcement Learning means learning by having good behavior strengthened, or reinforced. Let me elaborate.
Reinforcement Learning is one of the 3 branches of Machine Learning, alongside Supervised Learning and Unsupervised Learning.
In the following sections of the article, I am going to cover everything that is required by a beginner to get started with Reinforcement Learning. So just sit back and enjoy the ride!
A simple definition of Reinforcement Learning
This is a type of machine learning that involves an agent, an unknown environment, and a goal. In the absence of a dataset, the agent learns by being rewarded for good actions and punished for bad ones.
When the agent performs an action, the environment moves to a new state, and the agent receives feedback indicating whether its action led to a good state or a bad one. Let me give you a quick example.
Consider an agent in an environment. Say there’s a fire, and a fire extinguisher and a wrench are present in the environment.
The goal here is to put out the fire efficiently. Let’s see how the agent solves the problem using reinforcement learning.
Initially, the agent has no knowledge of the consequences of its actions. So let’s say the agent approaches the fire. Yes, I know it’s like the worst possible choice, but the agent doesn’t know that. In order to understand that approaching fire is dangerous, the agent has to approach it, get hurt (negative reward), and realize that it’s the wrong thing to do.
Another thing to consider is how good or bad a decision is, which is reflected in the size of the reward or punishment. Here, approaching the fire is a very bad decision, so it’s punished accordingly with 3 warnings. Now the agent knows it’s a bad decision.
Now the agent decides to go toward the wrench. It is not harmed by performing this action but it still counts as a bad decision because the goal is to put out the fire and a wrench will not help in doing so. So it is again punished, but less severely than before, with just 1 warning.
Now the only object left is the fire extinguisher which is the right choice. So the agent is rewarded with 2 points. The agent learns that in such a situation, the fire extinguisher is the right decision to make.
Now say the agent takes the fire extinguisher and moves toward the wrench. Again, this does no harm to the agent but it still counts as a bad decision. Remember, our goal is to put out the fire efficiently. So the agent is punished with a single warning.
The agent finally moves toward the fire with the fire extinguisher and is rewarded with 3 points. This is how reinforcement learning works. Unlike in supervised learning, where a labeled dataset tells the model what the correct answer is, here the agent learns through trial and error.
In order to perform well, it has to fail, learn from its mistakes, and not repeat them. Sounds philosophical, right? Some actually believe that the agent is analogous to a baby, the environment is analogous to the world, and the process of reinforcement learning is how the baby grows.
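The fire example above can be sketched in code. The following is a minimal sketch, not a definitive implementation: it uses tabular Q-learning (one of the algorithms named later in this article) with purely random exploration. The state names, the action names, the learning rate `alpha`, the discount `gamma`, the choice to end the episode when the agent touches the fire, and the -1 penalty for going back to the extinguisher are all assumptions made for illustration. Only the rewards -3, -1, +2 and +3 come from the story.

```python
import random

STATES = ["empty_handed", "holding_extinguisher"]
ACTIONS = ["approach_fire", "go_to_wrench", "go_to_extinguisher"]


def step(state, action):
    """Return (next_state, reward, done) for the toy fire scenario."""
    if action == "go_to_wrench":
        return state, -1, False              # harmless, but does not help
    if state == "empty_handed":
        if action == "approach_fire":
            return state, -3, True           # gets hurt; assumed to end the episode
        return "holding_extinguisher", 2, False   # picks up the extinguisher
    if action == "approach_fire":
        return state, 3, True                # fire is put out
    return state, -1, False                  # assumed penalty for a wasted move


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=20, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "empty_handed"
        for _step_count in range(max_steps):
            action = rng.choice(ACTIONS)     # explore at random
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_action(q, state):
    """Pick the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = train()
for s in STATES:
    print(s, "->", greedy_action(q, s))
```

After training, the learned table should favor fetching the extinguisher when empty-handed and approaching the fire once the agent holds it, which mirrors the lesson the agent learns in the story.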
Here is one video from Edureka.com on Reinforcement Learning
Now let us understand the differences between the three branches of Machine Learning.
Supervised Learning vs Unsupervised Learning vs Reinforcement Learning
| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|
| Uses a labeled dataset | Uses an unlabeled dataset | Does not use a dataset. Learns by interacting with the environment |
| Requires supervision | Doesn’t require supervision | Falls in between supervised and unsupervised learning; learns by an action-feedback mechanism |
| Uses pre-existing algorithms | Uses pre-existing algorithms | The agent has to start learning from scratch |
| Used to predict a defined target variable | Used to understand the patterns among data points | Used to make sequential decisions |
| An input value is mapped to a known set of output values | An input value is mapped to a set of unknown patterns identified | The trial and error method is used to identify the next optimal state |
| Used for Classification and Regression | Used for Clustering and Association rule mining | Involves balancing Exploration and Exploitation |
| Ex: Linear Regression, KNN, Decision Trees, etc. | Ex: K-Means, K-Modes, Apriori, etc. | Ex: Q-Learning, SARSA, etc. |
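The exploration and exploitation mentioned in the last column of the table can be made concrete with a small sketch. One common rule for this, which the article itself does not name, is "epsilon-greedy": with a small probability the agent tries a random action (exploration), and otherwise it picks the action it currently believes is best (exploitation). The function name `epsilon_greedy` and the example values below are hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return the index of the chosen action.

    action_values: the agent's current estimate of how good each action is.
    epsilon: probability of exploring instead of exploiting.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))      # explore: any action
    # exploit: the action with the highest estimated value
    return max(range(len(action_values)), key=lambda i: action_values[i])


estimates = [0.1, 0.9, 0.3]      # made-up value estimates for three actions
print(epsilon_greedy(estimates, epsilon=0.0))   # never explores
print(epsilon_greedy(estimates, epsilon=0.1))   # explores about 10% of the time
```

With `epsilon=0.0` the agent always exploits what it already knows, and with `epsilon=1.0` it only explores. Values in between trade one off against the other.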
Now that we understand the differences between the 3 types of Machine Learning, let us dive a little deeper into Reinforcement Learning. (PS: Don’t worry, I’ll keep it as simple as I can.)
Here are some significant technical terms that are used in the field of Reinforcement Learning.
- Agent – This is the entity that learns by interacting with the environment.
- Environment – The world that the agent can interact with.
- Action – A move that the agent can make in the environment.
- State – The situation of the environment at a given moment, as seen by the agent.
- Policy – The mechanism used by the agent to choose the next action based on the current state of the environment.
- Reward – Immediate feedback given to the agent, positive or negative, which indicates how good its previous action was.
- Value – The total reward the agent can expect to collect in the long run from a given state. Maximizing it can mean making a few sacrifices in the short term.
- Action Value – Similar to value, but it is the long-term reward expected after taking a particular action in the current state.
If you’d like to learn more about these terms, refer to the video from the link section.
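To make the difference between an immediate reward and a long-term value a little more concrete, here is a minimal sketch that adds up a sequence of rewards, counting later rewards slightly less than earlier ones. The function name `discounted_return` and the discount factor `gamma=0.9` are assumptions for illustration. The reward numbers reuse the points and warnings from the fire example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, shrinking each later reward by another factor of gamma."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


direct_path = [2, 3]        # extinguisher (+2), then fire (+3)
detour_path = [-1, 2, 3]    # wrench (-1), extinguisher (+2), then fire (+3)

print(discounted_return(direct_path))   # about 4.7
print(discounted_return(detour_path))   # about 3.23
```

The detour still ends in success, but its long-term total is lower, which is why an agent that cares about value rather than single rewards prefers the direct path.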
Markov Decision Process
Any kind of machine learning technique, including Reinforcement Learning, needs a mathematical foundation to back up the intuition. This is where the MDP or Markov Decision Process comes in.
An MDP provides a mathematical framework for making decisions in an environment. It describes the problem in terms of states, actions, the probabilities of moving from one state to another, and rewards, which together can be used to work out a policy and make decisions accordingly. To learn more about the mathematics behind this, refer to the link section.
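For readers who want a first look at the notation, here is a short sketch in the standard textbook form. It is not taken from this article, and the symbols $S$, $A$, $P$, $R$ and $\gamma$ are the usual conventions rather than anything defined above.

```latex
% An MDP is usually written as a tuple:
%   S      set of states
%   A      set of actions
%   P      transition probabilities P(s' | s, a)
%   R      reward function R(s, a, s')
%   gamma  discount factor, 0 <= gamma <= 1
\[
  \text{MDP} = (S, A, P, R, \gamma)
\]

% Value of a state under the best possible behavior (Bellman optimality equation):
\[
  V^{*}(s) = \max_{a \in A} \sum_{s' \in S} P(s' \mid s, a)
             \bigl[ R(s, a, s') + \gamma \, V^{*}(s') \bigr]
\]

% Action value: the same idea, but with the first action fixed to a:
\[
  Q^{*}(s, a) = \sum_{s' \in S} P(s' \mid s, a)
                \bigl[ R(s, a, s') + \gamma \max_{a' \in A} Q^{*}(s', a') \bigr]
\]
```

Read in words: the value of a state is the best achievable mix of the immediate reward and the discounted value of wherever the agent lands next.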
Applications of Reinforcement Learning
- Natural Language Processing – NLP is the field that deals with human language in text and speech. Reinforcement Learning is used in text summarization and in building chatbots, which mimic a human by making sequential decisions about how to reply to a message.
- Robotics – Many industries are training robots with Reinforcement Learning by letting the robot interact with its environment and learn.
- Healthcare – Dynamic Treatment Regimes or DTRs involve sequential treatments, where Reinforcement Learning is used to choose the next treatment based on how the patient has responded so far.
- Gaming – Agents are trained to play games like chess, learning by trial and error as they interact with the game environment.
- Trading and Marketing – Reinforcement Learning is being applied to the financial domain as well. A system is used to make decisions on budgeting, increase profit margins, and handle marketing campaigns.
Challenges in using Reinforcement Learning
- It is very computationally intensive compared to the other forms of learning, since it relies on trial and error.
- If sufficient data is available, it is usually more efficient to use supervised or unsupervised learning.
- Training the agent to an acceptable level of performance is time-consuming.
- Reinforcement Learning should only be used when we can afford to make mistakes.
- It becomes much harder to apply when the states or actions have many dimensions, because there are far more situations for the agent to explore.
Although Reinforcement Learning is less popular than its siblings, it holds huge potential and can be profitable when used appropriately. I hope this article helped you gain a basic understanding of RL.
If you wish to learn more about RL Algorithms, you can refer to the following links