Reinforcement Learning Psychology

Jan 19 2021 | Insights

Reinforcement learning psychology is the basis of open-ended learning methodologies. Find out more about how they differ from closed learning systems and what that can mean for how you learn new skills.

Neuroscience is a complex but fascinating branch of science that helps us understand how our brains work. One of its most influential concepts is reinforcement learning, an efficient and time-tested strategy for learning something new.

What is reinforcement learning?

Reinforcement learning is based on a simple reward system that helps to motivate and encourage us to learn.

Take a human baby, for example. While he is sitting down, he is in an idle state. He wants to reach a ball in front of him so he can play with it. So how does he learn to walk? First, he needs to take action. Perhaps this means moving a foot, shuffling closer, or using his hands and feet together. However, the baby doesn’t yet know how to walk, so he tries out different movements with his arms and legs. Some movements will end in a fall and cause pain, so he will avoid them in the future. Other movements will bring him closer to the ball he wants to play with. By moving closer, the baby learns which movements have positive outcomes and which have negative ones.

This can also be called operant conditioning, a method of learning often attributed to B.F. Skinner, in which the consequences of a response determine the likelihood of it being repeated. Let’s return to our baby example. If the baby keeps falling and receives feedback in the form of pain, then he is far less likely to repeat the movements that led to the fall. Eventually, the baby will stumble upon some movements that get him closer to the ball. This provides positive feedback and is the basis of a trial-and-error learning system.
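To make the trial-and-error idea concrete, here is a minimal Python sketch. It is an illustration only, not a model of how a real baby learns: the movement names, the feedback values, and the learning settings are all assumptions invented for this example. The learner mostly repeats whichever movement has worked best so far, occasionally tries something at random, and nudges its preference for each movement toward the feedback it receives.

```python
import random

# Hypothetical movements and the feedback each one produces (assumed values).
# Positive feedback stands for getting closer to the ball, negative for pain.
FEEDBACK = {"flail_arms": -1.0, "roll_sideways": -0.2, "crawl_forward": 1.0}


def learn_by_trial_and_error(trials=200, explore_rate=0.1, step=0.2, seed=0):
    """Return the learned preference for each movement after `trials` attempts."""
    rng = random.Random(seed)
    preference = {move: 0.0 for move in FEEDBACK}
    for _ in range(trials):
        if rng.random() < explore_rate:
            # Occasionally try a movement at random.
            move = rng.choice(list(FEEDBACK))
        else:
            # Otherwise repeat the movement that has worked best so far.
            move = max(preference, key=preference.get)
        # Nudge the preference toward the feedback just received.
        preference[move] += step * (FEEDBACK[move] - preference[move])
    return preference


preferences = learn_by_trial_and_error()
best_move = max(preferences, key=preferences.get)
print(best_move, preferences)
```

Movements that hurt end up with negative preferences and are chosen less often, while the movement that brings the ball closer is repeated more and more.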

Types of reinforcement

There are several types of reinforcement.

Primary reinforcement

This is often referred to as unconditioned reinforcement since it covers naturally occurring reinforcers that we respond to without needing to learn about them. For example, air, food, sleep, and water are essential for our daily lives. These are known as primary reinforcers because our species requires them in order to live. However, genetics and experience can play a role in how reinforcement works. For instance, some people might find certain foods especially rewarding, while others might dislike them and turn them down.

Secondary reinforcement

Also known as conditioned reinforcement, this involves stimuli that have become rewarding because they’ve been paired with another reinforcing stimulus. For example, money is often considered a reinforcer for humans, yet it has little value in itself. Its value comes from the fact that it can be used to purchase primary reinforcers such as food. This makes it a secondary reinforcer: it can’t satisfy a basic need directly, but it can purchase things that provide primary reinforcement.

Positive and negative reinforcement

Reinforcement can also be positive or negative.

Either positive or negative reinforcement can be used as part of operant conditioning. The idea is to strengthen a behavior using reinforcement to ensure that it occurs again.

  • Positive reinforcement means rewarding the subject for a behavior.

  • Negative reinforcement means removing or avoiding an unpleasant outcome when the subject performs a behavior.

An example of positive reinforcement would be rewarding someone for completing a certain task. An example of negative reinforcement would be taking away something unpleasant, such as a nagging reminder, once the task is completed. Denying a reward when someone fails to complete a task is a different technique, usually classed as punishment rather than reinforcement. The differences here are subtle, but they can make a huge difference in the way someone approaches reinforcement learning.

Scheduled reinforcement

For these reinforcements to be effective, you may need to deliver them on a schedule. There are two foundational forms of reinforcement schedule: continuous reinforcement and partial reinforcement.

Continuous reinforcement means the desired behavior is reinforced every single time it occurs. This schedule is typically used in the early stages of learning something in order to create a strong link between the behavior and its consequence. It is often used when training new staff and teaching them new skills.

Partial reinforcement is used once the response has been established. Partial reinforcement is effective at maintaining the behavior and ensuring it doesn’t disappear. There are four different partial reinforcement schedules.

  • Fixed-ratio schedules

    – A response is reinforced only after a specified number of responses.

  • Variable-ratio schedules

    – A response is reinforced after an unpredictable number of responses. This is the schedule behind gambling and lotteries.

  • Fixed-interval schedules

    – The first response is reinforced only after a specified amount of time has passed.

  • Variable-interval schedules

    – A response is reinforced after an unpredictable amount of time has passed.
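The four partial schedules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a standard implementation: the function names, the `now` argument (time in seconds), and the way the unpredictable schedules draw their random values are all choices made for this example. Each function returns a `respond(now)` callable that answers one question: is this response reinforced?

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses, mean_n on average."""
    def respond(now):
        # Each response has a 1-in-mean_n chance of being reinforced.
        return rng.random() < 1.0 / mean_n

    return respond


def fixed_interval(seconds):
    """Reinforce the first response made once `seconds` have passed since the last reward."""
    last_reward = 0.0

    def respond(now):
        nonlocal last_reward
        if now - last_reward >= seconds:
            last_reward = now
            return True
        return False

    return respond


def variable_interval(mean_seconds, rng):
    """Reinforce the first response after an unpredictable wait, mean_seconds on average."""
    next_time = rng.uniform(0, 2 * mean_seconds)

    def respond(now):
        nonlocal next_time
        if now >= next_time:
            next_time = now + rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return respond


# Usage: one response per second for 20 seconds.
rng = random.Random(0)
fr, fi = fixed_ratio(5), fixed_interval(6)
vr, vi = variable_ratio(5, rng), variable_interval(6, rng)
fr_rewards = [t for t in range(1, 21) if fr(t)]  # every 5th response
fi_rewards = [t for t in range(1, 21) if fi(t)]  # first response after each 6 s
vr_rewards = [t for t in range(1, 21) if vr(t)]  # unpredictable responses
vi_rewards = [t for t in range(1, 21) if vi(t)]  # unpredictable times
print(fr_rewards, fi_rewards, vr_rewards, vi_rewards)
```

With one response per second, the fixed schedules reward at perfectly regular points, while the two variable schedules reward at points the learner cannot predict.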

The right schedule depends on a variety of factors. If you’re looking to teach something new, then a continuous schedule is typically the go-to option. However, once the behavior or skill has been learned, it’s wise to switch to a partial reinforcement schedule so that the skill is not lost and the reward doesn’t lose its effect.

For instance, let’s imagine you received a reward every time you arrived at the workplace on time. This could be a great way to teach you that coming to work on time is good. However, if this continuous reinforcement went on for an extended period of time, you would come to expect the reward. It would then stop feeling like a reward, and the behavior could fade quickly if the reward were ever withdrawn.

To combat this, it’s important to hand out rewards on a less predictable partial reinforcement schedule. This is more realistic in terms of how often you can give someone a reward, and it also keeps the behavior going for longer, reducing the risk that the person drops the behavior you want to instill.

Coaching is a practical example of how reinforcement learning can be used to efficiently train someone and teach them new skills and knowledge. A skilled coach will use a variety of reinforcement schedules based on the coachee’s current level of knowledge. By adjusting the schedule and using both positive and negative reinforcement at the right time, the coach helps the coachee commit new knowledge to memory in an efficient and practical manner that facilitates a long-term change in behavior.

Bring this principle into real-world practice with Ezra’s world-class employee coaching, built to fit into today’s working life. We’ve redesigned leadership coaching for the modern age to help transform people through affordable, scalable and high-impact solutions, with equitable access through our world-class coaching app. Find out today how digital coaching could make a big difference to your organization.
