Reinforcement Learning Notes: Introduction to reinforcement learning

Brief introduction to reinforcement learning.

Introduction to reinforcement learning

Machine learning (ML) is a subfield of artificial intelligence (AI) in which one builds mathematical models in order to make predictions. At the time of writing, ML has three core fields:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Let’s briefly review the supervised and unsupervised learning paradigms and then get into RL, which is the theme of these notes.

Supervised learning paradigm

Supervised learning algorithms are, broadly speaking, categorized into

  • Regression models
  • Classification models

Regression modeling is concerned with numerical predictions, such as the price of a house. Linear regression assumes the following mathematical relationship between the variables:

\[y = ax + b + \epsilon\]

where $\epsilon$ represents the modeling error. The following image shows a data set and the linear regression line fitted to it.

Figure 1. Example of simple linear regression. Image from [2].
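
Fitting such a model amounts to choosing the coefficients that minimize the squared modeling error over the data set, i.e.

\[\hat{a}, \hat{b} = \underset{a,\, b}{\operatorname{arg\,min}} \sum_{i=1}^{N} \left( y_i - a x_i - b \right)^2\]

where $N$ is the number of points in the data set.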

Below is an example of linear regression using sklearn. The example assumes just one feature.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Data Generation
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + .1 * np.random.randn(100, 1)

# Shuffles the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Uses first 80 random indices for train
train_idx = idx[:80]
# Uses the remaining indices for validation
val_idx = idx[80:]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
plt.title("Train data")
plt.scatter(x_train, y_train)

(figure: scatter plot of the training data)

linr = LinearRegression()
linr.fit(x_train, y_train)
print(linr.intercept_, linr.coef_[0])
[1.02354075] [1.96896447]

Now that we have a model, we can use it to make predictions.
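
For instance, here is a minimal sketch of querying the fitted model on a few new inputs (the input values below are made up for illustration):

# predict the target for a few new, illustrative inputs
x_new = np.array([[0.25], [0.50], [0.75]])
print(linr.predict(x_new))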

Classification modeling is concerned with appropriately classifying, or labeling, data points. For example, given a set of images depicting cats and dogs, classify which images show a cat and which show a dog. Another example is classifying email as spam or not spam. A very simple classification algorithm is k-nearest neighbors. The following images show the application of the algorithm to a given data set (top image) and the classification achieved using one neighbor (middle image) and five neighbors (bottom image); a short sklearn sketch follows the figures.

Figure 2. The dataset. Image from [3].
Figure 3. The 1NN classification map. Image from [3].
Figure 4. The 5NN classification map. Image from [3].
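
As a rough sketch of how such a classifier can be trained with sklearn (the synthetic two-class data below is made up for illustration and is not the data set of the figures):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# illustrative synthetic data: two Gaussian blobs, one per class
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 2.0])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# classify each point by a majority vote among its 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))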

Regardless of whether we apply a regression or a classification methodology, in supervised learning we have in hand a data set $\mathcal{D}$. The data set contains a number of points that hopefully describe the environment we want to understand. Each of these points is also labeled accordingly.

Unsupervised learning paradigm

The next learning paradigm we are going to look at is unsupervised learning. Unsupervised learning can, to a large extent, be categorized into cluster analysis and dimensionality reduction techniques such as principal component analysis. Cluster analysis is probably the most well-known of the two. Clustering is the process of grouping similar objects together [4]. We have two types of clustering:

  • flat or partitional clustering, where we partition the items into disjoint sets
  • hierarchical clustering, in which a nested tree of partitions is created

The following is an example of using the famous k-means algorithm; the example is taken from [5].

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()  # for plot styling
import numpy as np
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);

(figure: scatter plot of the generated blobs)

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')

centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

(figure: the points colored by k-means cluster label, with the cluster centers marked in black)

Regardless of the approach we use, the dominant feature of unsupervised learning, in terms of data, is that no labels are given to us that specify the category of the data points in the data set. This makes it hard to evaluate the quality of the output of a given method. Clustering algorithms typically require that we specify a way to measure the similarity between two items.
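
The k-means example above is an instance of flat clustering. As a minimal sketch of the hierarchical type, sklearn's AgglomerativeClustering can be run on the same blobs data; the linkage argument encodes how the similarity between groups of points is measured:

from sklearn.cluster import AgglomerativeClustering

# ward linkage merges, at each step, the pair of clusters that least
# increases the total within-cluster variance
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
labels = agg.fit_predict(X)
print(labels[:10])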

The other facet of unsupervised learning is dimensionality reduction. Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.
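
As a minimal sketch of this idea, here is principal component analysis with sklearn, reusing the blobs data generated above:

from sklearn.decomposition import PCA

# project the two-dimensional blobs onto their first principal component
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
print(pca.explained_variance_ratio_)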

Reinforcement learning paradigm

Reinforcement learning is a different paradigm altogether. Unlike the other two learning frameworks, which operate using a static dataset, reinforcement learning works with data from a dynamic environment [1]. The goal is not to cluster data or label data, but to find the best sequence of actions that will generate the optimal outcome. The way reinforcement learning solves this problem is by allowing an agent to explore, interact with, and learn from the environment [1]. In particular, the agent learns how to map situations to actions [1]. It does so by maximizing a numerical reward signal. The important thing here is that the agent is not told how to maximize the reward signal, i.e. which actions to take [1]. Instead, it must discover which actions yield the most reward by trying them [1].

Let’s try to get more in depth. A reinforcement learning problem consists of the following core elements:

  • an agent (although multiple agents can also exist)
  • an environment over which the agent acts
  • a state space, which can be continuous or discrete
  • an action space, which can also be continuous or discrete
  • a policy $\pi$ that the agent uses in order to select actions, so that it is able to guide itself through the environment
  • a reward function $R(s, a)$

Typical examples of rewards are the points scored in a game, $+1$ for winning and $-1$ for losing a board game, or a $-1$ penalty per time step for an agent navigating a maze, so that shorter paths accumulate more reward.
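
As a concrete, hypothetical illustration, a reward function for the maze example discussed later in these notes might look as follows (the function and its arguments are made up for this sketch):

def reward(state, action, next_state, goal):
    # hypothetical reward for a maze-navigation task: reaching the
    # goal ends the episode, every other step is penalized by -1
    if next_state == goal:
        return 0.0
    return -1.0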

The policy function $\pi$ takes in state observations (the inputs) and maps them to actions (the outputs). Thus, the policy decides which action to take. The goal of the agent is to learn the best policy as it interacts with the environment so that, given any state, it will always take the optimal action, i.e. the one that will produce the most reward in the long run [1].
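
A common concrete choice, sketched below under the assumption of a tabular action-value estimate $Q(s, a)$ stored in a Python dict, is the so-called $\epsilon$-greedy policy: with probability $\epsilon$ the agent explores a random action, otherwise it exploits the best-known one.

import numpy as np

def epsilon_greedy(Q, state, n_actions, eps=0.1):
    # Q is assumed to map (state, action) pairs to value estimates
    if np.random.rand() < eps:
        # explore: pick an action uniformly at random
        return np.random.randint(n_actions)
    # exploit: pick the action with the highest estimated value
    values = [Q.get((state, a), 0.0) for a in range(n_actions)]
    return int(np.argmax(values))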

The following sample code, using OpenAI Gym, illustrates the agent-environment interaction programmatically:

import gym

# create the environment
env = gym.make('CartPole-v0')

# for a given number of episodes
for i_episode in range(5):

    # observe the environment, i.e. get its initial state
    observation = env.reset()

    # for a given number of time steps
    for t in range(3):

        # decide on an action; here we simply sample a random one
        action = env.action_space.sample()

        # execute the action and observe the new state, the reward
        # and whether the episode has terminated
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

env.close()

Reinforcement learning is a powerful learning paradigm used in a variety of tasks, for example:

  • Quantum error correction [6]
  • Robotics
  • Control
  • Game playing

In this series of notes we will look into various algorithms for reinforcement learning. Specifically, in the first part of the notes we study methods based on some sort of tabular approximation of the value functions involved. Namely, we will look into dynamic programming, Monte Carlo, and temporal differencing (TD) methods. These methods are, in general, easy to implement. However, when the state space becomes large we need to look into approximate methods, as the former approaches are no longer adequate. This is done in the second part of the notes. Approximate solution methods attempt to approximate a function; in this sense, we can use a lot of the machinery developed in supervised learning research. Part three looks into algorithms that employ deep neural networks. These methods are gaining, if not already established, momentum in the area of reinforcement learning. Parts four and five look into various applications of reinforcement learning such as robotics, control, and fluid mechanics. The last part is devoted to quantum reinforcement learning and the advances that have already been achieved in this field.

In a nutshell, reinforcement learning is a subfield of machine learning concerned with decision making and how an agent can learn how to achieve goals in a complex, uncertain environment. Let’s try to clarify this with a simple example.

Consider robot navigation in a maze. The robot is the agent, and the maze is the environment it interacts with. There is a well-defined goal for the agent to achieve, e.g. getting out of the maze. The agent takes an action, e.g. it moves one or a few steps forward or backward, or, at a crossroad, it decides whether to turn left or right. For this action the agent receives a signal, or reward, from the environment as an evaluation of how good or bad its previous action was towards the final goal. The second input that the agent receives from the environment is the latter's new state.

An important point to remark is that the agent, in general, should be non-greedy. That is, the agent should not act merely to improve performance in the short term by accumulating as much immediate reward as possible; instead, it should act so as to optimize the long-term goal.
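
Formally, this long-term objective is usually expressed via the discounted return, where a discount factor $\gamma \in [0, 1)$ weighs immediate rewards against future ones [1]:

\[G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}\]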

The following video gives a nice bird’s eye view of what reinforcement learning is.

A more thorough introduction can be found in the following video

References

  1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction.
  2. Linear regression.
  3. k-nearest neighbors.
  4. Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press.
  5. Jake VanderPlas, Python Data Science Handbook.
  6. Hendrik Poulsen Nautrup et al., Optimizing Quantum Error Correction Codes with Reinforcement Learning.