bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents
bitrl Namespace Reference

Namespaces

namespace  boards
 
namespace  consts
 
namespace  control
 
namespace  dynamics
 
namespace  envs
 
namespace  estimation
 
namespace  network
 
namespace  rigid_bodies
 
namespace  sensors
 
namespace  utils
 

Classes

struct  ActiveBoundaryObject
 
class  FilteredIterator
 Simple wrapper to boost::filter_iterator. More...
 
struct  IntegralRange
 A range of integer values in [s, e]. More...
 
struct  IsActive
 
struct  NotNull
 
struct  Null
 Null placeholder. More...
 
struct  RealRange
 A range of double precision floating point values. More...
 
class  TimeStep
 Forward declaration. More...
 
struct  TimeStepEnumUtils
 Utilities for TimeStepTp. More...
 
class  VectorTimeStep
 Forward declaration. More...
 

Typedefs

typedef double real_t
 double precision real type
 
typedef float float_t
 float
 
typedef int int_t
 integer type
 
typedef long int lint_t
 long int type
 
typedef std::size_t uint_t
 unsigned integer type (std::size_t)
 
template<typename T >
using DynMat = Eigen::MatrixX< T >
 Dynamically sized matrix to use around the library.
 
template<typename T , uint_t N>
using SquareMat = Eigen::Matrix< T, N, N >
 Square matrix with elements of type T.
 
template<typename T , uint_t N, uint_t M>
using Mat = Eigen::Matrix< T, N, M >
 General fixed size matrix.
 
using RealMat3d = Eigen::Matrix3< real_t >
 3×3 matrix of type double.
 
using FloatMat3d = Eigen::Matrix3< float_t >
 3×3 matrix of type float.
 
template<typename T >
using DynVec = Eigen::RowVectorX< T >
 Dynamically sized row vector.
 
using FloatVec = DynVec< float_t >
 single precision floating point vector
 
using RealVec = DynVec< real_t >
 double precision floating point vector
 
using STD_FloatVec = std::vector< float_t >
 single precision std::vector
 
using STD_RealVec = std::vector< real_t >
 double precision std::vector
 
template<typename T >
using ColVec = Eigen::VectorX< T >
 Column vector. Some maths operations are easier using column vectors rather than DynVec.
 
using RealColVec = ColVec< real_t >
 Dynamically sized column vector.
 
using FoatColVec = ColVec< float_t >
 Dynamically sized column vector.
 
using RealColVec3d = Eigen::Vector3d
 3D column vector
 
using FloatColVec3d = Eigen::Vector3f
 3D column vector
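The scalar and container aliases listed above can be exercised with a minimal sketch. The snippet below redeclares a few of them locally so it depends only on the standard library; the Eigen-based aliases (DynMat, DynVec, and friends) are omitted to keep it dependency-free:

```cpp
#include <cstddef>
#include <vector>

// Local sketch mirroring bitrl::real_t, bitrl::uint_t and bitrl::STD_RealVec.
// These are redeclarations for illustration, not the library's own headers.
typedef double real_t;
typedef std::size_t uint_t;
using STD_RealVec = std::vector<real_t>;

// Sum the entries of a double precision vector.
real_t sum(const STD_RealVec& v) {
    real_t s = 0.0;
    for (uint_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}
```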
 

Enumerations

enum class  DeviceType { INVALID_TYPE = 0 , CPU = 1 , GPU = 2 }
 Enumeration of various device types. More...
 
enum class  TimeStepTp : uint_t { FIRST = 0 , MID = 1 , LAST = 2 , INVALID_TYPE = 3 }
 The TimeStepTp enum. More...
 

Functions

std::ostream & operator<< (std::ostream &out, const Null &)
 
template<typename StateTp >
std::ostream & operator<< (std::ostream &out, const TimeStep< StateTp > &step)
 
template<typename T >
std::ostream & operator<< (std::ostream &out, const TimeStep< std::vector< T > > &step)
 
template<typename StateTp >
std::ostream & operator<< (std::ostream &out, const VectorTimeStep< StateTp > &step)
 
template<typename Pred , typename Type >
bool operator== (const FilteredIterator< Pred, Type > &lhs, const FilteredIterator< Pred, Type > &rhs)
 
template<typename Pred , typename Type >
bool operator!= (const FilteredIterator< Pred, Type > &lhs, const FilteredIterator< Pred, Type > &rhs)
 
std::ostream & operator<< (std::ostream &out, const std::chrono::system_clock::time_point &tp)
 
template<typename T >
std::ostream & operator<< (std::ostream &out, const std::vector< T > &obs)
 

Detailed Description

Implements the Gridworld environment from the book Deep Reinforcement Learning in Action, published by Manning. You can find the original environment here: https://github.com/DeepReinforcementLearning/DeepReinforcementLearningInAction

Description

The Acrobot environment is based on Sutton's work in "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding" and Sutton and Barto's book. The system consists of two links connected linearly to form a chain, with one end of the chain fixed. The joint between the two links is actuated. The goal is to apply torques on the actuated joint to swing the free end of the linear chain above a given height while starting from the initial state of hanging downwards.

As seen in the GIF: two blue links connected by two green joints. The joint between the two links is actuated. The goal is to swing the free end of the outer link to reach the target height (the black horizontal line above the system) by applying torque on the actuator.

Action Space

The action is discrete, deterministic, and represents the torque applied on the actuated joint between the two links.

Num Action Unit
0 apply -1 torque to the actuated joint torque (N m)
1 apply 0 torque to the actuated joint torque (N m)
2 apply 1 torque to the actuated joint torque (N m)

Observation Space

The observation is a ndarray with shape (6,) that provides information about the two rotational joint angles as well as their angular velocities:

Num Observation Min Max
0 Cosine of theta1 -1 1
1 Sine of theta1 -1 1
2 Cosine of theta2 -1 1
3 Sine of theta2 -1 1
4 Angular velocity of theta1 ~ -12.567 (-4 * pi) ~ 12.567 (4 * pi)
5 Angular velocity of theta2 ~ -28.274 (-9 * pi) ~ 28.274 (9 * pi)

where

  • theta1 is the angle of the first joint, where an angle of 0 indicates the first link is pointing directly downwards.
  • theta2 is relative to the angle of the first link. An angle of 0 corresponds to having the same angle between the two links.

The angular velocities of theta1 and theta2 are bounded at ±4π, and ±9π rad/s respectively. A state of [1, 0, 1, 0, ..., ...] indicates that both links are pointing downwards.
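The mapping from the joint angles and angular velocities to the 6-element observation can be sketched as follows. `acrobot_observation` is a hypothetical helper written for illustration, not part of bitrl's API:

```cpp
#include <array>
#include <cmath>

// Pack the Acrobot state (theta1, theta2 and their angular velocities)
// into the 6-element observation described above:
// [cos(theta1), sin(theta1), cos(theta2), sin(theta2), theta1_dot, theta2_dot]
std::array<double, 6> acrobot_observation(double theta1, double theta2,
                                          double theta1_dot, double theta2_dot) {
    std::array<double, 6> obs{{std::cos(theta1), std::sin(theta1),
                               std::cos(theta2), std::sin(theta2),
                               theta1_dot, theta2_dot}};
    return obs;
}
```

For theta1 = theta2 = 0 with zero velocities this yields [1, 0, 1, 0, 0, 0], the "both links pointing downwards" state mentioned above.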

Rewards

The goal is to have the free end reach a designated target height in as few steps as possible, and as such all steps that do not reach the goal incur a reward of -1. Achieving the target height results in termination with a reward of 0. The reward threshold is -100.

Starting State

Each parameter in the underlying state (theta1, theta2, and the two angular velocities) is initialized uniformly between -0.1 and 0.1. This means both links are pointing downwards with some initial stochasticity.

Episode End

The episode ends if one of the following occurs:

  1. Termination: The free end reaches the target height, which is constructed as: -cos(theta1) - cos(theta2 + theta1) > 1.0
  2. Truncation: Episode length is greater than 500 (200 for v0)
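The termination inequality above can be checked directly. `acrobot_terminated` is a hypothetical helper illustrating the condition, not a bitrl function:

```cpp
#include <cmath>

// Termination test for Acrobot as stated above: the free end is above the
// target height when -cos(theta1) - cos(theta2 + theta1) > 1.0
bool acrobot_terminated(double theta1, double theta2) {
    return -std::cos(theta1) - std::cos(theta2 + theta1) > 1.0;
}
```

Both links hanging down (theta1 = theta2 = 0) gives -2, well below the threshold; both links pointing straight up (theta1 = pi, theta2 = 0) gives +2 and terminates.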

Arguments

No additional arguments are currently supported.

env = gym.make('Acrobot-v1')

By default, the dynamics of the acrobot follow those described in Sutton and Barto's book Reinforcement Learning: An Introduction. However, a book_or_nips parameter can be modified to change the pendulum dynamics to those described in the original NeurIPS paper.

# To change the dynamics as described above
env.env.book_or_nips = 'nips'

See the following note and the implementation for details:

The dynamics equations were missing some terms in the NIPS paper which are present in the book. R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were generated with the equations shown in the book. However, there is the option to run the domain with the paper equations by setting book_or_nips = 'nips'.

Version History

  • v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of theta1 and theta2 in radians, having a range of [-pi, pi]. The v1 observation space as described here provides the sine and cosine of each angle instead.
  • v0: Initial version released (1.0.0) (removed from gym for v1)

References

  • Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. In D. Touretzky, M. C. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8). MIT Press. https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf
  • Sutton, R. S., Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.

CartPole environment. The original environment is described here: https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py

Observation

Type: Box(4)

Num Observation Min Max
0 Cart Position -2.4 2.4
1 Cart Velocity -Inf Inf
2 Pole Angle -0.209 rad (-12 deg) 0.209 rad (12 deg)
3 Pole Angular Velocity -Inf Inf

Actions

Type: Discrete(2)

Num Action
0 Push cart to the left
1 Push cart to the right

Note: The amount by which the velocity is reduced or increased is not fixed; it depends on the angle the pole is pointing. This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it.

Reward

Reward is 1 for every step taken, including the termination step.

Starting State

All observations are assigned a uniform random value in [-0.05, 0.05].

Episode Termination

  • Pole angle is more than 12 degrees.
  • Cart position is more than 2.4 (center of the cart reaches the edge of the display).
  • Episode length is greater than 200.

Solved Requirements

Considered solved when the average return is greater than or equal to 195.0 over 100 consecutive trials.

Pendulum environment. The original environment is described here: https://github.com/openai/gym/blob/master/gym/envs/classic_control/pendulum.py

Description

The inverted pendulum swingup problem is based on the classic problem in control theory. The system consists of a pendulum attached at one end to a fixed point, with the other end being free. The pendulum starts in a random position and the goal is to apply torque on the free end to swing it into an upright position, with its center of gravity right above the fixed point. The diagram (diagrams/pendulum.png, "Pendulum Coordinate System") specifies the coordinate system used for the implementation of the pendulum's dynamic equations.

  • x-y: cartesian coordinates of the pendulum's end in meters.
  • theta: angle in radians.
  • tau: torque in N m. Defined as positive counter-clockwise.

Action Space

The action is a ndarray with shape (1,) representing the torque applied to the free end of the pendulum.

Num Action Min Max
0 Torque -2.0 2.0

Observation Space

The observation is a ndarray with shape (3,) representing the x-y coordinates of the pendulum's free end and its angular velocity.

Num Observation Min Max
0 x = cos(theta) -1.0 1.0
1 y = sin(theta) -1.0 1.0
2 Angular Velocity -8.0 8.0

Rewards

The reward function is defined as:

r = -(theta^2 + 0.1 * theta_dt^2 + 0.001 * torque^2)

where theta is the pendulum's angle normalized between [-pi, pi] (with 0 being the upright position). Based on the above equation, the minimum reward that can be obtained is -(pi^2 + 0.1 * 8^2 + 0.001 * 2^2) = -16.2736044, while the maximum reward is zero (pendulum is upright with zero velocity and no torque applied).
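The reward computation above can be sketched as follows. Both function names are illustrative, not bitrl API; `angle_normalize` mirrors the wrap-to-[-pi, pi] helper used in the Gym implementation:

```cpp
#include <cmath>

// Wrap an angle into [-pi, pi].
double angle_normalize(double theta) {
    const double pi = std::acos(-1.0);
    double a = std::fmod(theta + pi, 2.0 * pi);
    if (a < 0.0) a += 2.0 * pi;
    return a - pi;
}

// r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2),
// with theta normalized so that 0 is the upright position.
double pendulum_reward(double theta, double theta_dot, double torque) {
    const double th = angle_normalize(theta);
    return -(th * th + 0.1 * theta_dot * theta_dot + 0.001 * torque * torque);
}
```

Upright at rest with no torque gives reward 0; the worst case (theta = pi, theta_dot = 8, torque = 2) gives the -16.2736044 minimum quoted above.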

Starting State

The starting state is a random angle in [-pi, pi] and a random angular velocity in [-1,1].

Episode Truncation

The episode truncates at 200 time steps.

Arguments

  • g: acceleration of gravity measured in (m/s^2), used to calculate the pendulum dynamics. The default value is g = 10.0.

gym.make('Pendulum-v1', g=9.81)

Version History

  • v1: Simplify the math equations, no difference in behavior.
  • v0: Initial version released (1.0.0).

Vector Acrobot environment. This class simply wraps copies of the Acrobot class. See: https://github.com/pockerman/rlenvs_from_cpp/blob/master/src/rlenvs/envs/gymnasium/classic_control/acrobot_env.h for more information

Base class for Gymnasium vector environments. See: https://gymnasium.farama.org/api/vector/sync_vector_env/

BlackJack environment https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/toy_text/blackjack.py

This is a simple implementation of the Gridworld Cliff reinforcement learning task.

Description

The board is a 4x12 matrix, with (using NumPy matrix indexing):

  • [3, 0] as the start at bottom-left
  • [3, 11] as the goal at bottom-right
  • [3, 1..10] as the cliff at bottom-center

If the agent steps on the cliff it returns to the start. An episode terminates when the agent reaches the goal.

Actions

There are 4 discrete deterministic actions:

  • 0: move up
  • 1: move right
  • 2: move down
  • 3: move left

Observations

There are 3x12 + 1 possible states. The agent cannot be at the cliff, nor at the goal (as either results in the end of the episode), so the reachable states are all positions of the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.

Reward

Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
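The flattened-index encoding and reward schedule above can be sketched as follows. These are hypothetical helpers assuming NumPy-style [row, col] indexing on the 4x12 board, not bitrl functions:

```cpp
// Observation: the current position encoded as row * 12 + col on the 4x12 board.
int cliff_state(int row, int col) { return row * 12 + col; }

// The cliff occupies the bottom-center cells [3, 1..10].
bool is_cliff(int row, int col) { return row == 3 && col >= 1 && col <= 10; }

// Reward for landing on a cell: -100 for the cliff, -1 otherwise.
int cliff_reward(int row, int col) { return is_cliff(row, col) ? -100 : -1; }
```

For example, the start [3, 0] encodes to state 36 and the goal [3, 11] to state 47.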

Wrapper to the FrozenLake OpenAI-Gym environment. The original environment can be found at: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py

Frozen lake involves crossing a frozen lake from start (S) to goal (G) without falling into any holes (H). The agent may not always move in the intended direction due to the slippery nature of the frozen lake.

The agent takes a 1-element vector for actions. The action space is (dir), where dir decides the direction to move, which can be:

  • 0: LEFT
  • 1: DOWN
  • 2: RIGHT
  • 3: UP

The observation is a value representing the agent's current position, computed as current_row * nrows + current_col.

Reward schedule:

  • Reach goal(G): +1
  • Reach hole(H): 0

Arguments

gym.make('FrozenLake-v0', desc=None,map_name="4x4", is_slippery=True)

  • desc: Used to specify a custom map for frozen lake. For example, desc=["SFFF", "FHFH", "FFFH", "HFFG"].
  • map_name: ID to use any of the preloaded maps.

    "4x4": [ "SFFF", "FHFH", "FFFH", "HFFG" ]
    "8x8": [ "SFFFFFFF", "FFFFFFFF", "FFFHFFFF", "FFFFFHFF", "FFFHFFFF", "FHHFFFHF", "FHFFHFHF", "FFFHFFFG" ]

  • is_slippery: True/False. If True, the agent will move in the intended direction with probability 1/3, else will move in either perpendicular direction with equal probability of 1/3 in both directions. For example, if action is left and is_slippery is True, then:

  • P(move left)=1/3
  • P(move up)=1/3
  • P(move down)=1/3
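The slippery transition rule can be sketched as below. `slippery_action` is a hypothetical helper that picks among the intended action and its two perpendicular neighbours with equal probability; with actions numbered 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP, the perpendicular directions of action a are (a-1) mod 4 and (a+1) mod 4:

```cpp
#include <random>

// With is_slippery == true, the intended action and its two perpendicular
// neighbours are each taken with probability 1/3.
int slippery_action(int intended, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(-1, 1);  // -1, 0, +1 equally likely
    return (intended + pick(rng) + 4) % 4;
}
```

For intended = 0 (LEFT) this yields UP (3), LEFT (0) or DOWN (1), matching the probabilities listed above.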

Typedef Documentation

◆ ColVec

template<typename T >
using bitrl::ColVec = typedef Eigen::VectorX<T>

Column vector. Some maths operations are easier using column vectors rather than DynVec.

◆ DynMat

template<typename T >
using bitrl::DynMat = typedef Eigen::MatrixX<T>

Dynamically sized matrix to use around the library.

◆ DynVec

template<typename T >
using bitrl::DynVec = typedef Eigen::RowVectorX<T>

Dynamically sized row vector.

◆ float_t

typedef float bitrl::float_t

float

◆ FloatColVec3d

using bitrl::FloatColVec3d = typedef Eigen::Vector3f

3D column vector

◆ FloatMat3d

using bitrl::FloatMat3d = typedef Eigen::Matrix3<float_t>

3×3 matrix of type float.

◆ FloatVec

using bitrl::FloatVec = typedef DynVec<float_t>

single precision floating point vector

◆ FoatColVec

using bitrl::FoatColVec = typedef ColVec<float_t>

Dynamically sized column vector.

◆ int_t

typedef int bitrl::int_t

integer type

◆ lint_t

typedef long int bitrl::lint_t

long int type

◆ Mat

template<typename T , uint_t N, uint_t M>
using bitrl::Mat = typedef Eigen::Matrix<T, N, M>

General fixed size matrix.

◆ real_t

typedef double bitrl::real_t

double precision real type

◆ RealColVec

using bitrl::RealColVec = typedef ColVec<real_t>

Dynamically sized column vector.

◆ RealColVec3d

using bitrl::RealColVec3d = typedef Eigen::Vector3d

3D column vector

◆ RealMat3d

using bitrl::RealMat3d = typedef Eigen::Matrix3<real_t>

3×3 matrix of type double.

◆ RealVec

using bitrl::RealVec = typedef DynVec<real_t>

double precision floating point vector

◆ SquareMat

template<typename T , uint_t N>
using bitrl::SquareMat = typedef Eigen::Matrix<T, N, N>

Square matrix with elements of type T.

◆ STD_FloatVec

using bitrl::STD_FloatVec = typedef std::vector<float_t>

single precision std::vector

◆ STD_RealVec

using bitrl::STD_RealVec = typedef std::vector<real_t>

double precision std::vector

◆ uint_t

typedef std::size_t bitrl::uint_t

unsigned integer type (std::size_t)

Enumeration Type Documentation

◆ DeviceType

enum class bitrl::DeviceType
strong

Enumeration of various device types.

Enumerator
INVALID_TYPE 
CPU 
GPU 

◆ TimeStepTp

enum class bitrl::TimeStepTp : uint_t
strong

The TimeStepTp enum.

Enumerator
FIRST 
MID 
LAST 
INVALID_TYPE 

Function Documentation

◆ operator!=()

template<typename Pred , typename Type >
bool bitrl::operator!= ( const FilteredIterator< Pred, Type > &  lhs,
const FilteredIterator< Pred, Type > &  rhs 
)
inline

◆ operator<<() [1/6]

std::ostream & bitrl::operator<< ( std::ostream &  out,
const Null  
)
inline

◆ operator<<() [2/6]

std::ostream & bitrl::operator<< ( std::ostream &  out,
const std::chrono::system_clock::time_point &  tp 
)
inline

◆ operator<<() [3/6]

template<typename T >
std::ostream & bitrl::operator<< ( std::ostream &  out,
const std::vector< T > &  obs 
)
Template Parameters
TThe type of the value to print
Parameters
outThe stream to write on
obsThe values to print on out
Returns
Read/write reference to the stream

◆ operator<<() [4/6]

template<typename StateTp >
std::ostream & bitrl::operator<< ( std::ostream &  out,
const TimeStep< StateTp > &  step 
)
inline

◆ operator<<() [5/6]

template<typename T >
std::ostream & bitrl::operator<< ( std::ostream &  out,
const TimeStep< std::vector< T > > &  step 
)

◆ operator<<() [6/6]

template<typename StateTp >
std::ostream & bitrl::operator<< ( std::ostream &  out,
const VectorTimeStep< StateTp > &  step 
)
inline

◆ operator==()

template<typename Pred , typename Type >
bool bitrl::operator== ( const FilteredIterator< Pred, Type > &  lhs,
const FilteredIterator< Pred, Type > &  rhs 
)
inline