bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents
Namespaces | |
| namespace | boards |
| namespace | consts |
| namespace | control |
| namespace | dynamics |
| namespace | envs |
| namespace | estimation |
| namespace | network |
| namespace | rigid_bodies |
| namespace | sensors |
| namespace | utils |
Classes | |
| struct | ActiveBoundaryObject |
| class | FilteredIterator |
| Simple wrapper to boost::filter_iterator. More... | |
| struct | IntegralRange |
| A range of integer values in [s, e]. More... | |
| struct | IsActive |
| struct | NotNull |
| struct | Null |
| Null placeholder. More... | |
| struct | RealRange |
| A range of double precision floating point values. More... | |
| class | TimeStep |
| Forward declaration. More... | |
| struct | TimeStepEnumUtils |
| Utilities for TimeStepTp. More... | |
| class | VectorTimeStep |
| Forward declaration. More... | |
Typedefs | |
| typedef double | real_t |
| double precision floating point type | |
| typedef float | float_t |
| single precision floating point type | |
| typedef int | int_t |
| integer type | |
| typedef long int | lint_t |
| long int type | |
| typedef std::size_t | uint_t |
| unsigned integer type (std::size_t) | |
| template<typename T > | |
| using | DynMat = Eigen::MatrixX< T > |
| Dynamically sized matrix to use around the library. | |
| template<typename T , uint_t N> | |
| using | SquareMat = Eigen::Matrix< T, N, N > |
| Square matrix with elements of type T. | |
| template<typename T , uint_t N, uint_t M> | |
| using | Mat = Eigen::Matrix< T, N, M > |
| General fixed size matrix. | |
| using | RealMat3d = Eigen::Matrix3< real_t > |
| 3×3 matrix of type double. | |
| using | FloatMat3d = Eigen::Matrix3< float_t > |
| 3×3 matrix of type float. | |
| template<typename T > | |
| using | DynVec = Eigen::RowVectorX< T > |
| Dynamically sized row vector. | |
| using | FloatVec = DynVec< float_t > |
| single precision floating point vector | |
| using | RealVec = DynVec< real_t > |
| double precision floating point vector | |
| using | STD_FloatVec = std::vector< float_t > |
| single precision std::vector | |
| using | STD_RealVec = std::vector< real_t > |
| double precision std::vector | |
| template<typename T > | |
| using | ColVec = Eigen::VectorX< T > |
| Column vector. Some maths operations are easier using column vectors rather than DynVec. | |
| using | RealColVec = ColVec< real_t > |
| Dynamically sized column vector. | |
| using | FoatColVec = ColVec< float_t > |
| Dynamically sized column vector. | |
| using | RealColVec3d = Eigen::Vector3d |
| 3D column vector | |
| using | FloatColVec3d = Eigen::Vector3f |
| 3D column vector | |
Enumerations | |
| enum class | DeviceType { INVALID_TYPE = 0 , CPU = 1 , GPU = 2 } |
| Enumeration of various device types. More... | |
| enum class | TimeStepTp : uint_t { FIRST = 0 , MID = 1 , LAST = 2 , INVALID_TYPE = 3 } |
| The TimeStepTp enum. More... | |
Functions | |
| std::ostream & | operator<< (std::ostream &out, const Null &) |
| template<typename StateTp > | |
| std::ostream & | operator<< (std::ostream &out, const TimeStep< StateTp > &step) |
| template<typename T > | |
| std::ostream & | operator<< (std::ostream &out, const TimeStep< std::vector< T > > &step) |
| template<typename StateTp > | |
| std::ostream & | operator<< (std::ostream &out, const VectorTimeStep< StateTp > &step) |
| template<typename Pred , typename Type > | |
| bool | operator== (const FilteredIterator< Pred, Type > &lhs, const FilteredIterator< Pred, Type > &rhs) |
| template<typename Pred , typename Type > | |
| bool | operator!= (const FilteredIterator< Pred, Type > &lhs, const FilteredIterator< Pred, Type > &rhs) |
| std::ostream & | operator<< (std::ostream &out, const std::chrono::system_clock::time_point &tp) |
| template<typename T > | |
| std::ostream & | operator<< (std::ostream &out, const std::vector< T > &obs) |
Implements the Gridworld environment from the book Deep Reinforcement Learning in Action (Manning Publications). You can find the original environment here: https://github.com/DeepReinforcementLearning/DeepReinforcementLearningInAction
The Acrobot environment is based on Sutton's work in "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding" and Sutton and Barto's book. The system consists of two links connected linearly to form a chain, with one end of the chain fixed. The joint between the two links is actuated. The goal is to apply torques on the actuated joint to swing the free end of the linear chain above a given height while starting from the initial state of hanging downwards.
As seen in the Gif: two blue links connected by two green joints. The joint in between the two links is actuated. The goal is to swing the free end of the outer-link to reach the target height (black horizontal line above system) by applying torque on the actuator.
The action is discrete, deterministic, and represents the torque applied on the actuated joint between the two links.
| Num | Action | Unit |
|---|---|---|
| 0 | apply -1 torque to the actuated joint | torque (N m) |
| 1 | apply 0 torque to the actuated joint | torque (N m) |
| 2 | apply 1 torque to the actuated joint | torque (N m) |
The observation is a ndarray with shape (6,) that provides information about the two rotational joint angles as well as their angular velocities:
| Num | Observation | Min | Max |
|---|---|---|---|
| 0 | Cosine of theta1 | -1 | 1 |
| 1 | Sine of theta1 | -1 | 1 |
| 2 | Cosine of theta2 | -1 | 1 |
| 3 | Sine of theta2 | -1 | 1 |
| 4 | Angular velocity of theta1 | ~ -12.567 (-4 * pi) | ~ 12.567 (4 * pi) |
| 5 | Angular velocity of theta2 | ~ -28.274 (-9 * pi) | ~ 28.274 (9 * pi) |
where
- theta1 is the angle of the first joint; an angle of 0 indicates the first link is pointing directly downwards.
- theta2 is relative to the angle of the first link; an angle of 0 corresponds to the two links having the same angle.
- The angular velocities of theta1 and theta2 are bounded at ±4π and ±9π rad/s respectively.

A state of [1, 0, 1, 0, ..., ...] indicates that both links are pointing downwards.
The goal is to have the free end reach a designated target height in as few steps as possible, and as such all steps that do not reach the goal incur a reward of -1. Achieving the target height results in termination with a reward of 0. The reward threshold is -100.
Each parameter in the underlying state (theta1, theta2, and the two angular velocities) is initialized uniformly between -0.1 and 0.1. This means both links are pointing downwards with some initial stochasticity.
The episode ends if one of the following occurs:
-cos(theta1) - cos(theta2 + theta1) > 1.0

No additional arguments are currently supported.
By default, the dynamics of the acrobot follow those described in Sutton and Barto's book Reinforcement Learning: An Introduction. However, a book_or_nips parameter can be modified to change the pendulum dynamics to those described in the original NeurIPS paper.
See the following note and the implementation for details:
The dynamics equations were missing some terms in the NIPS paper which are present in the book. R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were generated with the equations shown in the book. However, there is the option to run the domain with the paper equations by setting book_or_nips = 'nips'.
theta1 and theta2 are in radians, with a range of [-pi, pi]. The v1 observation space as described here provides the sine and cosine of each angle instead.

| Num | Action | Min | Max |
|---|---|---|---|
| 0 | Torque | -2.0 | 2.0 |
The observation is a ndarray with shape (3,) representing the x-y coordinates of the pendulum's free end and its angular velocity.
| Num | Observation | Min | Max |
|---|---|---|---|
| 0 | x = cos(theta) | -1.0 | 1.0 |
| 1 | y = sin(theta) | -1.0 | 1.0 |
| 2 | Angular Velocity | -8.0 | 8.0 |
The reward function is defined as:
r = -(theta^2 + 0.1 * theta_dt^2 + 0.001 * torque^2)

where theta is the pendulum's angle normalized between [-pi, pi] (with 0 being the upright position). Based on the above equation, the minimum reward that can be obtained is -(pi^2 + 0.1 * 8^2 + 0.001 * 2^2) = -16.2736044, while the maximum reward is zero (pendulum is upright with zero velocity and no torque applied).
The starting state is a random angle in [-pi, pi] and a random angular velocity in [-1,1].
The episode truncates at 200 time steps.
g: acceleration of gravity measured in (m s^-2) used to calculate the pendulum dynamics. The default value is g = 10.0.

- v1: Simplify the math equations, no difference in behavior.
- v0: Initial versions release (1.0.0).
Vector Acrobot environment. This class simply wraps copies of the Acrobot class. See: https://github.com/pockerman/rlenvs_from_cpp/blob/master/src/rlenvs/envs/gymnasium/classic_control/acrobot_env.h for more information
Base class for Gymnasium vector environments. See: https://gymnasium.farama.org/api/vector/sync_vector_env/
BlackJack environment https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/toy_text/blackjack.py
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
The board is a 4x12 matrix, with (using NumPy matrix indexing):
If the agent steps on the cliff it returns to the start. An episode terminates when the agent reaches the goal.
There are 4 discrete deterministic actions:
There are 3x12 + 1 = 37 possible states. The agent can be at neither the cliff nor the goal, as either results in the end of the episode. The possible states are therefore all the positions of the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.
Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
Wrapper to the FrozenLake OpenAI-Gym environment. The original environment can be found at: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py Frozen lake involves crossing a frozen lake from Start(S) to goal(G) without falling into any holes(H). The agent may not always move in the intended direction due to the slippery nature of the frozen lake.
The agent takes a 1-element vector for actions. The action space is (dir), where dir decides the direction to move in, which can be:
The observation is a value representing the agents current position as current_row * nrows + current_col
Reward schedule:
desc: Used to specify a custom map for frozen lake. For example, desc=["SFFF", "FHFH", "FFFH", "HFFG"].

map_name: ID to use any of the preloaded maps. "4x4": ["SFFF", "FHFH", "FFFH", "HFFG"], "8x8": ["SFFFFFFF", "FFFFFFFF", "FFFHFFFF", "FFFFFHFF", "FFFHFFFF", "FHHFFFHF", "FHFFHFHF", "FFFHFFFG"].

is_slippery: True/False. If True, the agent will move in the intended direction with probability of 1/3, else will move in either perpendicular direction with equal probability of 1/3 in both directions. For example, if the action is left and is_slippery is True, then:

- P(move left) = 1/3
- P(move up) = 1/3
- P(move down) = 1/3
| using bitrl::ColVec = typedef Eigen::VectorX<T> |
Column vector. Some maths operations are easier using column vectors rather than DynVec.
| using bitrl::DynMat = typedef Eigen::MatrixX<T> |
Dynamically sized matrix to use around the library.
| using bitrl::DynVec = typedef Eigen::RowVectorX<T> |
Dynamically sized row vector.
| typedef float bitrl::float_t |
single precision floating point type
| using bitrl::FloatColVec3d = typedef Eigen::Vector3f |
3D column vector
| using bitrl::FloatMat3d = typedef Eigen::Matrix3<float_t> |
3×3 matrix of type float.
| using bitrl::FloatVec = typedef DynVec<float_t> |
single precision floating point vector
| using bitrl::FoatColVec = typedef ColVec<float_t> |
Dynamically sized column vector.
| typedef int bitrl::int_t |
integer type
| typedef long int bitrl::lint_t |
long int type
| using bitrl::Mat = typedef Eigen::Matrix<T, N, M> |
General fixed size matrix.
| typedef double bitrl::real_t |
double precision floating point type
| using bitrl::RealColVec = typedef ColVec<real_t> |
Dynamically sized column vector.
| using bitrl::RealColVec3d = typedef Eigen::Vector3d |
3D column vector
| using bitrl::RealMat3d = typedef Eigen::Matrix3<real_t> |
3×3 matrix of type double.
| using bitrl::RealVec = typedef DynVec<real_t> |
double precision floating point vector
| using bitrl::SquareMat = typedef Eigen::Matrix<T, N, N> |
Square matrix with elements of type T.
| using bitrl::STD_FloatVec = typedef std::vector<float_t> |
single precision std::vector
| using bitrl::STD_RealVec = typedef std::vector<real_t> |
double precision std::vector
| typedef std::size_t bitrl::uint_t |
unsigned integer type (std::size_t)
template<typename T>
std::ostream & bitrl::operator<< (std::ostream &out, const std::vector< T > &obs)

| Parameter | Description |
|---|---|
| T | The type of the values to print |
| out | The stream to write on |
| obs | The values to print on out |
template<typename T>
std::ostream & bitrl::operator<< (std::ostream &out, const TimeStep< std::vector< T > > &step)