bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents
The PolicyImprovement class. PolicyImprovement is not a standalone algorithm, in the sense that it does not search for a policy on its own. Instead, it is a helper that improves a given policy using a supplied value function.
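For reference, the textbook policy improvement step makes a policy greedy with respect to a given state-value function $v_\pi$. The formula below assumes PolicyImprovement follows this standard rule; how it breaks ties between equally good actions (via its StochasticAdaptorPolicy) is not specified on this page.

```latex
q_\pi(s,a) = \sum_{s'} p(s' \mid s, a)\,\bigl[r(s,a,s') + \gamma\, v_\pi(s')\bigr],
\qquad
\pi'(s) = \arg\max_{a} \, q_\pi(s,a)
```

By the policy improvement theorem, $v_{\pi'}(s) \ge v_\pi(s)$ for every state $s$, so the greedy policy is never worse than the one it replaces.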
#include <policy_improvement.h>


Public Types

| Type | Member | Description |
|---|---|---|
| typedef DPSolverBase< EnvType >::env_type | env_type | The environment type the solver is using. |
| typedef PolicyType | policy_type | The type of the policy being improved. |

Public Types inherited from cuberl::rl::algos::dp::DPSolverBase< EnvType >

| Type | Member | Description |
|---|---|---|
| typedef RLSolverBase< EnvType >::env_type | env_type | The environment type the solver is using. |

Public Types inherited from cuberl::rl::algos::RLSolverBase< EnvType >

| Type | Member | Description |
|---|---|---|
| typedef EnvType | env_type | |
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | PolicyImprovement (uint_t action_space_size, real_t gamma, const DynVec< real_t > &val_func, policy_type &policy) | Constructor. |
| virtual void | actions_before_training_begins (env_type &) override | Execute any actions the algorithm needs before starting the iterations. |
| virtual void | actions_after_training_ends (env_type &) override | Actions to execute after the training iterations have finished. |
| virtual void | actions_before_episode_begins (env_type &, uint_t) override | Actions to execute before a training episode begins. |
| virtual void | actions_after_episode_ends (env_type &, uint_t, const EpisodeInfo &) override | Actions to execute after a training episode ends. |
| virtual EpisodeInfo | on_training_episode (env_type &env, uint_t episode_idx) override | Do one training episode of the algorithm. |
| const policy_type & | policy () const | Returns a read-only reference to the policy. |
| policy_type & | policy () | Returns a mutable reference to the policy. |
| void | set_value_function (const DynVec< real_t > &v) | Set the value function used to improve the policy. |

Public Member Functions inherited from cuberl::rl::algos::dp::DPSolverBase< EnvType >

| Return type | Member | Description |
|---|---|---|
| virtual | ~DPSolverBase ()=default | Destructor. |

Public Member Functions inherited from cuberl::rl::algos::RLSolverBase< EnvType >

| Return type | Member | Description |
|---|---|---|
| virtual | ~RLSolverBase ()=default | Destructor. |
Protected Attributes

| Type | Member | Description |
|---|---|---|
| real_t | gamma_ | The discount factor. |
| DynVec< real_t > | v_ | The value function used to improve the policy. |
| policy_type & | policy_ | Reference to the policy being improved. |
| cuberl::rl::policies::StochasticAdaptorPolicy< policy_type > | policy_adaptor_ | How to adapt the policy. |
Additional Inherited Members

Protected Member Functions inherited from cuberl::rl::algos::dp::DPSolverBase< EnvType >

| Return type | Member | Description |
|---|---|---|
| | DPSolverBase ()=default | Constructor. |

Protected Member Functions inherited from cuberl::rl::algos::RLSolverBase< EnvType >

| Return type | Member | Description |
|---|---|---|
| | RLSolverBase ()=default | Constructor. |
typedef DPSolverBase<EnvType>::env_type cuberl::rl::algos::dp::PolicyImprovement< EnvType, PolicyType >::env_type
The environment type the solver is using.

typedef PolicyType cuberl::rl::algos::dp::PolicyImprovement< EnvType, PolicyType >::policy_type
The type of the policy being improved.
cuberl::rl::algos::dp::PolicyImprovement< EnvType, PolicyType >::PolicyImprovement (uint_t action_space_size, real_t gamma, const DynVec< real_t > & val_func, policy_type & policy)
Constructor.
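A sketch of what the constructor arguments plausibly map onto, assuming they populate the documented protected attributes: gamma_ and v_ are declared as values, so the solver keeps its own copy of the value function, while policy_ is declared as a reference, so improvements land directly in the caller's policy. PolicyImprovementSketch, the value_function() getter, the action_space_size_ member and the std::vector stand-in for DynVec are hypothetical, not cuberl's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for cuberl's uint_t, real_t and DynVec<real_t>.
using uint_t = std::size_t;
using real_t = double;
using DynVec = std::vector<real_t>;

// Sketch of the constructor's bookkeeping, modelled on the documented
// protected attributes: gamma_ and v_ by value, policy_ by reference.
template <typename PolicyType>
class PolicyImprovementSketch {
public:
    using policy_type = PolicyType;

    PolicyImprovementSketch(uint_t action_space_size, real_t gamma,
                            const DynVec& val_func, policy_type& policy)
        : action_space_size_(action_space_size), gamma_(gamma),
          v_(val_func), policy_(policy) {
        assert(action_space_size > 0);
        assert(gamma >= 0.0 && gamma <= 1.0);
    }

    // Same const/non-const accessor pair as policy() in the member list.
    const policy_type& policy() const { return policy_; }
    policy_type& policy() { return policy_; }

    // Replaces the solver's private copy of the value function.
    void set_value_function(const DynVec& v) { v_ = v; }
    const DynVec& value_function() const { return v_; }  // hypothetical getter

protected:
    uint_t action_space_size_;  // hypothetical: the real class may pass this to its adaptor
    real_t gamma_;              // discount factor
    DynVec v_;                  // private copy of the value function
    policy_type& policy_;       // the caller's policy, modified in place
};
```

Because v_ is a copy, editing the caller's vector after construction has no effect until set_value_function is called; because policy_ is a reference, the caller's policy object must outlive the solver.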
void actions_after_episode_ends (env_type &, uint_t, const EpisodeInfo &)
inline, override, virtual
Actions to execute after a training episode ends.
Reimplemented from cuberl::rl::algos::RLSolverBase< EnvType >.
void actions_after_training_ends (env_type &)
inline, override, virtual
Actions to execute after the training iterations have finished.
Implements cuberl::rl::algos::RLSolverBase< EnvType >.
void actions_before_episode_begins (env_type &, uint_t)
inline, override, virtual
Actions to execute before a training episode begins.
Reimplemented from cuberl::rl::algos::RLSolverBase< EnvType >.
void actions_before_training_begins (env_type &)
inline, override, virtual
Execute any actions the algorithm needs before starting the iterations.
Implements cuberl::rl::algos::RLSolverBase< EnvType >.
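The training-level hooks, the episode-level hooks and on_training_episode imply a fixed calling order. The sketch below shows one plausible driver loop under that assumption. SolverBaseSketch, run_solver, RecordingSolver, DummyEnv and the simplified EpisodeInfo are hypothetical, not cuberl's trainer. Following the member documentation, the training-level hooks and on_training_episode are pure virtual (their entries say "Implements"), while the episode-level hooks have default bodies (their entries say "Reimplemented from").

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using uint_t = std::size_t;

// Hypothetical stand-ins; cuberl's real EpisodeInfo and environments differ.
struct EpisodeInfo { uint_t episode_index = 0; };
struct DummyEnv {};

// Simplified mirror of the RLSolverBase hooks.
template <typename EnvType>
class SolverBaseSketch {
public:
    virtual ~SolverBaseSketch() = default;
    virtual void actions_before_training_begins(EnvType&) = 0;
    virtual void actions_after_training_ends(EnvType&) = 0;
    virtual void actions_before_episode_begins(EnvType&, uint_t) {}
    virtual void actions_after_episode_ends(EnvType&, uint_t, const EpisodeInfo&) {}
    virtual EpisodeInfo on_training_episode(EnvType& env, uint_t episode_idx) = 0;
};

// Hypothetical driver: the training hooks bracket the whole run and the
// episode hooks bracket every call to on_training_episode.
template <typename EnvType>
void run_solver(SolverBaseSketch<EnvType>& solver, EnvType& env, uint_t n_episodes) {
    solver.actions_before_training_begins(env);
    for (uint_t e = 0; e < n_episodes; ++e) {
        solver.actions_before_episode_begins(env, e);
        const EpisodeInfo info = solver.on_training_episode(env, e);
        solver.actions_after_episode_ends(env, e, info);
    }
    solver.actions_after_training_ends(env);
}

// Records the order in which the hooks are called.
class RecordingSolver final : public SolverBaseSketch<DummyEnv> {
public:
    std::vector<std::string> calls;
    void actions_before_training_begins(DummyEnv&) override { calls.push_back("before_training"); }
    void actions_after_training_ends(DummyEnv&) override { calls.push_back("after_training"); }
    void actions_before_episode_begins(DummyEnv&, uint_t) override { calls.push_back("before_episode"); }
    void actions_after_episode_ends(DummyEnv&, uint_t, const EpisodeInfo&) override {
        calls.push_back("after_episode");
    }
    EpisodeInfo on_training_episode(DummyEnv&, uint_t idx) override {
        calls.push_back("episode");
        return EpisodeInfo{idx};
    }
};
```

Running two episodes through RecordingSolver yields before_training, then (before_episode, episode, after_episode) twice, then after_training.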
EpisodeInfo on_training_episode (env_type & env, uint_t episode_idx)
override, virtual
Do one training episode of the algorithm.
Implements cuberl::rl::algos::RLSolverBase< EnvType >.
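A minimal sketch of the kind of sweep on_training_episode could perform, assuming it applies the standard greedy policy improvement rule over a tabular model. The model layout (per state and action, a list of (probability, next state, reward) transitions, similar to Gymnasium's FrozenLake `P`), the Transition and Model types, and action_values and improve_policy are hypothetical, not cuberl's API. Terminal states are assumed to have a value of 0. The real class routes its result through StochasticAdaptorPolicy rather than returning plain action indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using uint_t = std::size_t;
using real_t = double;

// Hypothetical tabular model: model[s][a] lists the transitions from s under a.
struct Transition {
    real_t prob;
    uint_t next_state;
    real_t reward;
};
using Model = std::vector<std::vector<std::vector<Transition>>>;

// One-step lookahead: q(s, a) = sum over s' of p(s'|s,a) * [r + gamma * v(s')].
std::vector<real_t> action_values(const Model& model, uint_t s,
                                  const std::vector<real_t>& v, real_t gamma) {
    std::vector<real_t> q(model[s].size(), 0.0);
    for (uint_t a = 0; a < model[s].size(); ++a) {
        for (const Transition& t : model[s][a]) {
            q[a] += t.prob * (t.reward + gamma * v[t.next_state]);
        }
    }
    return q;
}

// Greedy improvement: for every state, pick an action with maximal q(s, a).
// Ties are broken in favour of the lowest action index.
std::vector<uint_t> improve_policy(const Model& model, const std::vector<real_t>& v,
                                   real_t gamma) {
    assert(v.size() == model.size());
    std::vector<uint_t> greedy(model.size(), 0);
    for (uint_t s = 0; s < model.size(); ++s) {
        const std::vector<real_t> q = action_values(model, s, v, gamma);
        for (uint_t a = 1; a < q.size(); ++a) {
            if (q[a] > q[greedy[s]]) greedy[s] = a;
        }
    }
    return greedy;
}
```

With two states where action 1 moves to the other state and action 0 stays put, and v = {5, 0}, staying in state 0 (q = 4.5) beats moving for a reward of 1 (q = 1), while state 1 prefers moving to state 0.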
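const policy_type & policy () const
inline
Returns a read-only reference to the policy being improved.

policy_type & policy ()
inline
Returns a mutable reference to the policy being improved.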
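void set_value_function (const DynVec< real_t > & v)
inline
Set the value function used to improve the policy.
Parameters
v  The new value function.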
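set_value_function presumably lets one improver be reused inside a policy-iteration loop, where evaluation and improvement alternate until the policy stops changing. The sketch below assumes that usage on a toy deterministic MDP; DeterministicMdp, GreedyImprover, evaluate and policy_iteration are hypothetical names, not cuberl code, and GreedyImprover only mirrors the role of gamma_, v_ and set_value_function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using uint_t = std::size_t;
using real_t = double;

// Hypothetical deterministic MDP: action a in state s always leads to
// next_state[s][a] and yields reward[s][a].
struct DeterministicMdp {
    std::vector<std::vector<uint_t>> next_state;
    std::vector<std::vector<real_t>> reward;
    uint_t n_states() const { return next_state.size(); }
};

// Hypothetical improver holding its own copy of v, like the v_ attribute.
class GreedyImprover {
public:
    GreedyImprover(real_t gamma, std::vector<real_t> v) : gamma_(gamma), v_(std::move(v)) {}

    // Mirrors set_value_function: replace the value estimate before a sweep.
    void set_value_function(const std::vector<real_t>& v) { v_ = v; }

    // Greedy sweep; keeps the current action unless another is strictly better.
    // Returns true if any action changed.
    bool improve(const DeterministicMdp& mdp, std::vector<uint_t>& policy) const {
        assert(policy.size() == mdp.n_states() && v_.size() == mdp.n_states());
        bool changed = false;
        for (uint_t s = 0; s < mdp.n_states(); ++s) {
            uint_t best = policy[s];
            real_t best_q = q(mdp, s, best);
            for (uint_t a = 0; a < mdp.next_state[s].size(); ++a) {
                const real_t qa = q(mdp, s, a);
                if (qa > best_q + 1e-12) { best = a; best_q = qa; }
            }
            changed = changed || best != policy[s];
            policy[s] = best;
        }
        return changed;
    }

private:
    real_t q(const DeterministicMdp& mdp, uint_t s, uint_t a) const {
        return mdp.reward[s][a] + gamma_ * v_[mdp.next_state[s][a]];
    }
    real_t gamma_;
    std::vector<real_t> v_;
};

// Iterative policy evaluation for a deterministic policy (gamma < 1 required).
std::vector<real_t> evaluate(const DeterministicMdp& mdp, const std::vector<uint_t>& policy,
                             real_t gamma, real_t tol = 1e-10) {
    std::vector<real_t> v(mdp.n_states(), 0.0);
    real_t delta = tol + 1.0;
    while (delta >= tol) {
        delta = 0.0;
        for (uint_t s = 0; s < mdp.n_states(); ++s) {
            const uint_t a = policy[s];
            const real_t updated = mdp.reward[s][a] + gamma * v[mdp.next_state[s][a]];
            delta = std::max(delta, std::abs(updated - v[s]));
            v[s] = updated;
        }
    }
    return v;
}

// Policy iteration: evaluate, hand the values to the improver, improve, repeat.
std::vector<uint_t> policy_iteration(const DeterministicMdp& mdp, real_t gamma) {
    assert(gamma >= 0.0 && gamma < 1.0);
    std::vector<uint_t> policy(mdp.n_states(), 0);
    GreedyImprover improver(gamma, std::vector<real_t>(mdp.n_states(), 0.0));
    for (;;) {
        improver.set_value_function(evaluate(mdp, policy, gamma));
        if (!improver.improve(mdp, policy)) return policy;
    }
}
```

Keeping the current action unless another is strictly better prevents the loop from cycling between equally good policies, so it terminates once no state changes its action.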
real_t gamma_
protected
The discount factor.
policy_type & policy_
protected
Reference to the policy being improved.
cuberl::rl::policies::StochasticAdaptorPolicy< policy_type > policy_adaptor_
protected
How to adapt the policy.
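This page only says that policy_adaptor_ controls how the policy is adapted. One common way to turn action values into a stochastic policy, assumed here purely for illustration, is to spread probability uniformly over the maximizing actions so that ties are not broken arbitrarily. adapt_to_stochastic is a hypothetical free function; the real StochasticAdaptorPolicy interface is not documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using uint_t = std::size_t;
using real_t = double;

// Hypothetical adaptor: maps per-state action values q[s][a] to a
// row-stochastic table pi[s][a]. Every action within `tol` of the best
// value receives an equal share of probability; all others receive 0.
std::vector<std::vector<real_t>> adapt_to_stochastic(
        const std::vector<std::vector<real_t>>& q, real_t tol = 1e-9) {
    std::vector<std::vector<real_t>> pi;
    pi.reserve(q.size());
    for (const std::vector<real_t>& q_s : q) {
        assert(!q_s.empty());
        const real_t best = *std::max_element(q_s.begin(), q_s.end());

        // Count the (near-)maximising actions so their shares sum to 1.
        uint_t n_best = 0;
        for (real_t value : q_s) {
            if (best - value <= tol) ++n_best;
        }

        std::vector<real_t> row(q_s.size(), 0.0);
        for (uint_t a = 0; a < q_s.size(); ++a) {
            if (best - q_s[a] <= tol) row[a] = 1.0 / static_cast<real_t>(n_best);
        }
        pi.push_back(row);
    }
    return pi;
}
```

For a state with values {1, 3, 3}, the two tied best actions each get probability 0.5 and the first action gets 0.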
DynVec< real_t > v_
protected
The value function used to improve the policy.