bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents

The QLearningSolver class. Tabular implementation of the Q-learning algorithm using an epsilon-greedy policy. The implementation also supports exponential decay of epsilon.
#include <q_learning.h>


Public Types
| typedef TDAlgoBase< EnvTp >::env_type | env_type | env_t |
| typedef TDAlgoBase< EnvTp >::action_type | action_type | action_t |
| typedef TDAlgoBase< EnvTp >::state_type | state_type | state_t |
| typedef PolicyType | policy_type | action_selector_t |
Public Types inherited from cuberl::rl::algos::td::TDAlgoBase< EnvTp >
| typedef EnvTp | env_type | env_t |
| typedef env_type::action_type | action_type | action_t |
| typedef env_type::state_type | state_type | state_t |
Public Types inherited from cuberl::rl::algos::RLSolverBase< EnvType >
| typedef EnvType | env_type | |
Public Member Functions
| QLearningSolver (const QLearningConfig config, const PolicyType &policy) | Constructor. |
| virtual void actions_before_training_begins (env_type &) | actions_before_training_begins. Execute any actions the algorithm needs before the training iterations start. |
| virtual void actions_after_training_ends (env_type &) | actions_after_training_ends. Actions to execute after the training iterations have finished. |
| virtual void actions_before_episode_begins (env_type &, uint_t) | actions_before_episode_begins. Actions to execute before a training episode begins. |
| virtual void actions_after_episode_ends (env_type &, uint_t episode_idx, const EpisodeInfo &) | actions_after_episode_ends. Actions to execute after a training episode ends. |
| virtual EpisodeInfo on_training_episode (env_type &, uint_t episode_idx) | on_training_episode. Execute one training episode of the algorithm. |
| void save (const std::string &filename) const | Save the state-action value function in CSV format. |
| cuberl::rl::policies::MaxTabularPolicy build_policy () const | Build the policy after training. |
Public Member Functions inherited from cuberl::rl::algos::td::TDAlgoBase< EnvTp >
| virtual ~TDAlgoBase ()=default | Destructor. |
Public Member Functions inherited from cuberl::rl::algos::RLSolverBase< EnvType >
| virtual ~RLSolverBase ()=default | Destructor. |
| virtual void actions_before_training_begins (env_type &)=0 | actions_before_training_begins. Execute any actions the algorithm needs before the training iterations start. |
| virtual void actions_after_training_ends (env_type &)=0 | actions_after_training_ends. Actions to execute after the training iterations have finished. |
| virtual void actions_before_episode_begins (env_type &, uint_t) | actions_before_episode_begins. Actions to execute before a training episode begins. |
| virtual void actions_after_episode_ends (env_type &, uint_t, const EpisodeInfo &) | actions_after_episode_ends. Actions to execute after a training episode ends. |
| virtual EpisodeInfo on_training_episode (env_type &, uint_t)=0 | on_training_episode. Execute one training episode of the algorithm. |
Additional Inherited Members
Protected Member Functions inherited from cuberl::rl::algos::td::TDAlgoBase< EnvTp >
| TDAlgoBase ()=default | Constructor. |
Protected Member Functions inherited from cuberl::rl::algos::RLSolverBase< EnvType >
| RLSolverBase ()=default | Constructor. |
Detailed Description
The QLearningSolver class. Tabular implementation of the Q-learning algorithm using an epsilon-greedy policy. The implementation also supports exponential decay of epsilon.
Member Typedef Documentation
| typedef TDAlgoBase<EnvTp>::action_type cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::action_type |
action_t
| typedef TDAlgoBase<EnvTp>::env_type cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::env_type |
env_t
| typedef PolicyType cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::policy_type |
action_selector_t
| typedef TDAlgoBase<EnvTp>::state_type cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::state_type |
state_t
Member Function Documentation
| cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::QLearningSolver ( const QLearningConfig config, const PolicyType &policy ) |
Constructor.
| virtual void cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::actions_after_episode_ends ( env_type &, uint_t episode_idx, const EpisodeInfo & ) |
virtual
actions_after_episode_ends. Actions to execute after a training episode ends.
| virtual void cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::actions_after_training_ends ( env_type & ) |
virtual
actions_after_training_ends. Actions to execute after the training iterations have finished.
| virtual void cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::actions_before_episode_begins ( env_type &, uint_t ) |
inline virtual
actions_before_episode_begins. Actions to execute before a training episode begins.
| virtual void cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::actions_before_training_begins ( env_type & ) |
virtual
actions_before_training_begins. Execute any actions the algorithm needs before the training iterations start.
| cuberl::rl::policies::MaxTabularPolicy cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::build_policy | ( | ) | const |
Build the policy after training.
| virtual EpisodeInfo cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::on_training_episode ( env_type &, uint_t episode_idx ) |
virtual
on_training_episode. Execute one training episode of the algorithm.
| void cuberl::rl::algos::td::QLearningSolver< EnvTp, PolicyType >::save | ( | const std::string & | filename | ) | const |
Save the state-action value function in CSV format.