bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents
softmax_policy.h
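#ifndef SOFTMAX_POLICY_H
#define SOFTMAX_POLICY_H

// Depends on the headers that define real_t and uint_t (bitrl_types.h),
// MaxTabularPolicy (max_tabular_policy.h) and maths::softmax_vec (vector_math.h).

namespace cuberl {
namespace rl {
namespace policies {

class MaxTabularSoftmaxPolicy
{
public:

    // The output type of operator()
    typedef uint_t output_type;

    // Constructor. tau is forwarded to maths::softmax_vec.
    MaxTabularSoftmaxPolicy(real_t tau=1.0);

    // Select an action for the state state_idx from the table q_map.
    template<typename MatType>
    output_type operator()(const MatType& q_map, uint_t state_idx)const;

    // Select an action from the vector of values q_map.
    template<typename VecTp>
    output_type operator()(const VecTp& q_map)const;

    // Any actions the policy should perform on the given episode index.
    void on_episode(uint_t)noexcept{}

    // Reset the policy.
    void reset()noexcept{}

private:

    real_t tau_;

    MaxTabularPolicy max_policy_;

};

inline
MaxTabularSoftmaxPolicy::MaxTabularSoftmaxPolicy(real_t tau)
    :
    tau_(tau),
    max_policy_()
{}

template<typename VecTp>
MaxTabularSoftmaxPolicy::output_type
MaxTabularSoftmaxPolicy::operator()(const VecTp& q_map)const{

    // Apply the softmax to the values, then let MaxTabularPolicy pick the action.
    auto softmax_vec = maths::softmax_vec(q_map.begin(), q_map.end(), tau_);
    return max_policy_.get_action(softmax_vec);
}

}
}
}

#endif // SOFTMAX_POLICY_H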
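The header above exposes three things to calling code: operator() for action selection, and the on_episode and reset hooks. The following is a minimal sketch of how a caller might drive any policy with that shape. GreedyPolicyStandIn and select_actions are hypothetical names introduced only for this illustration and are not part of cuberl. The stand-in skips the softmax step on the assumption that MaxTabularPolicy::get_action returns the index of the largest entry: in exact arithmetic a softmax with a positive temperature preserves the ordering of its inputs, so the selected index would be the same.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in exposing the same member functions as
// MaxTabularSoftmaxPolicy. It returns the index of the largest value.
struct GreedyPolicyStandIn
{
    typedef std::size_t output_type;

    output_type operator()(const std::vector<double>& q_values) const
    {
        assert(!q_values.empty());
        const auto best = std::max_element(q_values.begin(), q_values.end());
        return static_cast<output_type>(std::distance(q_values.begin(), best));
    }

    // Both hooks are no-ops, as in the header above.
    void on_episode(std::size_t) noexcept {}
    void reset() noexcept {}
};

// Hypothetical driver: resets the policy once, then for every episode calls
// the per-episode hook and queries the policy for an action.
// Returns the action chosen in each episode.
template<typename PolicyType>
std::vector<typename PolicyType::output_type>
select_actions(PolicyType& policy,
               const std::vector<double>& q_values,
               std::size_t n_episodes)
{
    std::vector<typename PolicyType::output_type> actions;
    actions.reserve(n_episodes);

    policy.reset();
    for (std::size_t episode = 0; episode < n_episodes; ++episode) {
        policy.on_episode(episode);
        actions.push_back(policy(q_values));
    }
    return actions;
}
```

Because select_actions is a template over the policy type, any class that provides output_type, operator(), on_episode and reset with compatible signatures could be passed in without changing the driver.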
class MaxTabularPolicy
Definition max_tabular_policy.h:30
static output_type get_action(const MatType &q_map, uint_t state_idx)
get_action. Given a
class MaxTabularSoftmaxPolicy
Definition softmax_policy.h:18
void on_episode(uint_t) noexcept
Any actions the policy should perform on the given episode index.
Definition softmax_policy.h:49
void reset() noexcept
Reset the policy.
Definition softmax_policy.h:54
MaxTabularSoftmaxPolicy(real_t tau=1.0)
Constructor.
Definition softmax_policy.h:68
uint_t output_type
The output type of operator()
Definition softmax_policy.h:24
output_type operator()(const MatType &q_map, uint_t state_idx) const
operator(). Given a
double real_t
real_t
Definition bitrl_types.h:23
std::size_t uint_t
uint_t
Definition bitrl_types.h:43
std::vector< T > softmax_vec(const std::vector< T > &vec, real_t tau=1.0)
Applies the softmax operation to the elements of the vector and returns a vector with the result.
Definition vector_math.h:342
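To make the softmax entry concrete, here is a minimal sketch of a softmax with a temperature parameter over a std::vector. The name softmax_sketch is hypothetical and this is not the implementation in vector_math.h. In particular, dividing the inputs by tau is an assumption about how the library uses that parameter. The listing above calls an iterator-range form of softmax_vec; the sketch follows the vector signature shown in this entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for maths::softmax_vec, assuming
//   p_i = exp(x_i / tau) / sum_j exp(x_j / tau).
inline std::vector<double>
softmax_sketch(const std::vector<double>& values, double tau = 1.0)
{
    assert(tau > 0.0 && "temperature must be positive");

    std::vector<double> probs(values.size());
    if (values.empty()) {
        return probs;
    }

    // Subtract the largest input before exponentiating so that exp()
    // never receives a positive argument and cannot overflow.
    const double max_value = *std::max_element(values.begin(), values.end());

    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        probs[i] = std::exp((values[i] - max_value) / tau);
        sum += probs[i];
    }

    // Normalise so the entries sum to one.
    for (double& p : probs) {
        p /= sum;
    }
    return probs;
}
```

Under this assumption a larger tau flattens the distribution towards uniform and a smaller tau concentrates it on the largest input, while the ordering of the entries is unchanged.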
Various utilities used when working with RL problems.
Definition cuberl_types.h:16