#ifndef SOFTMAX_POLICY_H
#define SOFTMAX_POLICY_H
template<typename MatType>
template<typename VecTp>
template<typename VecTp>
class MaxTabularPolicy
Definition max_tabular_policy.h:30
static output_type get_action(const MatType &q_map, uint_t state_idx)
get_action. Given a
Definition softmax_policy.h:18
void on_episode(uint_t) noexcept
Any actions the policy should perform on the given episode index.
Definition softmax_policy.h:49
void reset() noexcept
Reset the policy.
Definition softmax_policy.h:54
MaxTabularSoftmaxPolicy(real_t tau=1.0)
Constructor.
Definition softmax_policy.h:68
uint_t output_type
The output type of operator()
Definition softmax_policy.h:24
output_type operator()(const MatType &q_map, uint_t state_idx) const
operator(). Given a
double real_t
real_t
Definition bitrl_types.h:23
std::size_t uint_t
uint_t
Definition bitrl_types.h:43
std::vector< T > softmax_vec(const std::vector< T > &vec, real_t tau=1.0)
Applies the softmax operation to the elements of the vector and returns a vector with the result.
Definition vector_math.h:342
Various utilities used when working with RL problems.
Definition cuberl_types.h:16
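
The softmax_vec entry above only gives a signature and a one-line description, so the following is a minimal sketch of what a temperature-scaled softmax over a std::vector might look like. It is not the library's implementation. The name softmax_vec_sketch is hypothetical, the local real_t alias mirrors the one listed above, and the sketch assumes that tau divides the inputs (the usual temperature convention) and that T is a floating-point type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Local stand-in for the real_t alias listed above (assumption: double).
using real_t = double;

// Hypothetical sketch of a temperature-scaled softmax:
//   p_i = exp(x_i / tau) / sum_j exp(x_j / tau)
// Assumes T is a floating-point type and tau > 0.
template<typename T>
std::vector<T> softmax_vec_sketch(const std::vector<T>& vec, real_t tau = 1.0)
{
    std::vector<T> result(vec.size());
    if (vec.empty()) {
        return result;
    }

    // Subtract the maximum before exponentiating to avoid overflow.
    // The shift cancels in the ratio, so the probabilities are unchanged.
    const T max_val = *std::max_element(vec.begin(), vec.end());

    T sum = T(0);
    for (std::size_t i = 0; i < vec.size(); ++i) {
        result[i] = static_cast<T>(std::exp((vec[i] - max_val) / tau));
        sum += result[i];
    }

    // Normalize so the entries sum to one.
    for (auto& p : result) {
        p /= sum;
    }
    return result;
}
```

Under these assumptions, a larger tau flattens the resulting distribution and a smaller tau concentrates it on the largest entry. A tabular policy could apply such a function to one row of Q-values before choosing an action.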