bitrl & cuberl Documentation
Simulation engine for reinforcement learning agents
Functions

  DynMat<real_t>       create_transition_matrix()
  DynMat<real_t>       compute_matrix_power(const DynMat<real_t>& mat, uint_t power)
  void                 print_matrix(const DynMat<real_t>& mat)
  real_t               get_reward(real_t prob, uint_t n = 10)
  void                 update_record(std::vector<std::vector<real_t>>& records, uint_t action, real_t r)
  uint_t               get_best_arm(const std::vector<std::vector<real_t>>& records)
  std::vector<real_t>  get_probs(uint_t n)
  DynVec<real_t>       extract_part(const std::vector<std::vector<real_t>>& values)
Variables

  const uint_t  N = 10
  const auto    N_EXPERIMENTS = 500
  const auto    TAU = 0.7
  const uint    SEED = 42
Solve the multi-armed bandit problem using a soft-max policy. A soft-max policy yields a probability distribution over the actions, and we select the action with the highest probability. This example solves a 10-armed bandit problem, so N = 10.
This example is adapted from the book Reinforcement Learning in Action (Manning Publications).
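The selection step described above can be sketched in plain C++. The `softmax` and `best_action` helpers below are illustrative stand-ins, not the library's implementation: `std::vector<double>` replaces `DynVec<real_t>`, and the temperature parameter is assumed to play the role of `TAU`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft-max with temperature tau: turns estimated action values
// into a probability distribution over the arms.
std::vector<double> softmax(const std::vector<double>& values, double tau) {
    std::vector<double> probs(values.size());
    // Subtract the max value for numerical stability before exponentiating.
    const double vmax = *std::max_element(values.begin(), values.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        probs[i] = std::exp((values[i] - vmax) / tau);
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;
    return probs;
}

// Select the action with the highest soft-max probability,
// as the example description states.
std::size_t best_action(const std::vector<double>& probs) {
    return std::distance(probs.begin(),
                         std::max_element(probs.begin(), probs.end()));
}
```

Dividing by the temperature before exponentiating controls exploration: a small tau sharpens the distribution toward the greedy arm, while a large tau flattens it toward uniform.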
DynMat<real_t> exe::compute_matrix_power(const DynMat<real_t>& mat, uint_t power)
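A function with the signature above could plausibly be implemented by repeated multiplication. The sketch below uses plain nested `std::vector`s in place of `DynMat<real_t>`; it is an assumption about the routine's behavior, not its actual source.

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Multiply two square matrices of equal size.
Mat multiply(const Mat& a, const Mat& b) {
    const std::size_t n = a.size();
    Mat c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Raise a square matrix to a non-negative integer power by repeated
// multiplication; power == 0 yields the identity matrix.
Mat matrix_power(const Mat& mat, unsigned power) {
    const std::size_t n = mat.size();
    Mat result(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1.0;  // identity
    for (unsigned p = 0; p < power; ++p)
        result = multiply(result, mat);
    return result;
}
```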
DynMat<real_t> exe::create_transition_matrix()
DynVec<real_t> exe::extract_part(const std::vector<std::vector<real_t>>& values)
uint_t exe::get_best_arm(const std::vector<std::vector<real_t>>& records)
std::vector<real_t> exe::get_probs(uint_t n)
real_t exe::get_reward(real_t prob, uint_t n = 10)
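Given the default `n = 10`, one plausible reading of `get_reward` (following the Manning example this code is based on) is to count how many of `n` uniform draws fall below the arm's success probability, so the expected reward is `n * prob`. This is only an assumed sketch; it takes an explicit generator parameter in place of whatever `SEED`ed state the library uses internally.

```cpp
#include <random>

// Reward for pulling an arm with success probability `prob`: count how
// many of n uniform draws in [0, 1) land below prob. The expected
// reward is therefore n * prob (n defaults to 10 in the documented API).
double get_reward(double prob, unsigned n, std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double reward = 0.0;
    for (unsigned i = 0; i < n; ++i) {
        if (unif(gen) < prob) reward += 1.0;
    }
    return reward;
}
```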
void exe::print_matrix(const DynMat<real_t>& mat)
void exe::update_record(std::vector<std::vector<real_t>>& records, uint_t action, real_t r)
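`update_record` and `get_best_arm` pair naturally: assuming each `records[action]` entry holds a pull count and a running mean reward (a layout suggested by the Manning example, not confirmed by this page), the update is an incremental-mean formula and the best arm is the argmax of the means.

```cpp
#include <cstddef>
#include <vector>

// records[action] = {pull count, running mean reward}. Fold a new
// reward r into the running mean for that action.
void update_record(std::vector<std::vector<double>>& records,
                   std::size_t action, double r) {
    const double count = records[action][0];
    const double mean  = records[action][1];
    records[action][1] = (count * mean + r) / (count + 1.0);
    records[action][0] = count + 1.0;
}

// The best arm is the one with the highest running mean reward.
std::size_t get_best_arm(const std::vector<std::vector<double>>& records) {
    std::size_t best = 0;
    for (std::size_t a = 1; a < records.size(); ++a) {
        if (records[a][1] > records[best][1]) best = a;
    }
    return best;
}
```

The incremental form avoids storing every past reward: only the count and the current mean are needed to absorb each new observation.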
const uint_t exe::N = 10
const auto exe::N_EXPERIMENTS = 500
const uint exe::SEED = 42
const auto exe::TAU = 0.7