State Action Value Function Example

In this Jupyter notebook, you can modify the Mars rover example to see how the values of Q(s,a) change as you vary the rewards and the discount factor.
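As a reminder of why these settings matter: the state-action values satisfy the Bellman equation from the lectures,

Q(s,a) = R(s) + γ · max_{a'} Q(s',a'),

where s' is the state reached by taking action a in state s. With the defaults below (γ = 0.5, terminal rewards 100 and 40, step reward 0), a state two steps from the left terminal has Q(s, left) = 0 + 0.5·0 + 0.5²·100 = 25, so shrinking γ or the terminal rewards shrinks every Q value accordingly.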

In [ ]:
import numpy as np
from utils import *
In [ ]:
# Do not modify
num_states = 6
num_actions = 2
In [ ]:
terminal_left_reward = 100
terminal_right_reward = 40
each_step_reward = 0

# Discount factor
gamma = 0.5

# Probability of going in the wrong direction
misstep_prob = 0
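When misstep_prob > 0 the environment is stochastic: with probability misstep_prob the rover moves opposite to the chosen action. The Bellman update then averages over the two possible next states,

Q(s,a) = R(s) + γ · [ (1 − misstep_prob) · max_{a'} Q(s'_intended, a') + misstep_prob · max_{a'} Q(s'_opposite, a') ],

so raising misstep_prob blurs the difference between the two actions and, for misstep_prob < 0.5, lowers the best achievable return.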
In [ ]:
generate_visualization(terminal_left_reward, terminal_right_reward, each_step_reward, gamma, misstep_prob)
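generate_visualization comes from utils.py, which is not shown on this page. As a sanity check on the numbers it plots, here is a minimal value-iteration sketch under the same assumptions (states 0 and num_states − 1 are terminal, action 0 moves left, action 1 moves right); the function name compute_q_values and the fixed iteration count are illustrative, not part of the course code.

import numpy as np

def compute_q_values(terminal_left_reward, terminal_right_reward,
                     each_step_reward, gamma, misstep_prob,
                     num_states=6, num_iterations=100):
    # Reward for being in each state; only the two ends differ
    rewards = [each_step_reward] * num_states
    rewards[0] = terminal_left_reward
    rewards[-1] = terminal_right_reward

    q = np.zeros((num_states, 2))        # Q[s, a]: a=0 left, a=1 right
    for _ in range(num_iterations):      # iterate the Bellman update to a fixed point
        q_new = np.zeros_like(q)
        for s in range(num_states):
            if s in (0, num_states - 1):
                q_new[s, :] = rewards[s]   # terminal state: no future return
                continue
            for a, step in enumerate((-1, 1)):
                intended, slipped = s + step, s - step
                # With probability misstep_prob the rover goes the wrong way
                q_new[s, a] = rewards[s] + gamma * (
                    (1 - misstep_prob) * q[intended].max()
                    + misstep_prob * q[slipped].max()
                )
        q = q_new
    return q

# With the defaults above this reproduces the lecture values,
# e.g. Q(1, left) = 50 and Q(4, right) = 20 (0-indexed states)
print(compute_q_values(100, 40, 0, 0.5, 0))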
