Undirected Exploration

Base epsilon-greedy:
With probability \epsilon, choose an action uniformly at random; otherwise, choose the action with the highest estimated value, \arg\max_{a} Q(s,a).

Epsilon-greedy exploration with decay:
A common variant decays \epsilon over time, so that at each timestep:

\begin{equation} \epsilon \leftarrow \alpha \epsilon \end{equation}

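where \alpha \in (0,1) is called the “decay factor.” After t steps this gives \epsilon_t = \alpha^t \epsilon_0, so the amount of exploration shrinks geometrically.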
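A minimal Python sketch of epsilon-greedy selection combined with the decay rule above. It assumes the Q-values of the current state are available as a plain list `q_values` indexed by action; the helper names `epsilon_greedy` and `decay_epsilon`, and the optional floor `eps_min`, are hypothetical and added only for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given Q(s, .) as a list of floats.

    With probability epsilon, return a uniformly random action;
    otherwise return argmax_a Q(s, a) (ties go to the lowest index).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, alpha, eps_min=0.0):
    """Apply epsilon <- alpha * epsilon, clipped below at eps_min.

    eps_min is an assumption added here (a hypothetical floor);
    with eps_min = 0 this is exactly the update rule in the notes.
    """
    assert 0.0 < alpha < 1.0, "decay factor must lie in (0, 1)"
    return max(alpha * epsilon, eps_min)


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.5, 0.2]  # hypothetical Q(s, .) for a single state
    eps = 1.0
    for t in range(5):
        a = epsilon_greedy(q, eps, rng)
        print(f"t={t} eps={eps:.4f} action={a}")
        eps = decay_epsilon(eps, alpha=0.9)
```

With eps_min left at 0 the update is exactly \epsilon \leftarrow \alpha \epsilon; in practice a small floor is often kept so the agent never stops exploring entirely.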
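
Explore-then-commit:
Select actions uniformly at random for the first k steps; then commit to the greedy action, based on the estimates gathered during those k steps, and keep playing it for all remaining steps without further exploration.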
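A minimal sketch of explore-then-commit on a stateless multi-armed bandit, where Q(a) is estimated by the sample mean of the rewards observed for arm a. The reward oracle `pull(arm)` and the parameter names `n_arms`, `k`, and `horizon` are assumptions made for this example, not part of the notes.

```python
import random


def explore_then_commit(pull, n_arms, k, horizon, rng=random):
    """Explore-then-commit on a multi-armed bandit.

    pull(arm) -> float is a hypothetical reward oracle.
    Steps 0..k-1: pick arms uniformly at random, tracking sample means.
    Steps k..horizon-1: always play the arm with the highest sample mean.
    Arms never tried during exploration keep an estimate of 0.0.
    Returns (committed_arm, total_reward); committed_arm is None if horizon <= k.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms  # running sample-mean estimate of Q(a)
    total = 0.0
    committed = None
    for t in range(horizon):
        if t < k:
            arm = rng.randrange(n_arms)  # explore phase: uniform random arm
        else:
            if committed is None:  # commit exactly once, at step k
                committed = max(range(n_arms), key=lambda a: means[a])
            arm = committed  # greedy phase: never re-evaluated
        r = pull(arm)
        total += r
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return committed, total


if __name__ == "__main__":
    rng = random.Random(42)
    true_means = [0.2, 0.8, 0.5]  # hypothetical Bernoulli arm means
    pull = lambda a: 1.0 if rng.random() < true_means[a] else 0.0
    arm, total = explore_then_commit(pull, n_arms=3, k=300, horizon=1000, rng=rng)
    print(arm, total)
```

The choice of k trades off the two phases: too small and the committed arm may be the wrong one, too large and many steps are spent on arms already known to be poor.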