base epsilon-greedy:
choose a random action with probability \epsilon; otherwise, choose the action with the best expected value, \arg\max_{a} Q(s,a)

epsilon-greedy exploration with decay
Sometimes \epsilon is decayed over time, so that at each timestep:
\begin{equation} \epsilon \leftarrow \alpha \epsilon \end{equation}
where \alpha \in (0,1) is called the “decay factor.”

Explore-then-commit
Select actions uniformly at random for k steps; then switch to the greedy policy and stay with it.
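
The three strategies above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the Q-values for the current state are held in a plain list `q_values` indexed by action, and the function names (`greedy`, `epsilon_greedy`, `decay_epsilon`, `explore_then_commit`) are hypothetical, chosen here for illustration rather than taken from any library.

```python
import random


def greedy(q_values):
    """Return the index of the action with the largest Q-value (argmax_a Q(s, a))."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def decay_epsilon(epsilon, alpha):
    """One decay step: epsilon <- alpha * epsilon, with alpha assumed to lie in (0, 1)."""
    return alpha * epsilon


def explore_then_commit(q_values, step, k, rng):
    """Act uniformly at random for the first k steps (step = 0..k-1), then greedily."""
    if step < k:
        return rng.randrange(len(q_values))
    return greedy(q_values)
```

With decay, \epsilon after n steps is \alpha^n times its initial value, so exploration shrinks geometrically but never reaches exactly zero. Explore-then-commit instead stops exploring abruptly at step k.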