generated from jobindjohn/obsidian-publish-mkdocs
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 2.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 16.md
* PUSH ATTACHMENT : Pasted image 20241020203656.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 6.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 5.md
* PUSH ATTACHMENT : Pasted image 20241021121638.png
* PUSH ATTACHMENT : Pasted image 20241021121518.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 4.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 3.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 13.md
* PUSH ATTACHMENT : Pasted image 20241020212513.png
* PUSH ATTACHMENT : Pasted image 20241020213317.png
* PUSH ATTACHMENT : Pasted image 20241020213339.png
* PUSH ATTACHMENT : Pasted image 20241020213450.png
* PUSH ATTACHMENT : Pasted image 20241020213623.png
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 11.md
Showing 16 changed files with 195 additions and 6 deletions.
29 changes: 29 additions & 0 deletions
...erence notes/104 Other/Reinforcement Learning - An Introduction - Chapter 16.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
authors: | ||
- "[[Richard S. Sutton|Richard S. Sutton]]" | ||
- "[[Andrew G. Barto|Andrew G. Barto]]" | ||
year: 2018 | ||
tags: | ||
- textbook | ||
- rl1 | ||
url: | ||
share: true | ||
--- | ||
# 16 Applications and Case Studies | ||
## 16.5 Human-level Video Game Play | ||
|
||
DQN's network architecture: Conv2d + ReLU blocks for feature extraction, followed by linear layers that output one action-value estimate per action. | ||
![[Pasted image 20241020203656.png|Pasted image 20241020203656.png]] | ||
|
||
|
||
> [!NOTE] Equation 16.3: DQN Semi-Gradient update rule | ||
> | ||
> $$ | ||
> \mathbf{w}_{t+1} = \mathbf{w}_{t} + \alpha \left[ R_{t+1} + \gamma \max_{a} \hat{q}(S_{t+1}, a; \mathbf{w}_{t}) - \hat{q}(S_t, A_t; \mathbf{w}_{t}) \right] \nabla \hat{q}(S_t, A_t; \mathbf{w}_{t}) | ||
> $$ | ||
> [!FAQ]- What are the three modifications to Q-learning that make DQN? | ||
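> 1. Experience replay: store each transition in a replay memory and update on minibatches sampled uniformly at random from it. This reuses data more efficiently, decorrelates successive updates, and removes the dependence of successive experiences on the current weights. | ||
> 2. Target network (distinct from Double Q-learning): every $C$ updates, copy the current weights into a duplicate network and hold that copy fixed for the next $C$ updates. Its outputs provide the Q-learning targets, which reduces divergence and oscillations. | ||
> 3. Error clipping: clip the error term $R_{t+1} + \gamma \max_{a} \hat{q}(S_{t+1}, a; \mathbf{w}_{t}) - \hat{q}(S_t, A_t; \mathbf{w}_{t})$ to $[-1, 1]$ to improve stability. | ||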
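
The three modifications can be sketched together in a few lines. This is a minimal illustration and not the DQN implementation: it assumes a linear $\hat{q}$ (one weight row per action) instead of a convolutional network, and the names `q_hat`, `dqn_step`, `train`, `sync_every`, and the `done` flag for terminal transitions are hypothetical choices made for this example.

```python
import random
from collections import deque

def q_hat(w, state, action):
    """Linear action value: dot product of the action's weight row with the state features."""
    return sum(wi * xi for wi, xi in zip(w[action], state))

def dqn_step(w, w_target, batch, alpha=0.1, gamma=0.99):
    """Apply one clipped semi-gradient Q-learning update per sampled transition."""
    for state, action, reward, next_state, done in batch:
        # Targets come from the frozen copy w_target, not from w.
        # Assumption: terminal transitions (done=True) bootstrap from 0.
        best_next = 0.0 if done else max(
            q_hat(w_target, next_state, a) for a in range(len(w_target)))
        error = reward + gamma * best_next - q_hat(w, state, action)
        error = max(-1.0, min(1.0, error))  # clip the error term to [-1, 1]
        # For a linear q_hat, the gradient w.r.t. w[action] is the feature vector.
        w[action] = [wi + alpha * error * xi for wi, xi in zip(w[action], state)]
    return w

def train(transitions, n_actions, n_features, sync_every=4, batch_size=2, seed=0):
    """Feed transitions through a replay memory and update from random minibatches."""
    rng = random.Random(seed)
    replay = deque(maxlen=1000)            # experience replay memory
    w = [[0.0] * n_features for _ in range(n_actions)]
    w_target = [row[:] for row in w]       # duplicate weights used only for targets
    for step, transition in enumerate(transitions, start=1):
        replay.append(transition)
        batch = rng.sample(list(replay), min(batch_size, len(replay)))
        dqn_step(w, w_target, batch)
        if step % sync_every == 0:         # refresh the frozen copy every C steps
            w_target = [row[:] for row in w]
    return w
```

With a reward of 10 and all weights at zero, the raw error would be 10, but clipping limits the step to `alpha * 1`, which is the stabilizing effect described in item 3.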
69 changes: 69 additions & 0 deletions
...ference notes/104 Other/Reinforcement Learning - An Introduction - Chapter 2.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
authors: | ||
- "[[Richard S. Sutton|Richard S. Sutton]]" | ||
- "[[Andrew G. Barto|Andrew G. Barto]]" | ||
year: 2018 | ||
tags: | ||
- textbook | ||
- rl1 | ||
url: | ||
share: true | ||
--- | ||
# 2 Multi-armed Bandits | ||
|
||
## 2.2 Action-value Methods | ||
|
||
|
||
> [!NOTE] Equation 2.1: Sample-average Method | ||
> | ||
> $$ | ||
> Q_t(a) \doteq \frac{\text{sum of rewards when } a \text{ taken prior to } t}{\text{number of times } a \text{ taken prior to } t} = \frac{\sum_{i=1}^{t-1} R_i \cdot \mathbb{1}_{A_i = a}}{\sum_{i=1}^{t-1} \mathbb{1}_{A_i = a}} \tag{2.1} | ||
> $$ | ||
> [!NOTE] Equation 2.2: Greedy Action Selection | ||
> | ||
> $$ | ||
> A_t \doteq \underset{a}{\arg\max} Q_t(a) \tag{2.2} | ||
> $$ | ||
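
As a small worked example of the sample-average estimate and greedy selection, the sketch below computes $Q_t(a)$ directly from a history of actions and rewards. The function names, the toy history, and the default of 0 for untried actions are assumptions made for illustration.

```python
def sample_average(actions, rewards, a, default=0.0):
    """Q_t(a): mean of the rewards received on the steps where action a was taken."""
    taken = [r for act, r in zip(actions, rewards) if act == a]
    # If a was never taken the denominator is zero, so fall back to a default value.
    return sum(taken) / len(taken) if taken else default

def greedy_action(actions, rewards, n_actions):
    """A_t: the action with the highest sample-average estimate (lowest index wins ties)."""
    estimates = [sample_average(actions, rewards, a) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: estimates[a])

actions = [0, 1, 0, 2, 1]
rewards = [1.0, 0.0, 3.0, 0.5, 1.0]
print(sample_average(actions, rewards, 0))   # (1.0 + 3.0) / 2
print(greedy_action(actions, rewards, 3))
```

Here action 0 has estimate 2.0 while actions 1 and 2 both sit at 0.5, so the greedy choice is action 0.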
## 2.4 Incremental Implementation | ||
|
||
|
||
> [!NOTE] Equation 2.3: Incremental sample-average method | ||
> | ||
> $$ | ||
> Q_{n+1} = Q_n + \frac{1}{n}[R_n - Q_n] \tag{2.3} | ||
> $$ | ||
> | ||
> Where: | ||
> - $Q_1$ is arbitrary (usually initialized to zero); the update gives $Q_2 = R_1$ regardless. | ||
> - $R_n$ is the reward received after the $n$-th selection of the action. | ||
> - $Q_n$ denotes the estimate of the action's value after it has been selected $n-1$ times. | ||
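
The incremental update above needs only the current estimate and a counter, yet reproduces the plain average of all rewards seen so far. A minimal sketch, with a hypothetical function name and toy reward list:

```python
def incremental_average(rewards, q1=0.0):
    """Fold rewards into a running estimate using Q_{n+1} = Q_n + (R_n - Q_n) / n."""
    q = q1
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n   # constant memory: no list of past rewards is kept
    return q

rewards = [1.0, 0.0, 3.0, 2.0]
print(incremental_average(rewards))            # matches sum(rewards) / len(rewards)
print(incremental_average(rewards, q1=100.0))  # the initial value is overwritten at n = 1
```

Because the first step uses a step size of $1/1$, the choice of $Q_1$ has no effect on later estimates.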
|
||
## 2.5 Tracking a Nonstationary Problem | ||
|
||
|
||
> [!NOTE] Equation 2.7: Step-size conditions that ensure convergence with probability 1 | ||
> | ||
> $$ | ||
> \sum_{n=1}^{\infty} \alpha_n(a) = \infty \quad \text{and} \quad \sum_{n=1}^{\infty} \alpha_n^2(a) < \infty \tag{2.7} | ||
> $$ | ||
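
A numerical look at partial sums can make the two conditions concrete. The sketch below is only suggestive (finite sums cannot prove divergence or convergence), and the helper names are hypothetical. It compares the sample-average step size $\alpha_n = 1/n$, which satisfies both conditions, with a constant step size, which violates the second one.

```python
def partial_sums(step_size, n_terms):
    """Return (sum of alpha_n, sum of alpha_n^2) over n = 1..n_terms."""
    total = total_sq = 0.0
    for n in range(1, n_terms + 1):
        alpha = step_size(n)
        total += alpha
        total_sq += alpha * alpha
    return total, total_sq

def harmonic(n):
    """Sample-average step size 1/n."""
    return 1.0 / n

def constant(n):
    """Constant step size, as used for nonstationary problems."""
    return 0.1

for n_terms in (1_000, 100_000):
    print(n_terms, partial_sums(harmonic, n_terms), partial_sums(constant, n_terms))
```

For $1/n$ the first sum keeps growing while the sum of squares levels off below $\pi^2/6$. For the constant step size the sum of squares grows without bound, so the estimates never fully converge and keep tracking recent rewards, which is the desired behaviour in a nonstationary problem.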
|
||
## 2.6 Optimistic Initial Values | ||
|
||
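TLDR: Initializing $Q_1(a)$ optimistically, i.e. well above any reward actually obtainable (e.g. $Q_1(a) = +5$ on the 10-armed testbed, where $q_*(a) \sim \mathcal{N}(0, 1)$), encourages exploration: every action disappoints relative to its initial estimate, so even a greedy learner tries all of them early on. | ||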
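
The effect can be seen in a small sketch. It assumes deterministic rewards equal to each arm's true value, a simplification made so the run is reproducible. `greedy_run` and the values in `true_values` are hypothetical.

```python
def greedy_run(true_values, q1, steps, alpha=0.1):
    """Run a purely greedy learner on deterministic rewards; return the set of arms tried."""
    q = [q1] * len(true_values)
    tried = set()
    for _ in range(steps):
        a = max(range(len(q)), key=lambda i: q[i])   # greedy, lowest index wins ties
        tried.add(a)
        q[a] += alpha * (true_values[a] - q[a])      # constant step-size update
    return tried

true_values = [0.2, 0.8, 0.5]
print(greedy_run(true_values, q1=5.0, steps=10))  # optimistic start
print(greedy_run(true_values, q1=0.0, steps=10))  # zero start
```

With $Q_1 = 5$, each pull lowers the chosen arm's estimate below the untouched ones, so all three arms are tried within the first three steps. With $Q_1 = 0$, the first pull raises arm 0 above the others and the greedy learner never leaves it, even though it is the worst arm.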
|
||
## 2.7 Upper-Confidence-Bound Action Selection | ||
|
||
|
||
> [!NOTE] Equation 2.10: UCB action selection | ||
> | ||
> $$ | ||
> A_t \doteq \underset{a}{\arg\max} \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right] \tag{2.10} | ||
> $$ | ||
> | ||
> Where: | ||
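> - $c > 0$ controls the degree of exploration. | ||
> - $N_t(a)$ is the number of times action $a$ has been selected prior to time $t$; if $N_t(a) = 0$, then $a$ is considered a maximizing action. | ||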
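
A minimal sketch of UCB action selection follows. The function name and argument layout are assumptions for illustration. Actions that have never been selected are returned first, in line with treating $N_t(a) = 0$ as maximizing.

```python
import math

def ucb_action(q, counts, t, c=2.0):
    """Pick argmax_a of Q_t(a) + c * sqrt(ln t / N_t(a)); untried actions go first."""
    for a, n in enumerate(counts):
        if n == 0:
            return a          # N_t(a) = 0 is treated as a maximizing action
    scores = [q[a] + c * math.sqrt(math.log(t) / counts[a]) for a in range(len(q))]
    return max(range(len(q)), key=lambda a: scores[a])

# Arm 0 looks slightly better but has been pulled 100 times; arm 1 only once.
print(ucb_action([1.0, 0.9], [100, 1], t=101))          # the bonus favours the rarely tried arm
print(ucb_action([1.0, 0.9], [100, 1], t=101, c=0.0))   # c = 0 reduces to greedy selection
```

The square-root term shrinks as an arm's count grows and rises slowly with $t$ for arms that are not selected, so every arm is eventually revisited, but less and less often if its estimate stays low.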