Fix terminal Q-value in QLearningAgent/SARSA (use r, not r1) (#1247) #1337

Merged: dmeoli merged 1 commit into master from fix-qlearning-terminal on Jun 27, 2026

dmeoli commented Jun 27, 2026

Collaborator

QLearningAgent (and SARSALearningAgent) set Q[s, None] = r1 (the new percept's reward) when the previous state was terminal, instead of r (the reward received at the terminal state) as in AIMA Fig 21.8. This made terminal Q-values wrong (≈-0.04/noisy instead of +1/-1) and produced incorrect policies. Fixed both; verified terminal Q-values now converge to ≈+1/-1 and the q-learning test passes. Fixes #1247.

When the previous state s was terminal, the agent set Q[s, None] = r1 (the
reward of the *new* percept) instead of r (the stored reward received at the
terminal state), which is what AIMA Fig 21.8 specifies (Q[s, None] <- r). This
made terminal Q-values wrong (e.g. ~-0.04 / noisy instead of +1/-1) and yielded
incorrect policies. Verified: terminal Q-values now converge to ~+1 / ~-1.
Fixes #1247.
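
For illustration, the difference between the old and new assignment can be sketched in isolation. This is a minimal sketch, not the code from this PR: `record_terminal_value` and its `fixed` flag are hypothetical names, and the terminal coordinates and reward values (+1 at a terminal, -0.04 per step) are assumed example numbers chosen to match the figures quoted above.

```python
from collections import defaultdict


def record_terminal_value(Q, s, r, r1, terminals, fixed=True):
    """Hypothetical helper isolating the terminal branch of the update.

    s, r : previous state and the reward stored when it was perceived
    r1   : reward carried by the new percept
    fixed: True uses r (behaviour described in this PR), False uses r1 (old)
    """
    if s in terminals:
        Q[s, None] = r if fixed else r1
    return Q


# Assumed example values: a +1 terminal, and a new percept with reward -0.04.
terminals = {(3, 2), (3, 1)}
s, r, r1 = (3, 2), 1.0, -0.04

Q_old = record_terminal_value(defaultdict(float), s, r, r1, terminals, fixed=False)
Q_new = record_terminal_value(defaultdict(float), s, r, r1, terminals, fixed=True)

print(Q_old[s, None])  # -0.04: terminal value taken from the wrong percept
print(Q_new[s, None])  # 1.0: terminal value equals the terminal reward
```

With the old assignment the terminal entry tracks whatever reward the next percept happens to carry, which would explain the noisy values near -0.04 reported above.
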
dmeoli merged commit fff0f89 into master Jun 27, 2026
dmeoli deleted the fix-qlearning-terminal branch June 27, 2026 00:13

Development

Successfully merging this pull request may close these issues.

QLearningAgent learns incorrect results