Fix terminal Q-value in QLearningAgent/SARSA (use r, not r1) (#1247) by dmeoli · Pull Request #1337 · aimacode/aima-python

dmeoli · 2026-06-27T00:12:59Z

QLearningAgent (and SARSALearningAgent) set Q[s, None] = r1 (the reward of the new percept) when the previous state s was terminal, instead of r (the reward received at the terminal state), as in AIMA Fig 21.8. As a result, terminal Q-values were wrong (≈ -0.04 or noisy instead of +1/-1), which produced incorrect policies. Both agents are fixed; terminal Q-values now converge to ≈ +1/-1 and the Q-learning test passes. Fixes #1247.
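To illustrate where the ≈ -0.04 came from, the sketch below replays a single trial boundary. It is not the aima-python code: the class TerminalBookkeeping, the states "start" and "goal", the "move" action and the reward values (-0.04 per step, +1 at the goal) are hypothetical, and only the Q[s, None] assignment is modeled. Because the terminal value is written on the first percept of the *next* trial, r1 is that trial's start reward rather than the reward received at the terminal state.

```python
from collections import defaultdict

LIVING_REWARD = -0.04  # hypothetical reward of non-terminal states
GOAL_REWARD = 1.0      # hypothetical reward of the terminal "goal" state


class TerminalBookkeeping:
    """Hypothetical, stripped-down agent program: it only models the step that
    writes Q[s, None] when the previous state s was terminal."""

    def __init__(self, terminals, buggy):
        self.terminals = terminals
        self.buggy = buggy           # True reproduces the old r1 assignment
        self.Q = defaultdict(float)
        self.s = self.r = None       # previous state and the reward received there

    def __call__(self, percept):
        s1, r1 = percept             # new state and the reward of the new percept
        s, r = self.s, self.r
        if s in self.terminals:
            # Old code stored r1; the fix stores r, the terminal state's reward.
            self.Q[s, None] = r1 if self.buggy else r
        self.s, self.r = s1, r1
        return None if s1 in self.terminals else "move"


def terminal_q(buggy):
    agent = TerminalBookkeeping(terminals={"goal"}, buggy=buggy)
    # Trial 1 ends in "goal"; trial 2 starts again in "start".
    for percept in [("start", LIVING_REWARD), ("goal", GOAL_REWARD), ("start", LIVING_REWARD)]:
        agent(percept)
    return agent.Q["goal", None]


print(terminal_q(buggy=True))   # -0.04: reward of the next trial's first percept
print(terminal_q(buggy=False))  # 1.0: reward actually received at the terminal state
```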

When the previous state s was terminal, the agent set Q[s, None] = r1 (the reward of the *new* percept) instead of r (the stored reward received at the terminal state), which is what AIMA Fig 21.8 prescribes (Q[s, None] <- r). This made terminal Q-values wrong (e.g. ≈ -0.04 or noisy instead of +1/-1) and yielded incorrect policies. Verified: terminal Q-values now converge to ≈ +1 / ≈ -1. Fixes #1247.
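A fuller sketch, under stated assumptions, shows the terminal entries settling at exactly +1 and -1 once the r assignment is used. Everything here is hypothetical: a five-state corridor (states 0 to 4, terminals 0 and 4 with rewards -1 and +1, -0.04 elsewhere), the names QLearningSketch, run_trial and train, a 1/N step size, and epsilon-greedy exploration swapped in for AIMA's exploration function f. It also skips the TD update on the step that records the terminal value, which may differ from how QLearningAgent handles that step.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, trials start in the middle.
# States 0 and 4 are terminal (rewards -1 and +1); the others give -0.04.
TERMINAL_REWARDS = {0: -1.0, 4: 1.0}
LIVING_REWARD = -0.04
START = 2
MOVES = (-1, +1)  # deterministic left / right


def reward(state):
    return TERMINAL_REWARDS.get(state, LIVING_REWARD)


def actions_in_state(state):
    # Terminal states only admit the pseudo-action None.
    return [None] if state in TERMINAL_REWARDS else list(MOVES)


class QLearningSketch:
    """Hypothetical Q-learning agent program using Q[s, None] = r at terminals."""

    def __init__(self, gamma=0.9, epsilon=0.2, seed=0):
        self.Q = defaultdict(float)   # Q[(state, action)]
        self.Nsa = defaultdict(int)   # visit counts per (state, action)
        self.gamma, self.epsilon = gamma, epsilon
        self.rng = random.Random(seed)
        self.s = self.a = self.r = None  # previous state, action, reward

    def __call__(self, percept):
        s1, r1 = percept
        Q, s, a, r = self.Q, self.s, self.a, self.r
        if s in TERMINAL_REWARDS:
            # Store r (reward received AT the terminal state), not r1,
            # which belongs to the first state of the next trial.
            Q[s, None] = r
        elif s is not None:
            self.Nsa[s, a] += 1
            alpha = 1.0 / self.Nsa[s, a]  # hypothetical decaying step size
            target = r + self.gamma * max(Q[s1, a1] for a1 in actions_in_state(s1))
            Q[s, a] += alpha * (target - Q[s, a])
        self.s, self.r = s1, r1
        if s1 in TERMINAL_REWARDS:
            self.a = None  # signal the end of the trial
        elif self.rng.random() < self.epsilon:
            self.a = self.rng.choice(MOVES)  # explore
        else:
            self.a = max(MOVES, key=lambda m: Q[s1, m])  # exploit
        return self.a


def run_trial(agent):
    state = START
    while True:
        action = agent((state, reward(state)))
        if action is None:
            return
        state += action  # deterministic move along the corridor


def train(trials=500, seed=0):
    agent = QLearningSketch(seed=seed)
    for _ in range(trials):
        run_trial(agent)
    return agent


if __name__ == "__main__":
    agent = train()
    print(agent.Q[4, None], agent.Q[0, None])
```

Because Q[s, None] = r is a direct assignment, each terminal entry equals its reward as soon as that terminal has been reached once and another trial has started; the TD updates of the non-terminal entries then propagate those values back toward the start state.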

dmeoli merged commit fff0f89 into master Jun 27, 2026

dmeoli deleted the fix-qlearning-terminal branch June 27, 2026 00:13

dmeoli merged 1 commit into master from fix-qlearning-terminal

dmeoli commented Jun 27, 2026