COE292 — Homework 1: Intelligent Explorer (Starter)
Complete the TODO sections only. Do not rename functions or change their arguments.
You will implement the core Q-learning logic while the scaffolding (seeding, BFS reachability, environment helpers, printing utilities) is provided for you.
Parts
- Part 1 (Training): Learn for 1000 episodes from zero knowledge; print knowledge matrices and the final greedy path + energy.
- Part 2 (Step-by-Step): Apply Q-updates along a fixed path with a given initial knowledge vector; print per-step updates and the final matrix.
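For orientation, a single tabular Q-update of the kind both parts rely on might look like the sketch below. It is not the starter code's actual interface: the function name `q_update`, the learning rate `alpha`, the discount `gamma`, the reward value, and the 3-state, 2-action table are all hypothetical choices for illustration. Plain nested lists are used only to keep the sketch self-contained; the assignment's knowledge matrices may well be numpy arrays.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update in place and return the new value.

    Q is a table indexed as Q[state][action]. alpha and gamma are
    illustrative defaults, not values taken from the assignment.
    """
    # Best value currently known for the state we landed in.
    best_next = max(Q[next_state])
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q[state][action]


# Hypothetical 3-state, 2-action problem starting from zero knowledge.
Q = [[0.0, 0.0] for _ in range(3)]
new_value = q_update(Q, state=0, action=1, reward=1.0, next_state=2)
print(new_value)  # 0.5
```

Starting from an all-zero table, the first update moves the entry halfway toward the reward because `alpha` is 0.5 and the next state contributes nothing yet. Applying the same function along a fixed sequence of transitions is one way to picture the step-by-step part.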
Rules
- Only the Python standard library and numpy are allowed.
- Keep output formats exact (spacing/labels) because the autograder parses them.
- Set your STUDENT_ID in the first code cell.