A reinforcement learning agent that optimizes the Boston MBTA rapid-transit network by adding/removing connections and adjusting service frequency to minimize average commuter travel time.
```
AI_MBTA_Project/
├── data/
│   ├── stops.txt            # MBTA station data (from GTFS)
│   └── t_edges.txt          # Edge list: station pairs, travel times, line colors
├── env/
│   ├── network.py           # Builds the NetworkX graph from raw data
│   └── mbta_env.py          # Gymnasium RL environment
├── agents/
│   └── dqn_agent.py         # DQN agent (Q-network, replay buffer, target network)
├── training/
│   └── train_dqn.py         # Training loop with hyperparameter config
├── evaluation/
│   └── evaluate_agents.py   # Runs trained models, prints metrics, saves visualizations
├── outputs/
│   ├── mbta_graph.pkl       # Serialized base graph (generated by network.py)
│   ├── models/              # Saved .pt model weights
│   ├── plots/               # Training curves (reward, mean travel time)
│   ├── graphs/              # Final optimized graph visualizations + pickles
│   └── logs/                # Evaluation text logs
└── requirements.txt
```
```
pip install -r requirements.txt
```
Next, build the base graph. This parses `data/stops.txt` and `data/t_edges.txt` into a NetworkX graph, saves it to `outputs/mbta_graph.pkl`, and displays a visualization of the base network:
```
python env/network.py
```
Make sure `outputs/mbta_graph.pkl` exists before proceeding.
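The graph-building step can be sketched roughly as below. The exact column layout of `t_edges.txt` is an assumption (source station, destination station, travel time in minutes, line color); the real file may differ:

```python
import pickle
import networkx as nx

def build_graph(edges_path="data/t_edges.txt", out_path="outputs/mbta_graph.pkl"):
    """Parse the edge list into a weighted, undirected NetworkX graph.

    Assumes each line looks like: stop_a,stop_b,travel_time,line_color
    (an assumption -- check the real file format).
    """
    G = nx.Graph()
    with open(edges_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            a, b, t, color = line.split(",")
            G.add_edge(a, b, travel_time=float(t), line=color)
    with open(out_path, "wb") as f:
        pickle.dump(G, f)
    return G
```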
Open `training/train_dqn.py` and set the hyperparameters at the top of the file:
```python
NUM_EPISODES = 400     # number of training episodes
EPSILON_DECAY = 0.995  # exploration decay rate
```
Then run:
```
python training/train_dqn.py
```
This will:
- Train the agent and print per-episode metrics (reward, mean travel time, loss)
- Save the trained model to `outputs/models/<run_tag>.pt`
- Save training plots (reward curve, mean travel time curve) to `outputs/plots/`
Open evaluation/evaluate_agents.py and make sure the run parameters at the top match the model you want to evaluate:
```python
DQN_EPISODES = 400
DQN_LR = 0.0001
DQN_EPSILON_DECAY = 0.995
DQN_BUFFER = 5000
DQN_TARGET_UPDATE = 200
```
These are used to construct the model filename. Then run:
```
python evaluation/evaluate_agents.py
```
This will:
- Load the trained model and run one greedy evaluation episode
- Print final mean travel time, improvement %, total reward
- Save the optimized graph (pickle + PNG diff visualization) to `outputs/graphs/`
- Save a text log to `outputs/logs/`
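Since the evaluation script reconstructs the model filename from these parameters, the two sides must agree. A hypothetical version of such a tag builder (the real naming scheme in the project may differ) could look like:

```python
def run_tag(episodes=400, lr=1e-4, eps_decay=0.995, buffer=5000, target_update=200):
    # Hypothetical naming scheme -- encodes each hyperparameter in the filename
    # so evaluate_agents.py can locate the matching .pt file.
    return f"dqn_ep{episodes}_lr{lr}_decay{eps_decay}_buf{buffer}_tu{target_update}"
```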
The agent chooses from 4 action types applied to any ordered station pair:
| ID | Action | Description | Budget cost |
|---|---|---|---|
| 0 | ADD_EDGE | Connect two unconnected stations | travel_time * 5.0 |
| 1 | REMOVE_EDGE | Remove an existing non-bridge edge | refund travel_time * 2.5 |
| 2 | SPEED_UP | Decrease edge travel time by 0.5 min | 1.5 |
| 3 | SLOW_DOWN | Increase edge travel time by 0.5 min | refund 0.75 |
Invalid actions (e.g. removing a bridge, exceeding budget) are masked out. If one still slips through, a -10 reward penalty is applied.
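The validity check behind the mask can be sketched as follows; the `new_time` default and budget arithmetic are simplified stand-ins for the environment's real logic:

```python
import networkx as nx

ADD_EDGE, REMOVE_EDGE, SPEED_UP, SLOW_DOWN = range(4)
INVALID_PENALTY = -10.0  # reward applied if an invalid action slips through

def is_valid(G, action, u, v, budget, new_time=2.0):
    """Return True if (action, u, v) is legal in the current graph/budget.

    Simplified sketch: bridge edges may never be removed (to keep the
    network connected), and costed actions must fit in the budget.
    """
    if action == ADD_EDGE:
        return u != v and not G.has_edge(u, v) and budget >= 5.0 * new_time
    if action == REMOVE_EDGE:
        bridges = set(nx.bridges(G))
        return G.has_edge(u, v) and (u, v) not in bridges and (v, u) not in bridges
    if action == SPEED_UP:
        return G.has_edge(u, v) and budget >= 1.5
    if action == SLOW_DOWN:
        return G.has_edge(u, v)
    return False
```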
| Index | Feature | Range |
|---|---|---|
| 0 | Normalized mean travel time | [0, 1] |
| 1 | Normalized edge density | [0, 1] |
| 2 | Improvement over baseline | [-1, 1] |
| 3 | Reachability ratio | [0, 1] |
| 4 | Normalized mean node degree | [0, 1] |
| 5 | Red line mean travel time | [0, 1] |
| 6 | Orange line mean travel time | [0, 1] |
| 7 | Blue line mean travel time | [0, 1] |
| 8 | Green line mean travel time | [0, 1] |
| 9 | Remaining budget fraction | [0, 1] |
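Assembling such a feature vector might look like the sketch below. The per-line features (indices 5-8) are omitted for brevity, and the normalization constant `max_tt` is an assumption:

```python
import numpy as np
import networkx as nx

def observation(G, baseline_mtt, budget, budget_max, max_tt=60.0):
    """Build a simplified version of the observation vector above."""
    n = G.number_of_nodes()
    # Shortest-path travel times over all reachable ordered pairs.
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight="travel_time"))
    times = [t for src, d in lengths.items() for dst, t in d.items() if src != dst]
    mtt = float(np.mean(times)) if times else 0.0
    reachable = len(times) / (n * (n - 1)) if n > 1 else 1.0
    mean_degree = np.mean([d for _, d in G.degree()]) / max(n - 1, 1)
    return np.array([
        min(mtt / max_tt, 1.0),                                          # 0: mean travel time
        nx.density(G),                                                   # 1: edge density
        np.clip((baseline_mtt - mtt) / max(baseline_mtt, 1e-9), -1, 1),  # 2: improvement
        reachable,                                                       # 3: reachability ratio
        mean_degree,                                                     # 4: mean node degree
        budget / budget_max,                                             # 9: budget fraction
    ], dtype=np.float32)
```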
The per-step reward is the change in network-wide mean travel time, scaled:
```
(previous_mean_travel_time - current_mean_travel_time) * 20
```
Travel times are weighted by time-of-day demand (AM/PM rush hours prioritize suburb-downtown pairs). Unreachable station pairs incur a 500-minute penalty.
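Putting the formula and the unreachable-pair penalty together gives roughly the following; demand weighting is omitted here for brevity:

```python
import networkx as nx

UNREACHABLE_PENALTY = 500.0  # minutes charged per unreachable ordered pair
REWARD_SCALE = 20.0

def mean_travel_time(G):
    """Mean shortest-path travel time over all ordered station pairs;
    unreachable pairs count as a flat 500-minute penalty."""
    n = G.number_of_nodes()
    if n < 2:
        return 0.0
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight="travel_time"))
    total, pairs = 0.0, 0
    for u in G:
        for v in G:
            if u == v:
                continue
            total += lengths[u].get(v, UNREACHABLE_PENALTY)
            pairs += 1
    return total / pairs

def step_reward(prev_mtt, curr_mtt):
    return (prev_mtt - curr_mtt) * REWARD_SCALE
```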
- 50 steps per episode
- Simulated clock advances 30 minutes per step (cycles through AM rush, midday, PM rush, evening, overnight)
- Fixed budget per episode (default 500 for env checks, 1000 for training)
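The 30-minutes-per-step clock could be advanced as in this sketch; the exact period boundaries and the 7:00 AM episode start are assumptions:

```python
def time_period(step, minutes_per_step=30):
    """Map a step index to a time-of-day demand period.

    Assumed boundaries: AM rush 7-9, midday 9-16, PM rush 16-19,
    evening 19-24, overnight 0-7; the episode starts at 7:00 AM.
    """
    minute = (7 * 60 + step * minutes_per_step) % (24 * 60)
    hour = minute // 60
    if 7 <= hour < 9:
        return "am_rush"
    if 9 <= hour < 16:
        return "midday"
    if 16 <= hour < 19:
        return "pm_rush"
    if 19 <= hour < 24:
        return "evening"
    return "overnight"
```

Over a 50-step episode the clock covers 25 simulated hours, so every demand period is visited at least once.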