DanielCoder834/AI_MBTA_Project
MBTA Network Optimization with Deep Reinforcement Learning

A reinforcement learning agent that optimizes the Boston MBTA rapid-transit network by adding/removing connections and adjusting service frequency to minimize average commuter travel time.

Project Structure

AI_MBTA_Project/
├── data/
│   ├── stops.txt              # MBTA station data (from GTFS)
│   └── t_edges.txt            # Edge list: station pairs, travel times, line colors
├── env/
│   ├── network.py             # Builds the NetworkX graph from raw data
│   └── mbta_env.py            # Gymnasium RL environment
├── agents/
│   └── dqn_agent.py           # DQN agent (Q-network, replay buffer, target network)
├── training/
│   └── train_dqn.py           # Training loop with hyperparameter config
├── evaluation/
│   └── evaluate_agents.py     # Runs trained models, prints metrics, saves visualizations
├── outputs/
│   ├── mbta_graph.pkl         # Serialized base graph (generated by network.py)
│   ├── models/                # Saved .pt model weights
│   ├── plots/                 # Training curves (reward, mean travel time)
│   ├── graphs/                # Final optimized graph visualizations + pickles
│   └── logs/                  # Evaluation text logs
└── requirements.txt

Setup

pip install -r requirements.txt

How to Run

1. Build the MBTA graph

Parses data/stops.txt and data/t_edges.txt into a NetworkX graph and saves it to outputs/mbta_graph.pkl. Also displays a visualization of the base network.

python env/network.py

Make sure outputs/mbta_graph.pkl exists before proceeding.
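Once the pickle exists, you can load and inspect it directly. A minimal sketch, assuming the pickle holds a NetworkX graph and that edges carry a travel-time attribute (the attribute name travel_time is a guess, not confirmed by the repo):

```python
import pickle

import networkx as nx


def load_graph(path="outputs/mbta_graph.pkl"):
    """Load the serialized base graph produced by env/network.py."""
    with open(path, "rb") as f:
        return pickle.load(f)


def mean_travel_time(G, weight="travel_time"):
    """Mean shortest-path time over all reachable ordered station pairs."""
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight=weight))
    times = [t for src, dsts in lengths.items()
             for dst, t in dsts.items() if src != dst]
    return sum(times) / len(times)
```

This mean-travel-time quantity is the metric the agent is trained to minimize.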

2. Train the DQN agent

Open training/train_dqn.py and set the hyperparameters at the top of the file:

NUM_EPISODES   = 400     # number of training episodes
EPSILON_DECAY  = 0.995   # exploration decay rate

Then run:

python training/train_dqn.py

This will:

  • Train the agent and print per-episode metrics (reward, mean travel time, loss)
  • Save the trained model to outputs/models/<run_tag>.pt
  • Save training plots (reward curve, mean travel time curve) to outputs/plots/
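The EPSILON_DECAY setting controls how quickly the agent shifts from random exploration to greedy exploitation. A sketch of the usual multiplicative schedule, assuming a start value of 1.0 and a floor of 0.05 (the repo's actual constants may differ):

```python
NUM_EPISODES = 400
EPSILON_DECAY = 0.995
EPSILON_MIN = 0.05  # assumed exploration floor

epsilon = 1.0
for episode in range(NUM_EPISODES):
    # ... run one episode, acting randomly with probability epsilon ...
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)

print(f"epsilon after {NUM_EPISODES} episodes: {epsilon:.3f}")
```

With these values the agent still explores roughly 13% of the time at the end of training, so a longer run (or smaller decay) yields a more greedy final policy.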

3. Evaluate

Open evaluation/evaluate_agents.py and make sure the run parameters at the top match the model you want to evaluate:

DQN_EPISODES       = 400
DQN_LR             = 0.0001
DQN_EPSILON_DECAY  = 0.995
DQN_BUFFER         = 5000
DQN_TARGET_UPDATE  = 200

These are used to construct the model filename. Then run:

python evaluation/evaluate_agents.py

This will:

  • Load the trained model and run one greedy evaluation episode
  • Print final mean travel time, improvement %, total reward
  • Save the optimized graph (pickle + PNG diff visualization) to outputs/graphs/
  • Save a text log to outputs/logs/
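The filename construction lives in evaluation/evaluate_agents.py; the format string below is a purely hypothetical illustration of the idea, not the repo's actual naming scheme:

```python
# Hypothetical run-tag construction from the parameters above -- the
# real format string in evaluate_agents.py is likely different.
DQN_EPISODES = 400
DQN_LR = 0.0001
DQN_EPSILON_DECAY = 0.995
DQN_BUFFER = 5000
DQN_TARGET_UPDATE = 200

run_tag = (f"dqn_ep{DQN_EPISODES}_lr{DQN_LR}_decay{DQN_EPSILON_DECAY}"
           f"_buf{DQN_BUFFER}_tgt{DQN_TARGET_UPDATE}")
model_path = f"outputs/models/{run_tag}.pt"
```

If evaluation fails with a missing-file error, compare the constructed path against the filenames actually present in outputs/models/.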

Environment Details

Actions

The agent chooses from 4 action types applied to any ordered station pair:

| ID | Action      | Description                           | Budget cost              |
|----|-------------|---------------------------------------|--------------------------|
| 0  | ADD_EDGE    | Connect two unconnected stations      | travel_time * 5.0        |
| 1  | REMOVE_EDGE | Remove an existing non-bridge edge    | refund travel_time * 2.5 |
| 2  | SPEED_UP    | Decrease edge travel time by 0.5 min  | 1.5                      |
| 3  | SLOW_DOWN   | Increase edge travel time by 0.5 min  | refund 0.75              |

Invalid actions (e.g. removing a bridge, exceeding budget) are masked out. If one still slips through, a -10 reward penalty is applied.
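Masking is typically applied at action-selection time by forcing invalid actions' Q-values to negative infinity. A minimal sketch of that pattern (the mask interface is assumed; mbta_env.py may expose it differently):

```python
import numpy as np


def masked_argmax(q_values, valid_mask):
    """Greedy action selection restricted to valid actions."""
    q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(q))


q = np.array([1.0, 3.0, 2.0, 0.5])
mask = np.array([True, False, True, True])  # action 1 is invalid
masked_argmax(q, mask)  # → 2, the best valid action
```

The -10 penalty then acts as a backstop for invalid actions that the mask does not catch.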

Observation (10 floats)

| Index | Feature                       | Range   |
|-------|-------------------------------|---------|
| 0     | Normalized mean travel time   | [0, 1]  |
| 1     | Normalized edge density       | [0, 1]  |
| 2     | Improvement over baseline     | [-1, 1] |
| 3     | Reachability ratio            | [0, 1]  |
| 4     | Normalized mean node degree   | [0, 1]  |
| 5     | Red line mean travel time     | [0, 1]  |
| 6     | Orange line mean travel time  | [0, 1]  |
| 7     | Blue line mean travel time    | [0, 1]  |
| 8     | Green line mean travel time   | [0, 1]  |
| 9     | Remaining budget fraction     | [0, 1]  |
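Assembling these features into the 10-float vector is mechanical. A sketch, assuming a simple divide-by-constant normalization with a hypothetical 60-minute scale (the actual normalization in mbta_env.py is unknown):

```python
import numpy as np


def build_observation(mean_tt, density, improvement, reach, mean_deg,
                      line_tts, budget_frac, tt_scale=60.0):
    """Pack the features above into the 10-float observation.

    tt_scale is an assumed normalization constant; line_tts holds the
    Red/Orange/Blue/Green mean travel times in that order.
    """
    obs = np.array([
        mean_tt / tt_scale,                  # 0: normalized mean travel time
        density,                             # 1: edge density
        improvement,                         # 2: improvement over baseline
        reach,                               # 3: reachability ratio
        mean_deg,                            # 4: normalized mean node degree
        *(t / tt_scale for t in line_tts),   # 5-8: per-line travel times
        budget_frac,                         # 9: remaining budget fraction
    ], dtype=np.float32)
    return np.clip(obs, -1.0, 1.0)
```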

Reward

(previous_mean_travel_time - current_mean_travel_time) * 20

Travel times are weighted by time-of-day demand (AM/PM rush hours prioritize suburb-downtown pairs). Unreachable station pairs incur a 500-minute penalty.
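Put together, the reward and the demand-weighted mean it operates on look roughly like this (variable names and the use of None for unreachable pairs are illustrative, not taken from the repo):

```python
UNREACHABLE_PENALTY = 500.0  # minutes charged per unreachable pair
REWARD_SCALE = 20.0


def demand_weighted_mean(pair_times, pair_demand):
    """Mean travel time over station pairs, weighted by time-of-day demand."""
    total = sum(d * (UNREACHABLE_PENALTY if t is None else t)
                for t, d in zip(pair_times, pair_demand))
    return total / sum(pair_demand)


def step_reward(prev_mean_tt, curr_mean_tt):
    """Positive when the action reduced the mean commuter travel time."""
    return (prev_mean_tt - curr_mean_tt) * REWARD_SCALE
```

The large unreachable-pair penalty strongly discourages the agent from disconnecting the network even when REMOVE_EDGE would refund budget.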

Episode

  • 50 steps per episode
  • Simulated clock advances 30 minutes per step (cycles through AM rush, midday, PM rush, evening, overnight)
  • Fixed budget per episode (default 500 for env checks, 1000 for training)
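The simulated clock can be sketched as a step-to-period mapping. The 30-minute step size comes from the description above; the period boundaries and the 6:00 start time are assumptions:

```python
STEP_MINUTES = 30  # simulated minutes per environment step


def period_at(step, start_hour=6):
    """Demand period for a given step (assumed boundaries)."""
    hour = (start_hour + step * STEP_MINUTES // 60) % 24
    if 6 <= hour < 10:
        return "am_rush"
    if 10 <= hour < 16:
        return "midday"
    if 16 <= hour < 19:
        return "pm_rush"
    if 19 <= hour < 24:
        return "evening"
    return "overnight"
```

At 30 minutes per step, a 50-step episode spans 25 simulated hours, so each episode cycles through every demand period at least once.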
