A reinforcement learning agent that optimizes the Boston MBTA rapid-transit network by adding/removing connections and adjusting service frequency to minimize average commuter travel time.
```
AI_MBTA_Project/
├── data/
│   ├── stops.txt            # MBTA station data (from GTFS)
│   └── t_edges.txt          # Edge list: station pairs, travel times, line colors
├── env/
│   ├── network.py           # Builds the NetworkX graph from raw data
│   └── mbta_env.py          # Gymnasium RL environment
├── agents/
│   └── dqn_agent.py         # DQN agent (Q-network, replay buffer, target network)
├── training/
│   └── train_dqn.py         # Training loop with hyperparameter config
├── evaluation/
│   └── evaluate_agents.py   # Runs trained models, prints metrics, saves visualizations
├── outputs/
│   ├── mbta_graph.pkl       # Serialized base graph (generated by network.py)
│   ├── models/              # Saved .pt model weights
│   ├── plots/               # Training curves (reward, mean travel time)
│   ├── graphs/              # Final optimized graph visualizations + pickles
│   └── logs/                # Evaluation text logs
└── requirements.txt
```
```
pip install -r requirements.txt
```
Next, build the base graph. This parses `data/stops.txt` and `data/t_edges.txt` into a NetworkX graph, saves it to `outputs/mbta_graph.pkl`, and displays a visualization of the base network:
```
python env/network.py
```
Make sure `outputs/mbta_graph.pkl` exists before proceeding.
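The graph-building step can be sketched roughly as below. The exact column layout of `t_edges.txt` is an assumption (source station, destination station, travel time in minutes, line color); the real file may differ:

```python
import pickle
import networkx as nx

def build_graph(edges_path="data/t_edges.txt", out_path="outputs/mbta_graph.pkl"):
    """Parse the edge list into a weighted, undirected NetworkX graph.

    Assumes each line looks like: stop_a,stop_b,travel_time,line_color
    (an assumption -- check the real file format).
    """
    G = nx.Graph()
    with open(edges_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            a, b, t, color = line.split(",")
            G.add_edge(a, b, travel_time=float(t), line=color)
    with open(out_path, "wb") as f:
        pickle.dump(G, f)
    return G
```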
Open `training/train_dqn.py` and set the hyperparameters at the top of the file:
```python
NUM_EPISODES = 400     # number of training episodes
EPSILON_DECAY = 0.995  # exploration decay rate
```
Then run:
```
python training/train_dqn.py
```
This will:
- Train the agent and print per-episode metrics (reward, mean travel time, loss)
- Save the trained model to `outputs/models/<run_tag>.pt`
- Save training plots (reward curve, mean travel time curve) to `outputs/plots/`
Open evaluation/evaluate_agents.py and make sure the run parameters at the top match the model you want to evaluate:
```python
DQN_EPISODES = 400
DQN_LR = 0.0001
DQN_EPSILON_DECAY = 0.995
DQN_BUFFER = 5000
DQN_TARGET_UPDATE = 200
```
These are used to construct the model filename. Then run:
```
python evaluation/evaluate_agents.py
```
This will:
- Load the trained model and run one greedy evaluation episode
- Print final mean travel time, improvement %, total reward
- Save the optimized graph (pickle + PNG diff visualization) to `outputs/graphs/`
- Save a text log to `outputs/logs/`
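Since the evaluation script reconstructs the model filename from these parameters, the two sides must agree. A hypothetical version of such a tag builder (the real naming scheme in the project may differ) could look like:

```python
def run_tag(episodes=400, lr=1e-4, eps_decay=0.995, buffer=5000, target_update=200):
    # Hypothetical naming scheme -- encodes each hyperparameter in the filename
    # so evaluate_agents.py can locate the matching .pt file.
    return f"dqn_ep{episodes}_lr{lr}_decay{eps_decay}_buf{buffer}_tu{target_update}"
```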
The agent chooses from 4 action types applied to any ordered station pair:
| ID | Action | Description | Budget cost |
|---|---|---|---|
| 0 | ADD_EDGE | Connect two unconnected stations | travel_time * 5.0 |
| 1 | REMOVE_EDGE | Remove an existing non-bridge edge | refund travel_time * 2.5 |
| 2 | SPEED_UP | Decrease edge travel time by 0.5 min | 1.5 |
| 3 | SLOW_DOWN | Increase edge travel time by 0.5 min | refund 0.75 |
Invalid actions (e.g. removing a bridge, exceeding budget) are masked out. If one still slips through, a -10 reward penalty is applied.
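The validity check behind the mask can be sketched as follows; the `new_time` default and budget arithmetic are simplified stand-ins for the environment's real logic:

```python
import networkx as nx

ADD_EDGE, REMOVE_EDGE, SPEED_UP, SLOW_DOWN = range(4)
INVALID_PENALTY = -10.0  # reward applied if an invalid action slips through

def is_valid(G, action, u, v, budget, new_time=2.0):
    """Return True if (action, u, v) is legal in the current graph/budget.

    Simplified sketch: bridge edges may never be removed (to keep the
    network connected), and costed actions must fit in the budget.
    """
    if action == ADD_EDGE:
        return u != v and not G.has_edge(u, v) and budget >= 5.0 * new_time
    if action == REMOVE_EDGE:
        bridges = set(nx.bridges(G))
        return G.has_edge(u, v) and (u, v) not in bridges and (v, u) not in bridges
    if action == SPEED_UP:
        return G.has_edge(u, v) and budget >= 1.5
    if action == SLOW_DOWN:
        return G.has_edge(u, v)
    return False
```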
| Index | Feature | Range |
|---|---|---|
| 0 | Normalized mean travel time | [0, 1] |
| 1 | Normalized edge density | [0, 1] |
| 2 | Improvement over baseline | [-1, 1] |
| 3 | Reachability ratio | [0, 1] |
| 4 | Normalized mean node degree | [0, 1] |
| 5 | Red line mean travel time | [0, 1] |
| 6 | Orange line mean travel time | [0, 1] |
| 7 | Blue line mean travel time | [0, 1] |
| 8 | Green line mean travel time | [0, 1] |
| 9 | Remaining budget fraction | [0, 1] |
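Assembling such a feature vector might look like the sketch below. The per-line features (indices 5-8) are omitted for brevity, and the normalization constant `max_tt` is an assumption:

```python
import numpy as np
import networkx as nx

def observation(G, baseline_mtt, budget, budget_max, max_tt=60.0):
    """Build a simplified version of the observation vector above."""
    n = G.number_of_nodes()
    # Shortest-path travel times over all reachable ordered pairs.
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight="travel_time"))
    times = [t for src, d in lengths.items() for dst, t in d.items() if src != dst]
    mtt = float(np.mean(times)) if times else 0.0
    reachable = len(times) / (n * (n - 1)) if n > 1 else 1.0
    mean_degree = np.mean([d for _, d in G.degree()]) / max(n - 1, 1)
    return np.array([
        min(mtt / max_tt, 1.0),                                          # 0: mean travel time
        nx.density(G),                                                   # 1: edge density
        np.clip((baseline_mtt - mtt) / max(baseline_mtt, 1e-9), -1, 1),  # 2: improvement
        reachable,                                                       # 3: reachability ratio
        mean_degree,                                                     # 4: mean node degree
        budget / budget_max,                                             # 9: budget fraction
    ], dtype=np.float32)
```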
The per-step reward is the change in network-wide mean travel time, scaled:
```
(previous_mean_travel_time - current_mean_travel_time) * 20
```
Travel times are weighted by time-of-day demand (AM/PM rush hours prioritize suburb-downtown pairs). Unreachable station pairs incur a 500-minute penalty.
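Putting the formula and the unreachable-pair penalty together gives roughly the following; demand weighting is omitted here for brevity:

```python
import networkx as nx

UNREACHABLE_PENALTY = 500.0  # minutes charged per unreachable ordered pair
REWARD_SCALE = 20.0

def mean_travel_time(G):
    """Mean shortest-path travel time over all ordered station pairs;
    unreachable pairs count as a flat 500-minute penalty."""
    n = G.number_of_nodes()
    if n < 2:
        return 0.0
    lengths = dict(nx.all_pairs_dijkstra_path_length(G, weight="travel_time"))
    total, pairs = 0.0, 0
    for u in G:
        for v in G:
            if u == v:
                continue
            total += lengths[u].get(v, UNREACHABLE_PENALTY)
            pairs += 1
    return total / pairs

def step_reward(prev_mtt, curr_mtt):
    return (prev_mtt - curr_mtt) * REWARD_SCALE
```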
- 50 steps per episode
- Simulated clock advances 30 minutes per step (cycles through AM rush, midday, PM rush, evening, overnight)
- Fixed budget per episode (default 500 for env checks, 1000 for training)
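The 30-minutes-per-step clock could be advanced as in this sketch; the exact period boundaries and the 7:00 AM episode start are assumptions:

```python
def time_period(step, minutes_per_step=30):
    """Map a step index to a time-of-day demand period.

    Assumed boundaries: AM rush 7-9, midday 9-16, PM rush 16-19,
    evening 19-24, overnight 0-7; the episode starts at 7:00 AM.
    """
    minute = (7 * 60 + step * minutes_per_step) % (24 * 60)
    hour = minute // 60
    if 7 <= hour < 9:
        return "am_rush"
    if 9 <= hour < 16:
        return "midday"
    if 16 <= hour < 19:
        return "pm_rush"
    if 19 <= hour < 24:
        return "evening"
    return "overnight"
```

Over a 50-step episode the clock covers 25 simulated hours, so every demand period is visited at least once.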