rezafuru/Graph-Compiler-Benchmarking

Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems

Reference implementation for the experiment framework in the paper Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems.

Benchmarks neural graph compilers under a uniform interface. It measures latency, throughput, and CPU/RAM/GPU utilization for off-the-shelf architectures and for synthetic blocks swept over width and depth, then derives the batch-scaling metrics used in the paper. Compilation runs locally on the target device; no hardware simulators are used.
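
The paper's exact batch-scaling metrics are computed in analysis/. The following Python sketch only assumes two common definitions: throughput as batch size divided by mean per-batch latency, and a scaling factor as throughput at batch size b divided by throughput at batch size 1. The function names are hypothetical and not taken from the repository.

```python
from statistics import mean


def throughput(batch_size: int, latencies_s: list[float]) -> float:
    """Samples per second, assuming each latency covers one full batch (assumption)."""
    return batch_size / mean(latencies_s)


def batch_scaling(latencies_by_batch: dict[int, list[float]]) -> dict[int, float]:
    """Throughput at each batch size relative to batch size 1 (assumed definition)."""
    base = throughput(1, latencies_by_batch[1])
    return {b: throughput(b, lat) / base for b, lat in sorted(latencies_by_batch.items())}


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    lat = {1: [0.25, 0.25], 8: [1.0, 1.0]}
    print(batch_scaling(lat))  # {1: 1.0, 8: 2.0}
```

A factor above 1 means larger batches use the device more efficiently; a factor near 1 means batching buys little extra throughput.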

Massive kudos to my student Carmen Walser for implementing the initial core of the compiler configuration and the testbed.

If you face any issues, just send a mail to a.furutanpey@coovally.ai, and I'll gladly try to carve out the time to help you out.

Setup

./setup.sh

Creates .venv and installs the base backends (identity, TorchScript, ONNX Runtime) and the analysis tooling. Requires Python 3.10+. Vendor backends are installed per device:

.venv/bin/pip install '.[openvino]'   # CPU servers
.venv/bin/pip install '.[cuda]'       # CUDA devices (TensorRT)

Apache TVM (0.18.dev) is built separately; see https://tvm.apache.org/docs/install.

Run

./run.sh <experiment> <compiler> [device]

device is a label used to organize output. Results are written to results/<device>/<compiler>/<run>.csv, one tidy row per run; existing files are skipped so an interrupted sweep resumes. Pass --quick for a fast smoke run (batch sizes 1 and 8).

./run.sh architectures tensorrt gpu
./run.sh conv_blocks openvino xeon
./run.sh mha_blocks identity orin --quick

experiment        models
architectures     ResNet / EfficientNet / DeiT / Swin / ConvNeXt, 3 sizes each (Table IV)
conv_blocks       stacked Conv-BatchNorm-ReLU, swept over width and depth (Table V)
mha_blocks        stacked self-attention + ReLU, swept over width and depth (Table V)
conv2d            a single convolution
fully_connected   a single linear layer
self_attention    a single multi-head attention layer

compiler          role
identity          PyTorch dynamic graph (baseline)
torchscript       software-level optimization
onnxruntime       software-level optimization
openvino          vendor-specific (Intel CPU)
tensorrt          vendor-specific (NVIDIA GPU)
tvm               vendor-agnostic, with AutoTVM tuning
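
The output layout and resume behaviour described for run.sh can be sketched in Python as follows. This assumes one CSV file per run with a header plus a single data row; the helper names, the run name and the column names are hypothetical, and the real schema lives in ngraphbench/results.py.

```python
import csv
from pathlib import Path


def result_path(root: Path, device: str, compiler: str, run: str) -> Path:
    # Mirrors the documented layout: results/<device>/<compiler>/<run>.csv
    return root / device / compiler / f"{run}.csv"


def write_run(root: Path, device: str, compiler: str, run: str, row: dict) -> bool:
    """Write one tidy row for a run. Returns False if the run already has a result file."""
    path = result_path(root, device, compiler, run)
    if path.exists():
        return False  # skip finished runs so an interrupted sweep resumes where it stopped
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)  # exactly one data row per run ("tidy" format)
    return True
```

In this sketch the file is only written once a run has finished, so a crash mid-run leaves no file behind and that run is simply repeated on the next invocation.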

The experiment grid (widths, depths, batch sizes) lives in configs/experiments.yaml; the defaults reproduce the configurations reported in the paper. Per-compiler settings (device, precision, threads, optimization level) live in configs/<compiler>.yaml.
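
Expanding the grid into individual runs (the job of ngraphbench/experiments.py) amounts to a Cartesian product over the swept dimensions. The Python sketch below assumes the grid is a mapping with widths, depths and batch sizes; the key names, the run-name format and the values are hypothetical, not the contents of configs/experiments.yaml.

```python
from itertools import product


def expand_grid(grid: dict) -> list[dict]:
    """Cartesian product of widths x depths x batch sizes, one dict per run."""
    runs = []
    for width, depth, batch in product(grid["widths"], grid["depths"], grid["batch_sizes"]):
        runs.append({
            "width": width,
            "depth": depth,
            "batch_size": batch,
            "run": f"w{width}_d{depth}_b{batch}",  # hypothetical run-name format
        })
    return runs


if __name__ == "__main__":
    # Hypothetical values; the real grid is defined in configs/experiments.yaml.
    grid = {"widths": [64, 128], "depths": [2, 4], "batch_sizes": [1, 8]}
    runs = expand_grid(grid)
    print(len(runs), runs[0]["run"])  # 8 w64_d2_b1
```

The number of runs grows multiplicatively with each swept dimension, which is why a reduced batch-size list (as with --quick) shortens a sweep so much.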

Reproduction & Traces

We provide the experiment traces in resources/traces/results_raw.zip.

The paper uses three devices (Table I). Run the sweep on each device with the compilers it supports:

./sweep.sh gpu  identity torchscript onnxruntime tensorrt tvm   # RTX 4070 server
./sweep.sh xeon identity torchscript onnxruntime openvino tvm   # Xeon CPU server
./sweep.sh orin identity torchscript tensorrt                   # Jetson Orin Nano
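
This README does not spell out what sweep.sh does internally. Assuming it simply invokes run.sh once per experiment family for each listed compiler, a device sweep expands into the commands produced by the Python sketch below. `sweep_commands` and the ordering of `EXPERIMENTS` are illustrative assumptions; only the `./run.sh <experiment> <compiler> [device]` form and the `--quick` flag come from the documentation above.

```python
# The six experiment families listed in the table above.
EXPERIMENTS = ["architectures", "conv_blocks", "mha_blocks",
               "conv2d", "fully_connected", "self_attention"]


def sweep_commands(device: str, compilers: list[str], quick: bool = False) -> list[list[str]]:
    """One run.sh invocation per (compiler, experiment) pair (assumed sweep behaviour)."""
    cmds = []
    for compiler in compilers:
        for experiment in EXPERIMENTS:
            cmd = ["./run.sh", experiment, compiler, device]
            if quick:
                cmd.append("--quick")  # batch sizes 1 and 8 only
            cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    for cmd in sweep_commands("orin", ["identity", "torchscript", "tensorrt"]):
        print(" ".join(cmd))
```

Each command could be executed with `subprocess.run(cmd, check=True)`; because run.sh skips existing result files, re-running the same list after an interruption only measures what is still missing.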

Each experiment is repeated 100 times after 10 warmup iterations. Library versions used in the paper (Table II): PyTorch 2.4.1, ONNX Runtime 1.19.2, TensorRT 10.4.0, OpenVINO 2024.3.0, Apache TVM 0.18.dev, timm 1.0.15, CUDA 12.5, cuDNN 9.3.0.
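
The warmup-then-measure protocol can be sketched with the Python standard library alone. `benchmark` is a hypothetical helper, not the function in ngraphbench/measure.py, and it times an arbitrary callable. For asynchronous GPU backends the callable must block until the result is ready (for example by calling `torch.cuda.synchronize()`); otherwise only the kernel launch time is measured.

```python
import time
from statistics import mean, stdev
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 10, repeats: int = 100) -> dict:
    """Call `fn` `warmup` times untimed, then time `repeats` calls individually."""
    for _ in range(warmup):
        fn()  # untimed: lets lazy initialisation, caches and allocators settle
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        latencies.append(time.perf_counter() - start)
    # stdev needs at least two samples, which the default of 100 repeats guarantees.
    return {"mean_s": mean(latencies), "std_s": stdev(latencies), "n": len(latencies)}
```

Discarding the warmup iterations keeps one-off costs such as first-call initialisation out of the reported latency distribution.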

Layout

ngraphbench/
  compilers/      one backend per compiler behind a common interface
  models.py       architectures (timm) and synthetic Conv/MHA blocks
  experiments.py  expands the grid into individual runs
  measure.py      timed inference loop and system-metric monitoring
  run.py          CLI: run one experiment family with one compiler
  results.py      tidy result schema
analysis/         load runs and traces, compute derived metrics (no plotting)
configs/          experiment grid and per-compiler settings
resources/traces/ released measurement traces
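
A common backend interface, as used in ngraphbench/compilers/, lets the measurement loop treat every compiler the same way. The Python sketch below is a hypothetical shape for such an interface (compile once on the local device, then call the result); the class names, the `compile` signature and the registry are assumptions, and only an identity baseline is shown.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class CompilerBackend(ABC):
    """Hypothetical common interface: compile a model once, then call the result."""

    name: str

    @abstractmethod
    def compile(self, model: Callable[[Any], Any], example_input: Any) -> Callable[[Any], Any]:
        """Return an executable compiled for the local device."""


class IdentityBackend(CompilerBackend):
    """Baseline that returns the model unchanged (analogue of the dynamic-graph baseline)."""

    name = "identity"

    def compile(self, model, example_input):
        return model


# Hypothetical registry mapping the CLI compiler name to a backend class.
BACKENDS: dict[str, type[CompilerBackend]] = {"identity": IdentityBackend}


def get_backend(name: str) -> CompilerBackend:
    return BACKENDS[name]()
```

Passing an example input to `compile` reflects that tracing and ahead-of-time compilers generally need concrete input shapes; the identity baseline simply ignores it.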

Comments in the source point to the corresponding sections and tables of the paper.

Citation

[TPDS]

[Preprint]

@article{furutanpey2025leveraging,
  title={Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems},
  author={Furutanpey, Alireza and Walser, Carmen and Raith, Philipp and Frangoudis, Pantelis A and Dustdar, Schahram},
  journal={arXiv preprint arXiv:2504.20198},
  year={2025}
}

About

[IEEE TPDS] Repository for reproducing the experiments reported in Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems
