Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems

Reference implementation for the experiment framework in the paper Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems.

Benchmarks neural graph compilers under a uniform interface. It measures latency, throughput, and CPU/RAM/GPU utilization for off-the-shelf architectures and for synthetic blocks swept over width and depth, then derives the batch-scaling metrics used in the paper. Compilation runs locally on the target device; no hardware simulators are used.

Massive kudos to my student Carmen Walser for implementing the initial core for the compiler configuration and the testbed.

If you run into any issues, just send an email to a.furutanpey@coovally.ai, and I'll gladly try to carve out the time to help you out.

Setup

./setup.sh

Creates .venv and installs the base backends (identity, TorchScript, ONNX Runtime) and the analysis. Requires Python 3.10+. Vendor backends are installed per device:

.venv/bin/pip install '.[openvino]'   # CPU servers
.venv/bin/pip install '.[cuda]'       # CUDA devices (TensorRT)

Apache TVM (0.18.dev) is not installed by setup.sh and must be built separately; see https://tvm.apache.org/docs/install.

Run

./run.sh <experiment> <compiler> [device]

device is a label used to organize output. Results are written to results/<device>/<compiler>/<run>.csv, one tidy row per run; existing files are skipped so an interrupted sweep resumes. Pass --quick for a fast smoke run (batch sizes 1 and 8).

./run.sh architectures tensorrt gpu
./run.sh conv_blocks openvino xeon
./run.sh mha_blocks identity orin --quick
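The output layout and the skip-if-exists resume behavior described above can be illustrated with a minimal sketch. The function names `result_path` and `pending_runs` are hypothetical and are not taken from the repository; only the path pattern results/<device>/<compiler>/<run>.csv comes from this README.

```python
from pathlib import Path


def result_path(root: Path, device: str, compiler: str, run_id: str) -> Path:
    """Build <root>/<device>/<compiler>/<run>.csv (hypothetical helper)."""
    return root / device / compiler / f"{run_id}.csv"


def pending_runs(root: Path, device: str, compiler: str, run_ids: list[str]) -> list[str]:
    """Return the run ids that have no result file yet, preserving order."""
    return [r for r in run_ids if not result_path(root, device, compiler, r).exists()]
```

Under this scheme, restarting an interrupted sweep only executes the runs returned by `pending_runs`; deleting a CSV forces that run to be measured again.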

| experiment | models |
| --- | --- |
| `architectures` | ResNet / EfficientNet / DeiT / Swin / ConvNeXt, 3 sizes each (Table IV) |
| `conv_blocks` | stacked Conv-BatchNorm-ReLU, swept over width and depth (Table V) |
| `mha_blocks` | stacked self-attention + ReLU, swept over width and depth (Table V) |
| `conv2d` | a single convolution |
| `fully_connected` | a single linear layer |
| `self_attention` | a single multi-head attention layer |

| compiler | role |
| --- | --- |
| `identity` | PyTorch dynamic graph (baseline) |
| `torchscript` | software-level optimization |
| `onnxruntime` | software-level optimization |
| `openvino` | vendor-specific (Intel CPU) |
| `tensorrt` | vendor-specific (NVIDIA GPU) |
| `tvm` | vendor-agnostic, with AutoTVM tuning |

The experiment grid (widths, depths, batch sizes) lives in configs/experiments.yaml; the defaults reproduce the configurations reported in the paper. Per-compiler settings (device, precision, threads, optimization level) live in configs/<compiler>.yaml.
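As a rough illustration of how a grid of widths, depths, and batch sizes expands into individual runs, consider the sketch below. The key names (`widths`, `depths`, `batch_sizes`) and the values are assumptions made for the example; the actual schema and defaults are defined in configs/experiments.yaml.

```python
from itertools import product

# Hypothetical grid; the real values live in configs/experiments.yaml.
grid = {
    "widths": [64, 128],
    "depths": [1, 2, 4],
    "batch_sizes": [1, 8],
}


def expand_grid(grid: dict[str, list[int]]) -> list[dict[str, int]]:
    """Return one dict per (width, depth, batch size) combination."""
    return [
        {"width": w, "depth": d, "batch_size": b}
        for w, d, b in product(grid["widths"], grid["depths"], grid["batch_sizes"])
    ]


runs = expand_grid(grid)
print(len(runs))  # 2 * 3 * 2 = 12 combinations
```

The number of runs grows as the product of the list lengths, so trimming one axis (as --quick does for batch sizes) shortens a sweep proportionally.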

Reproduction & Traces

We provide the experiment traces in resources/traces/results_raw.zip.
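One possible way to read such an archive without unpacking it is sketched below. It assumes the zip contains CSV files similar to those written under results/, which this README does not state explicitly; the function name `load_traces` and the added `source_file` column are hypothetical.

```python
import csv
import io
import zipfile


def load_traces(archive) -> list[dict[str, str]]:
    """Read every CSV member of a zip archive into a flat list of row dicts.

    `archive` may be a path or a binary file object.
    """
    rows = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            if not member.endswith(".csv"):
                continue  # skip directories and non-CSV files
            with zf.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    row["source_file"] = member  # remember where the row came from
                    rows.append(row)
    return rows
```

All values are returned as strings; numeric columns would need to be converted before computing any derived metrics.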

The paper uses three devices (Table I). Run on each, with the compilers that device supports:

./sweep.sh gpu  identity torchscript onnxruntime tensorrt tvm   # RTX 4070 server
./sweep.sh xeon identity torchscript onnxruntime openvino tvm   # Xeon CPU server
./sweep.sh orin identity torchscript tensorrt                   # Jetson Orin Nano

Each experiment is repeated 100 times after 10 warmup iterations. Library versions used in the paper (Table II): PyTorch 2.4.1, ONNX Runtime 1.19.2, TensorRT 10.4.0, OpenVINO 2024.3.0, Apache TVM 0.18.dev, timm 1.0.15, CUDA 12.5, cuDNN 9.3.0.
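The warmup-then-measure protocol can be sketched as follows. This is a simplified illustration and not the code in measure.py: the function name `benchmark`, the returned field names, and the choice of mean-based throughput are assumptions.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], batch_size: int,
              warmup: int = 10, repeats: int = 100) -> dict[str, float]:
    """Time fn() `repeats` times after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # warmup calls are executed but not timed
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    mean_s = statistics.fmean(latencies)
    return {
        "mean_latency_ms": mean_s * 1e3,
        "median_latency_ms": statistics.median(latencies) * 1e3,
        # Samples processed per second, based on the mean latency.
        "throughput_samples_per_s": batch_size / mean_s,
    }
```

On CUDA devices kernels launch asynchronously, so a wall-clock loop like this is only meaningful if `fn` waits for the device to finish (for example by calling `torch.cuda.synchronize()`) before returning.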

Layout

ngraphbench/
  compilers/      one backend per compiler behind a common interface
  models.py       architectures (timm) and synthetic Conv/MHA blocks
  experiments.py  expands the grid into individual runs
  measure.py      timed inference loop and system-metric monitoring
  run.py          CLI: run one experiment family with one compiler
  results.py      tidy result schema
analysis/         load runs and traces, compute derived metrics (no plotting)
configs/          experiment grid and per-compiler settings
resources/traces/ released measurement traces

Comments in the source point to the corresponding sections and tables of the paper.
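To give a feel for what "one backend per compiler behind a common interface" could look like, here is a minimal sketch. The names `CompilerBackend`, `IdentityBackend`, `BACKENDS`, and `get_backend` are hypothetical and do not describe the actual classes in compilers/.

```python
from typing import Any, Callable, Protocol


class CompilerBackend(Protocol):
    """Hypothetical interface: turn a model into a callable used for inference."""

    name: str

    def compile(self, model: Callable[..., Any], example_input: Any) -> Callable[..., Any]: ...


class IdentityBackend:
    """Baseline backend: returns the model unchanged (no compilation)."""

    name = "identity"

    def compile(self, model: Callable[..., Any], example_input: Any) -> Callable[..., Any]:
        return model


# Registry mapping the CLI compiler name to a backend instance.
BACKENDS: dict[str, CompilerBackend] = {"identity": IdentityBackend()}


def get_backend(name: str) -> CompilerBackend:
    """Look up a backend by name, failing loudly on unknown compilers."""
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown compiler: {name!r}") from None
```

With such an interface, the measurement code only ever sees the callable returned by `compile`, so adding a compiler means registering one more backend rather than touching the timing loop.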

Citation

[[TPDS]]

[Preprint]

@article{furutanpey2025leveraging,
  title={Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems},
  author={Furutanpey, Alireza and Walser, Carmen and Raith, Philipp and Frangoudis, Pantelis A and Dustdar, Schahram},
  journal={arXiv preprint arXiv:2504.20198},
  year={2025}
}
