GraphNetz


A GNN benchmark whose default output is a statistical report, not a leaderboard.

Whether you are proposing a new GNN architecture, testing a model on a new graph domain, or comparing existing methods across graph types, GraphNetz turns the usual “train, evaluate, table of accuracies” workflow into a reproducible statistical report. Instead of reporting point estimates alone, it provides confidence intervals for each result, paired model comparisons with multiple-testing correction, and rank-based summaries across datasets using critical-difference diagrams. The goal is not just to crown a leaderboard winner, but to give researchers a principled way to quantify uncertainty, compare methods fairly, and produce the exact evidence reviewers often ask for in graph-learning papers.

A Demšar critical-difference diagram comparing four GNN architectures. Models are ordered by mean Friedman rank; horizontal bars connect models whose mean ranks are not significantly different at the chosen \(\alpha\) under the Nemenyi post-hoc test.
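To make the diagram's ingredients concrete, here is a minimal, stdlib-only sketch of the two quantities it plots: mean Friedman ranks (ties share the average rank) and the Nemenyi critical difference \(CD = q_\alpha \sqrt{k(k+1)/(6N)}\) for \(k\) models over \(N\) datasets. This is an illustration under stated assumptions, not GraphNetz's implementation: the helper names are hypothetical, scores are assumed to be higher-is-better by default, and the hard-coded `Q_05` table only covers \(\alpha = 0.05\) and \(k \le 10\).

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05 for k = 2..10 models,
# as tabulated by Demšar (2006). Other alphas or larger k need a different table.
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
        7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(row, higher_is_better=True):
    """Rank models within one dataset (1 = best); tied scores share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=higher_is_better)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def mean_ranks(scores):
    """scores[d][m] is the score of model m on dataset d; returns mean rank per model."""
    per_dataset = [average_ranks(row) for row in scores]
    n = len(per_dataset)
    return [sum(r[m] for r in per_dataset) / n for m in range(len(scores[0]))]


def nemenyi_cd(k, n_datasets):
    """Nemenyi critical difference at alpha = 0.05: q_alpha * sqrt(k(k+1) / (6N))."""
    return Q_05[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))
```

Two models are drawn as significantly different when their mean ranks differ by more than the CD; with \(k = 4\) models on \(N = 3\) datasets, `nemenyi_cd(4, 3)` is roughly 2.71, which is why CD diagrams over only a handful of datasets rarely separate anything.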

Install

pip install graphnetz
# or
uv add graphnetz

Requires Python ≥ 3.10, PyTorch ≥ 2.6, torch-geometric ≥ 2.6.

Quick Start

from graphnetz import GAT, GCN, GraphSAGE, run_benchmark

report = run_benchmark(
    "social",
    {"GCN": GCN, "GAT": GAT, "GraphSAGE": GraphSAGE},
    seeds=range(10),
    task_type="node_cls",
)

print(report.summary())          # per-(task, model) mean ± t-CI
print(report.pairwise())         # Holm-corrected paired t-tests
report.plot_critical_difference(alpha=0.05)
report.to_latex("results.tex")   # publication-ready table

→ Walk through this end-to-end in Getting started.
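`report.summary()` is described as reporting a per-(task, model) mean ± t-CI over seeds. As a rough illustration of that computation (a sketch, not GraphNetz's code), the function below builds a two-sided 95% Student's-t interval from per-seed scores. To stay stdlib-only it assumes a small hard-coded table of \(t_{0.975,\,df}\) values for up to 11 seeds; the name `t_confidence_interval` is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student's-t critical values t_{0.975, df} for df = 1..10,
# rounded to three decimals. More seeds need a larger table (or scipy.stats.t.ppf).
T975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_confidence_interval(scores):
    """Return (mean, lower, upper) of a 95% Student's-t CI over per-seed scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds for a t interval")
    df = n - 1
    if df not in T975:
        raise ValueError("extend T975 for more than 11 seeds")
    mean = statistics.fmean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = T975[df] * sem
    return mean, mean - half_width, mean + half_width
```

With `seeds=range(10)` as in the Quick Start, \(df = 9\) and the half-width is \(2.262 \cdot s/\sqrt{10}\); the interval narrows roughly as \(1/\sqrt{n}\), so doubling the number of seeds shrinks it by about 30%.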

Why GraphNetz

Honest comparisons by default

Per-(task, model) Student’s-t (or percentile-bootstrap) confidence intervals, Holm-adjusted paired t-tests within each task, and Friedman ranks with a Nemenyi critical difference across tasks, all without extra bookkeeping.
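The Holm adjustment applied to the within-task paired t-tests is the standard step-down procedure: sort the \(m\) p-values, multiply the \(i\)-th smallest (1-based) by \(m - i + 1\), cap at 1, and carry the running maximum forward so adjusted values stay monotone. A minimal sketch follows; the function name is hypothetical, and the p-values in the tests are illustrative inputs, not benchmark results.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: a larger raw p-value never gets a smaller adjusted one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

For example, `holm_adjust([0.01, 0.04, 0.03, 0.005])` gives `[0.03, 0.06, 0.06, 0.02]`, so at \(\alpha = 0.05\) only the first and last comparisons remain significant, whereas uncorrected testing would have flagged all four.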

One call, every metric

run_benchmark(category, models, seeds=...) trains every compatible (task, model, seed) triple and returns a BenchmarkReport.
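The phrase "every compatible (task, model, seed) triple" implies a filtered Cartesian product. The sketch below shows that enumeration under an explicit assumption: compatibility is modeled as a hypothetical per-model set of supported task types, and the dataset and model names (`cora`, `proteins`, `NodeOnlyModel`) are placeholders, not GraphNetz's actual metadata.

```python
from itertools import product

# Hypothetical metadata: each task's type, and the task types each model supports.
TASKS = {"cora": "node_cls", "proteins": "graph_cls"}
MODEL_SUPPORTS = {"GCN": {"node_cls", "graph_cls"}, "NodeOnlyModel": {"node_cls"}}


def compatible_runs(tasks, model_supports, seeds):
    """Yield every (task, model, seed) triple whose model supports the task's type."""
    for (task, task_type), (model, supported), seed in product(
        tasks.items(), model_supports.items(), seeds
    ):
        if task_type in supported:
            yield task, model, seed
```

The number of training runs is therefore (compatible task-model pairs) × (number of seeds): here three pairs and two seeds give six runs, and the ten seeds from the Quick Start would give thirty.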

Publication-ready artefacts

LaTeX tables via report.to_latex(...), plus forest, pairwise, and critical-difference plots via plot_forest(), plot_pairwise(), and plot_critical_difference().

Pluggable models

Plug in your encoder via a decorator, a class attribute, or an inline tuple; it then runs through the same statistical pipeline as the built-in models.

At a glance

Tasks

node_cls · graph_cls · graph_reg · link_pred

Architectures

GCN · GAT · GIN · GraphSAGE · GraphTransformer (DGI as a pre-training utility)

Loaders

63 across 10 categories (combinatorial, biology, social, knowledge, infrastructure, finance, computing, vision, physics, security)

Default report

per-(task, model) mean ± Student’s-t CI · Holm-adjusted paired t-tests · Demšar/Nemenyi CD diagram

Source

github.com/quant-sci/graphnetz

Documentation

Start here

  • Getting started — install and run your first benchmark in five minutes.

Concepts

Reference