GraphNetz
A GNN benchmark whose default output is a statistical report, not a leaderboard.
Whether you are proposing a new GNN architecture, testing a model on a new graph domain, or comparing existing methods across graph types, GraphNetz turns the usual “train, evaluate, table of accuracies” workflow into a reproducible statistical report. Instead of reporting point estimates alone, it provides confidence intervals for each result, paired model comparisons with multiple-testing correction, and rank-based summaries across datasets using critical-difference diagrams. The goal is not just to crown a leaderboard winner, but to give researchers a principled way to quantify uncertainty, compare methods fairly, and produce the exact evidence reviewers often ask for in graph-learning papers.
A Demšar critical-difference diagram. Models are ordered by mean Friedman rank; each horizontal bar connects a group of models whose mean ranks are not significantly different at the chosen \(\alpha\) under the Nemenyi post-hoc test.
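A CD diagram is built from two quantities. The first is each model's mean rank across tasks: rank 1 is best on a task, and tied models share the average rank. The second is Demšar's critical difference \(CD = q_\alpha \sqrt{k(k+1)/(6N)}\) for \(k\) models and \(N\) tasks.

The plain-Python sketch below computes both. A few assumptions apply:

- The helper names are hypothetical, not GraphNetz's API.
- It hard-codes Demšar's \(q_{0.05}\) table for 2 to 10 models.
- It omits the omnibus Friedman test that normally precedes the Nemenyi post-hoc.

```python
import math

# Nemenyi critical values q_0.05 (studentized range statistic divided by sqrt(2))
# for k = 2..10 compared models, as tabulated by Demšar (2006).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
        7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(row):
    """Rank one task's scores (higher is better -> rank 1); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mean_friedman_ranks(scores):
    """scores[task][model] -> mean rank of each model across all tasks."""
    per_task = [average_ranks(row) for row in scores]
    return [sum(col) / len(per_task) for col in zip(*per_task)]


def nemenyi_cd(k, n_tasks, q_table=Q_05):
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6 * n_tasks))


models = ["GCN", "GAT", "GraphSAGE"]
# Illustrative accuracies: one row per task, one column per model.
scores = [[0.81, 0.79, 0.80], [0.70, 0.72, 0.69], [0.90, 0.88, 0.87], [0.65, 0.64, 0.66]]
ranks = mean_friedman_ranks(scores)
cd = nemenyi_cd(len(models), len(scores))
for name, r in sorted(zip(models, ranks), key=lambda p: p[1]):
    print(f"{name}: mean rank {r:.2f}")
print(f"CD = {cd:.3f}")
```

With these illustrative scores, GCN's mean rank is 1.5 and the other two models tie at 2.25. The CD for 3 models on 4 tasks is about 1.66. No pairwise gap exceeds it, so a CD diagram would draw one bar spanning all three models.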
Install
```shell
pip install graphnetz
# or
uv add graphnetz
```
Requires Python ≥ 3.10, PyTorch ≥ 2.6, torch-geometric ≥ 2.6.
Quick Start
```python
from graphnetz import GAT, GCN, GraphSAGE, run_benchmark

report = run_benchmark(
    "social",
    {"GCN": GCN, "GAT": GAT, "GraphSAGE": GraphSAGE},
    seeds=range(10),
    task_type="node_cls",
)
print(report.summary())   # per-(task, model) mean ± t-CI
print(report.pairwise())  # Holm-corrected paired t-tests
report.plot_critical_difference(alpha=0.05)
report.to_latex("results.tex")  # publication-ready table
```
→ Walk through this end-to-end in Getting started.
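Each cell of report.summary() is a mean over seeds with a Student's-t confidence interval. The sketch below shows that computation for one (task, model) cell. It assumes:

- a 95% level;
- the sample standard deviation across seeds.

The function name t_interval and the scores are illustrative. The critical values are hard-coded here, not taken from GraphNetz.

```python
import math
import statistics

# Two-sided 95% Student's-t critical values t_{0.975, df} for df = 1..10.
# Hard-coded so the sketch needs no SciPy; scipy.stats.t.ppf(0.975, df)
# gives the same quantiles for any df.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_interval(scores):
    """Mean and 95% Student's-t CI of one (task, model) cell across seeds."""
    n = len(scores)
    if n - 1 not in T_975:
        raise ValueError("this sketch only tabulates 2 to 11 seeds")
    mean = statistics.fmean(scores)
    # Standard error from the sample standard deviation (n - 1 denominator).
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_975[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Ten seeds, matching seeds=range(10) above (illustrative scores).
accuracies = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.82, 0.81, 0.80, 0.82]
mean, lo, hi = t_interval(accuracies)
print(f"{mean:.3f} ± {hi - mean:.3f}  (95% CI [{lo:.3f}, {hi:.3f}])")
```

The interval uses the t quantile rather than the normal one, so it stays honest at the small seed counts typical of GNN benchmarks. With 10 seeds the multiplier is 2.262 instead of 1.96.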
Why GraphNetz
Per-cell Student’s-t (or percentile-bootstrap) CIs, Holm-adjusted paired t-tests within each task, and Friedman ranks with a Nemenyi critical difference across tasks, all with no extra bookkeeping.
run_benchmark(category, models, seeds=...) trains every compatible (task, model, seed) triple and returns a BenchmarkReport.
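report.to_latex(...) exports a publication-ready table, and report.plot_forest(), report.plot_pairwise(), and report.plot_critical_difference() draw forest, pairwise, and critical-difference plots.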
Plug in your own encoder via a decorator, a class attribute, or an inline tuple; it then runs through the same statistical pipeline as the built-in models.
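The run_benchmark grid of (task, model, seed) triples can be pictured as a filtered Cartesian product of tasks, models, and seeds. This is a sketch under assumptions:

- The task and model descriptors are hypothetical.
- So is the NodeOnlyNet name.
- The rule that a model is compatible when it supports the task's type is also an assumption; the exact criterion GraphNetz applies is not stated here.

```python
from itertools import product

# Hypothetical descriptors: task -> task type, model -> supported task types.
# This is NOT GraphNetz's internal representation, only an illustration.
tasks = {"task_a": "node_cls", "task_b": "node_cls", "task_c": "graph_cls"}
models = {"GCN": {"node_cls", "graph_cls"}, "NodeOnlyNet": {"node_cls"}}


def compatible_triples(tasks, models, seeds):
    """Yield every (task, model, seed) triple whose model supports the task type."""
    for (task, task_type), (model, supported), seed in product(
        tasks.items(), models.items(), seeds
    ):
        if task_type in supported:
            yield task, model, seed


triples = list(compatible_triples(tasks, models, range(3)))
print(len(triples), "training runs")
```

Every compatible model on a task sees the same seed list. That shared list is what lets later stages pair runs by seed for the paired t-tests.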
At a glance
| Tasks | |
| Architectures | GCN · GAT · GIN · GraphSAGE · GraphTransformer (DGI as a pre-training utility) |
| Loaders | 63 across 10 categories (combinatorial, biology, social, knowledge, infrastructure, finance, computing, vision, physics, security) |
| Default report | per-cell mean ± Student’s-t CI · Holm-adjusted paired t · Demšar/Nemenyi CD |
| Source | |
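The "Holm-adjusted paired t" entry combines two steps:

1. A paired t-test on the per-seed scores of two models within one task.
2. Holm's step-down correction across all model pairs in that task.

The sketch below shows both in plain Python. It assumes the two models' scores are aligned by seed. It also assumes the p-values are supplied externally, for example from scipy.stats.ttest_rel. The helper names and the p-values are illustrative.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for two models' scores, where a[i] and b[i] share seed i."""
    if len(a) != len(b):
        raise ValueError("scores must be aligned by seed")
    diffs = [x - y for x, y in zip(a, b)]
    # Differencing within each seed cancels variation both models share at that seed.
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by m - rank, capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Running maximum keeps adjusted p-values monotone in the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative p-values for the three pairs of models within one task.
raw = {"GCN vs GAT": 0.01, "GCN vs GraphSAGE": 0.04, "GAT vs GraphSAGE": 0.03}
for pair, p in zip(raw, holm_adjust(list(raw.values()))):
    print(f"{pair}: adjusted p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

Holm multiplies the smallest of m p-values by m, the next by m - 1, and so on. It then carries a running maximum so adjusted values never decrease. Here only GCN vs GAT stays below 0.05 (at 0.03); the other two pairs both adjust to 0.06.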
Documentation
Start here
Getting started — install and run your first benchmark in five minutes.
Concepts
Dataset taxonomy — the full category × task grid and how to pick a loader.
Models & adapters — built-in encoders and three ways to plug in your own.
Benchmark protocol — the five-stage pipeline that turns raw histories into a publishable report.
Reading the report — which view to use for which question.
Reference
API reference — modules, classes, and functions.
Contributing — add a loader, a model, or a new task.