GraphNetz

A GNN benchmark whose default output is a statistical report, not a leaderboard.

Whether you are proposing a new GNN architecture, testing a model on a new graph domain, or comparing existing methods across graph types, GraphNetz turns the usual “train, evaluate, table of accuracies” workflow into a reproducible statistical report. Alongside each point estimate it reports a confidence interval, paired model comparisons with multiple-testing correction, and rank-based summaries across datasets drawn as critical-difference diagrams. The goal is not just to crown a leaderboard winner, but to give researchers a principled way to quantify uncertainty, compare methods fairly, and produce the statistical evidence reviewers often ask for in graph-learning papers.

Figure: Demšar critical-difference diagram comparing four GNN architectures by mean rank.

Install

pip install graphnetz
# or
uv add graphnetz

Requires Python ≥ 3.10, PyTorch ≥ 2.6, torch-geometric ≥ 2.6.

Quick Start

from graphnetz import GAT, GCN, GraphSAGE, run_benchmark

report = run_benchmark(
    "social",
    {"GCN": GCN, "GAT": GAT, "GraphSAGE": GraphSAGE},
    seeds=range(10),
    task_type="node_cls",
)

print(report.summary())          # per-(task, model) mean ± t-CI
print(report.pairwise())         # Holm-corrected paired t-tests
report.plot_critical_difference(alpha=0.05)
report.to_latex("results.tex")   # publication-ready table

→ Walk through this end-to-end in Getting started.

Why GraphNetz

Honest comparisons by default

Every report includes per-cell Student’s-t (or percentile-bootstrap) CIs, Holm-adjusted paired t-tests within each task, and Friedman ranks plus Nemenyi CD across tasks, with no extra bookkeeping.

One call, every metric

run_benchmark(category, models, seeds=...) trains every compatible (task, model, seed) triple and returns a BenchmarkReport.

Publication-ready artefacts

Tables and figures via report.to_latex(...), plot_forest(), plot_pairwise(), and plot_critical_difference().

Pluggable models

Register your encoder by decorator, class attribute, or inline tuple; it runs through the same statistical pipeline as the built-ins.
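As a rough illustration of two of the statistics named under “Honest comparisons by default”, the sketch below computes a Student's-t confidence interval over per-seed scores and applies Holm's step-down adjustment to a set of p-values. It is not GraphNetz's implementation: the function names, the small table of critical values, and the example numbers are all assumptions made for illustration, and the p-values are taken as given rather than computed from paired t-tests.

```python
import math
import statistics

# Two-sided 95% Student's-t critical values, indexed by degrees of freedom.
# Standard table values; only a few entries are included for this sketch.
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def t_confidence_interval(scores):
    """Return (mean, half_width) of a 95% Student's-t CI over per-seed scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[n - 1] * sem


def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical per-seed accuracies for one (task, model) cell.
cell_scores = [0.80, 0.82, 0.81, 0.79, 0.83]
mean, half_width = t_confidence_interval(cell_scores)
print(f"{mean:.3f} ± {half_width:.3f}")

# Hypothetical raw p-values from three pairwise comparisons within one task.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With the raw p-values 0.01, 0.04, and 0.03, the adjusted values come out to roughly 0.03, 0.06, and 0.06: the smallest is multiplied by 3, the next by 2, and the largest inherits the running maximum so the adjusted sequence never decreases.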

At a glance


Tasks	`node_cls` · `graph_cls` · `graph_reg` · `link_pred`
Architectures	GCN · GAT · GIN · GraphSAGE · GraphTransformer (DGI as a pre-training utility)
Loaders	63 across 10 categories (combinatorial, biology, social, knowledge, infrastructure, finance, computing, vision, physics, security)
Default report	per-cell mean ± Student’s-t CI · Holm-adjusted paired t-tests · Demšar/Nemenyi CD
Source	github.com/quant-sci/graphnetz
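The Demšar/Nemenyi critical difference listed in the table can be sketched in a few lines. The code below ranks models within each dataset, averages the ranks, and computes CD = q_α · sqrt(k(k+1) / (6N)) for k models over N datasets. This is a minimal sketch under stated assumptions, not the library's code: the helper names and the example score table are hypothetical, and only the α = 0.05 critical values for small k are included.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, indexed by the number of
# models k (as tabulated in Demšar, 2006).
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def rank_row(scores):
    """Rank one dataset's scores, 1 = best (highest); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def mean_ranks(score_table):
    """Average each model's rank over datasets (rows = datasets, columns = models)."""
    per_row = [rank_row(row) for row in score_table]
    return [sum(col) / len(score_table) for col in zip(*per_row)]


def nemenyi_cd(k, n):
    """Critical difference in mean rank for k models compared over n datasets."""
    return Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6 * n))


# Hypothetical mean accuracies: 3 datasets (rows) x 4 models (columns).
table = [
    [0.81, 0.83, 0.79, 0.80],
    [0.72, 0.75, 0.70, 0.72],
    [0.90, 0.88, 0.85, 0.87],
]
print(mean_ranks(table))
print(nemenyi_cd(k=4, n=len(table)))
```

Two models are judged significantly different when their mean ranks differ by more than the CD. For four models over ten datasets the CD is about 1.48, so small benchmarks can only separate methods whose ranks are far apart.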

Documentation

Start here

Getting started — install and run your first benchmark in five minutes.

Concepts

Dataset taxonomy — the full category × task grid and how to pick a loader.
Models & adapters — built-in encoders and three ways to plug in your own.
Benchmark protocol — the five-stage pipeline that turns raw histories into a publishable report.
Reading the report — which view to use for which question.

Reference

API reference — modules, classes, and functions.
Contributing — add a loader, a model, or a new task.