Getting started
This page takes you from a clean environment to a ten-seed benchmark with a LaTeX-ready summary table and a critical-difference diagram. It assumes familiarity with PyTorch and PyG; everything else is covered as we go.
Install
uv add graphnetz
# or, in an existing environment:
pip install graphnetz
For local development, clone the repo and use the dev group:
git clone https://github.com/quant-sci/graphnetz
cd graphnetz
uv sync --group dev
Requires Python ≥ 3.10, PyTorch ≥ 2.6, and torch-geometric ≥ 2.6.
Optional extras: graphnetz[ogb] for OGB loaders, graphnetz[chem] to
pull in RDKit (required by OGB molecular loaders such as ogbg-molhiv).
Train one model
The single-task trainers accept any nn.Module and return a per-epoch
history dict ready for plotting:
from graphnetz import GCN, train_node_classification, plot_history
from graphnetz.datasets.social import cora
ds = cora("data/cora")
model = GCN(ds.num_features, 64, ds.num_classes)
history = train_node_classification(model, ds[0], epochs=200)
fig, ax = plot_history(history, title="GCN on Cora")
Use this when you only need one model on one dataset and don’t care about
cross-seed variance. For everything else — multi-seed, multi-task,
multi-model — reach for run_benchmark.
Tip
GPU is automatic. Both the standalone trainers and run_benchmark
accept device='auto' (the default). The runtime picks CUDA when
available, then Apple-silicon MPS, then CPU, and moves the model and
data onto it for you. Pin placement explicitly with device='cpu' (or
any torch.device) when you need to.
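The fallback order described in this tip can be pictured with a small pure-Python sketch. The helper name resolve_device and its boolean flags are hypothetical and not part of graphnetz; in a real environment the flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
def resolve_device(requested: str = "auto", *, cuda_ok: bool = False, mps_ok: bool = False) -> str:
    """Hypothetical helper: mirror the CUDA -> MPS -> CPU preference order."""
    if requested != "auto":
        # An explicit choice such as "cpu" is returned unchanged.
        return requested
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


print(resolve_device(cuda_ok=True, mps_ok=True))  # cuda
print(resolve_device(mps_ok=True))                # mps
print(resolve_device("cpu", cuda_ok=True))        # cpu
```

An explicit device always wins over detection, which is why pinning device='cpu' works even on a GPU machine.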
Run a multi-seed benchmark
from graphnetz import GAT, GCN, GraphSAGE, GraphTransformer, run_benchmark
report = run_benchmark(
    "social",
    {"GCN": GCN, "GAT": GAT, "GraphSAGE": GraphSAGE, "GraphTransformer": GraphTransformer},
    seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    task_type="node_cls",
)
print(report.summary()) # per-(task, model) mean ± t-CI
print(report.pairwise()) # Holm-corrected paired t-tests
fig, _ = report.plot_critical_difference(alpha=0.05)
report.to_latex("results.tex") # publication-ready table
The same call works for any category and task family: pass task_type="graph_cls"
to benchmark on graph classification, task_type="link_pred" for link
prediction, and so on. Pass only=[task_name, ...] to restrict to specific
loaders.
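For readers who want to see what the summary and pairwise tables compute, here is a rough stdlib-only sketch of the two statistics named in the comments above: a mean with a t-based confidence interval, and Holm's step-down adjustment of p-values. It is not graphnetz's implementation. The function names, the hard-coded critical value (2.262, the two-sided 95% Student-t quantile for 9 degrees of freedom, i.e. ten seeds), and all numbers are illustrative assumptions.

```python
import math
import statistics

T_CRIT_95_DF9 = 2.262  # two-sided 95% Student-t quantile for 10 seeds (df = 9)


def mean_t_ci(scores, t_crit=T_CRIT_95_DF9):
    """Return (mean, half_width) of a t-based confidence interval."""
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error
    return mean, t_crit * sem


def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by m - k + 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, scaled)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Made-up accuracies for one model over ten seeds.
scores = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.81, 0.82, 0.80, 0.82]
mean, half = mean_t_ci(scores)
print(f"{mean:.3f} +/- {half:.3f}")

# Made-up raw p-values from three pairwise comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

The Holm step multiplies the smallest p-value by the number of comparisons, the next by one fewer, and so on, which is why adding more models to a benchmark makes each pairwise difference harder to call significant.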
Plug in your own model
Custom models go through the same statistical pipeline as the built-ins — multi-seed, Holm-corrected, CD-diagrammed — once they declare which task they support. Three integration paths cover the common cases:
Decorator — permanent registration at import time. Best for libraries or shared modules:
import torch
from torch_geometric.nn import GCNConv
from graphnetz import register_model, run_benchmark

@register_model(task_type={"node_cls"})
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, data):
        x, ei = data.x, data.edge_index
        return self.conv2(torch.relu(self.conv1(x, ei)), ei)

run_benchmark("social", {"MyGNN": MyGNN}, task_type="node_cls", seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
Class attribute — same effect, no decorator dependency:
class MyGNN(torch.nn.Module):
    task_types = {"node_cls"}
    ...
Inline tuple — one-shot variants for hyperparameter sweeps. The third
slot is a factory (in_channels, hidden_channels, out_channels) -> Module:
# Assumes a MyGNN variant whose __init__ also accepts a dropout keyword.
run_benchmark(
    "social",
    {
        "MyGNN-d0.3": (MyGNN, "node_cls", lambda i, h, o: MyGNN(i, h, o, dropout=0.3)),
        "MyGNN-d0.5": (MyGNN, "node_cls", lambda i, h, o: MyGNN(i, h, o, dropout=0.5)),
    },
)
For node-level encoders that should run on all four task types without writing the adapter glue, see the multi-tasks factory.
Plug in your own dataset
Custom datasets get the same statistical pipeline as the built-ins.
The minimal contract is the standard PyG one — your dataset object exposes
ds[0] returning a Data, plus the relevant attributes for the task
(num_features, num_classes, or num_relations).
Quickest path — wrap an already-loaded dataset and pass it via
tasks=:
from graphnetz import GCN, run_benchmark, task_from_dataset
# Your dataset (any PyG-shaped object).
ds = my_loader("data/my_dataset")
task = task_from_dataset("my_dataset", "node_cls", ds, epochs=100)
report = run_benchmark(
    models={"GCN": GCN},
    tasks=[task],
    seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
)
No BENCHMARK_TASKS mutation, no global state: tasks= bypasses the
registry entirely. The category defaults to "custom", which is used to
namespace the cache path.
Permanent registration — make your dataset visible to
run_benchmark(category, ...) and iter_benchmark_tasks:
from graphnetz import register_task, task_from_dataset, unregister_task
register_task("biology", task_from_dataset("my_assay", "graph_cls", ds, epochs=50))
# ... later, if you want to remove it:
unregister_task("biology", "my_assay")
Seed-aware loaders — for synthetic datasets where each seed should
produce a fresh sample, write a loader that takes a seed keyword. The
dispatcher detects it via inspect.signature and passes the benchmark
seed in:
from graphnetz import GCN, run_benchmark
from graphnetz.benchmark import Task

# MySyntheticDataset stands in for your own dataset class.
def my_loader(root: str, *, seed: int):
    return MySyntheticDataset(root, num_graphs=100, seed=seed)

task = Task("synthetic_g100", "graph_cls", my_loader, epochs=20)
report = run_benchmark(models={"GCN": GCN}, tasks=[task], seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
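The signature check mentioned above can be illustrated with the standard library alone. This is a minimal sketch of the idea, assuming a dispatcher that looks for a parameter literally named seed; call_loader and the two toy loaders are hypothetical names, and graphnetz's actual dispatcher may differ in detail.

```python
import inspect


def call_loader(loader, root, seed):
    """Pass `seed` only to loaders whose signature declares a `seed` parameter."""
    params = inspect.signature(loader).parameters
    if "seed" in params:
        return loader(root, seed=seed)
    return loader(root)


def static_loader(root):
    # A loader for a fixed dataset: it has no use for the seed.
    return f"{root}:static"


def seeded_loader(root, *, seed):
    # A loader for synthetic data: each seed yields a different sample.
    return f"{root}:seed={seed}"


print(call_loader(static_loader, "data/a", seed=3))  # data/a:static
print(call_loader(seeded_loader, "data/b", seed=3))  # data/b:seed=3
```

Detecting the parameter by name means existing loaders keep working unchanged, while a synthetic loader opts in simply by adding the keyword.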
Tip
The conventions for each task (which attributes the dataset must
expose, how splits are encoded) live in the trainer docstrings:
graphnetz.train_node_classification(),
graphnetz.train_graph_classification(),
graphnetz.train_graph_regression(),
graphnetz.train_link_prediction(), and
graphnetz.train_relational_link_prediction().
Five-minute tour
1. Pick a category: combinatorial, biology, social, knowledge, infrastructure, finance, computing, vision, physics, or security. Installing the optional ogb extra adds OGB datasets to the existing domain categories (e.g. ogbn-arxiv joins social/node_cls, ogbg-molhiv joins biology/graph_cls).
2. Pick a task: node_cls, graph_cls, graph_reg, or link_pred. The runner skips models that don’t declare support for the chosen task.
3. Pick architectures: any subset of the five built-ins, or your own (see Models & adapters).
4. Run: run_benchmark(category, models, task_type=..., seeds=...). Use seeds=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) for the default reproducible 10-seed sweep.
5. Report: call summary, pairwise, plot_critical_difference, plot_pairwise, plot_forest, plot_learning_curves, to_latex, or pairwise_to_latex. Every method works on the same BenchmarkReport.
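The tour's note that the runner skips models without declared support for the chosen task can be pictured as a simple filter over the models dict. This is a sketch only, assuming support is read from a task_types class attribute as in the class-attribute integration path; supported_models and the toy classes are hypothetical names, not graphnetz API.

```python
def supported_models(models, task_type):
    """Keep only the models whose `task_types` attribute includes task_type."""
    kept = {}
    for name, cls in models.items():
        declared = getattr(cls, "task_types", set())  # undeclared -> empty set
        if task_type in declared:
            kept[name] = cls
    return kept


class NodeOnly:
    task_types = {"node_cls"}


class NodeAndGraph:
    task_types = {"node_cls", "graph_cls"}


class Undeclared:
    pass


models = {"A": NodeOnly, "B": NodeAndGraph, "C": Undeclared}
print(sorted(supported_models(models, "graph_cls")))  # ['B']
print(sorted(supported_models(models, "node_cls")))   # ['A', 'B']
```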
Next steps
Browse the dataset taxonomy to find loaders that match your domain.
Read Reading the report to learn which plot or table answers which question.
Skim Contributing before adding a new loader, model, or task so your additions thread through the same pipeline.