Contributing¶
Thanks for your interest in contributing. The bar for new code is correctness, statistical honesty, and clarity — in that order. New loaders and architectures are welcome; new evaluation shortcuts are not.
Ground rules¶
- No silent baselines. Every cell in BENCHMARK_TASKS must carry a real held-out metric (test accuracy, test AUC, validation MAE). Self-supervised losses are not benchmark metrics; use train_dgi / DGIWrapper as a pre-training utility instead.
- Statistics first. New evaluation paths must thread through the multi-seed pipeline so the report still produces CIs, Holm-corrected pairwise tests, and Friedman–Nemenyi diagrams without bespoke code.
- Determinism. Seed every RNG. A run with the same seed list and software stack must reproduce bit-for-bit on the same hardware. The benchmark dispatcher already reseeds Python random, NumPy, Torch CPU, and Torch CUDA; new code paths must not introduce ungated stochasticity.
- Small, focused PRs. One loader, one model, or one bug per PR. Keep unrelated reformatting out.
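As an illustration of the determinism rule, here is a minimal sketch of a reseeding helper and of a stochastic step that draws only from an explicitly seeded generator. Both `seed_everything` and `noisy_augment` are hypothetical names chosen for this example; they are not the dispatcher's actual code, and the NumPy and Torch branches are skipped when those packages are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper: reseed every RNG that happens to be installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            # Covers every visible GPU, not only the current device.
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # Torch not installed; nothing to seed


def noisy_augment(values: list[float], rng: random.Random) -> list[float]:
    """Stochastic step gated on a caller-supplied, explicitly seeded generator."""
    return [v + rng.gauss(0.0, 0.1) for v in values]


if __name__ == "__main__":
    seed_everything(0)
    first = noisy_augment([1.0, 2.0], random.Random(0))
    second = noisy_augment([1.0, 2.0], random.Random(0))
    print(first == second)  # same seed, same draws
```

Passing the generator in, rather than reaching for module-level state, keeps the source of randomness visible at the call site. On GPU, bit-for-bit agreement may additionally depend on settings such as `torch.use_deterministic_algorithms(True)`.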
Quick development loop¶
git clone https://github.com/quant-sci/graphnetz
cd graphnetz
uv sync --group dev
uv run pytest # smoke tests
uv run ruff check # lint (must be clean before review)
Adding a dataset loader¶
1. Pick the right category module under src/graphnetz/datasets/. Open an issue first if a new top-level category is needed; the taxonomy is intentionally small.
2. Write a thin loader function that returns a PyG dataset. Keep it stateless and one network per call. See social.py and biology.py for reference shapes.
3. Register it in LOADER_REGISTRY under each task it can serve. A single loader may appear under multiple task types (e.g. cora is both node_cls and link_pred).
4. If the loader is appropriate for the curated benchmark, add a Task(...) entry to BENCHMARK_TASKS in benchmark.py and pick an epoch budget that converges on a laptop.
5. Add a one-line entry in tests/test_smoke.py so the loader is exercised in CI.
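The registration and smoke-test steps above can be sketched as follows. This is a self-contained illustration, not the project's code: the nested-dict shape of `LOADER_REGISTRY`, the `register_loader` helper, and the `load_toy_graph` loader are all assumptions made for the example, and a plain dict stands in for the PyG dataset a real loader would return.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical stand-in for LOADER_REGISTRY: task type -> loader name -> loader.
LOADER_REGISTRY: dict[str, dict[str, Callable[[], Any]]] = defaultdict(dict)


def register_loader(
    name: str, tasks: tuple[str, ...], loader: Callable[[], Any]
) -> None:
    """Register one loader under every task type it can serve."""
    for task in tasks:
        if name in LOADER_REGISTRY[task]:
            raise ValueError(f"{name!r} already registered for {task!r}")
        LOADER_REGISTRY[task][name] = loader


def load_toy_graph() -> dict[str, Any]:
    """Stateless loader: builds and returns one network per call.

    A real loader would return a PyG dataset; a plain dict stands in here.
    """
    return {"num_nodes": 4, "edges": [(0, 1), (1, 2), (2, 3)]}


# One loader, two task types, as with the cora example in the text.
register_loader("toy", ("node_cls", "link_pred"), load_toy_graph)


def test_toy_loader_smoke() -> None:
    """Smoke check of the kind a CI suite could run for a new loader."""
    assert load_toy_graph()["num_nodes"] == 4
```

Because the loader holds no state, each call returns a fresh object, so one benchmark cell cannot leak mutations into another.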
Adding a model¶
The dispatcher routes by task, not by model name, so models declare which task types they support up front:
import torch
from graphnetz import register_model


@register_model(task_type={"node_cls", "graph_cls"})
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()  # nn.Module must be initialised before layers are assigned
        ...
The default factory calls cls(in_channels, hidden_channels, out_channels).
For non-standard signatures, pass a factory= callable. For node-level
encoders that should plug into every task, prefer wrapping with
graphnetz.benchmark._multi_task_factory() rather than maintaining
a separate implementation per task type.
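To make the default-factory and `factory=` behaviour concrete, here is a toy re-implementation of a registering decorator. It is a sketch under stated assumptions, not graphnetz's actual `register_model`: the `MODEL_REGISTRY` dict, the plain classes standing in for `torch.nn.Module`, and the `build_deep` adapter are all hypothetical.

```python
from typing import Any, Callable, Optional

# Hypothetical registry: task type -> model name -> factory callable.
MODEL_REGISTRY: dict[str, dict[str, Callable[..., Any]]] = {}


def register_model(
    task_type: set[str], factory: Optional[Callable[..., Any]] = None
) -> Callable[[type], type]:
    """Sketch of a class decorator filing a model under each task it supports."""

    def decorator(cls: type) -> type:
        # Without a factory, the class itself is called as
        # cls(in_channels, hidden_channels, out_channels).
        build = factory if factory is not None else cls
        for task in task_type:
            MODEL_REGISTRY.setdefault(task, {})[cls.__name__] = build
        return cls

    return decorator


@register_model(task_type={"node_cls", "graph_cls"})
class StandardModel:
    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int):
        self.dims = (in_channels, hidden_channels, out_channels)


class DeepModel:
    """Non-standard signature: takes a depth the default factory cannot supply."""

    def __init__(self, in_channels: int, out_channels: int, depth: int):
        self.dims = (in_channels, out_channels)
        self.depth = depth


def build_deep(in_channels: int, hidden_channels: int, out_channels: int) -> DeepModel:
    # Adapter: accepts the standard three arguments and ignores hidden_channels.
    return DeepModel(in_channels, out_channels, depth=3)


# Decorators are plain callables, so registration also works after the class body.
register_model(task_type={"node_cls"}, factory=build_deep)(DeepModel)
```

In this sketch the dispatcher only ever sees a callable taking the three standard arguments, which is what lets it route by task without knowing any model's constructor.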
Adding a task¶
Adding a new task (e.g. node_reg, temporal) is a four-step change:
1. Append the new task to TASK_TYPES in benchmark.py.
2. Add a training routine in training.py returning a per-epoch metric dict (the existing trainers are the template).
3. Add an adapter in models/_adapters.py if node-level encoders should plug into the new task via the multi-task factory.
4. Extend _run_task in benchmark.py with the dispatch branch.
Then document the new task in Dataset taxonomy → Tasks.
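The following toy sketch walks through steps 1, 2, and 4 for a node_reg task; the adapter from step 3 is omitted. Everything here is an assumption made for illustration: the contents of `TASK_TYPES`, the `TRAINERS` table, the `run_task` function, and the reading of "per-epoch metric dict" as a list holding one dict per epoch. The real objects in benchmark.py and training.py may be shaped differently.

```python
from typing import Callable

# Step 1: append the new task type (the existing entries are assumed).
TASK_TYPES = ("node_cls", "graph_cls", "link_pred")
TASK_TYPES = TASK_TYPES + ("node_reg",)


def train_node_reg(targets: list[float], epochs: int) -> list[dict[str, float]]:
    """Step 2: toy trainer returning one metric dict per epoch.

    The "model" is a single constant prediction nudged toward the target mean.
    """
    prediction = 0.0
    mean_target = sum(targets) / len(targets)
    history: list[dict[str, float]] = []
    for epoch in range(epochs):
        prediction += 0.5 * (mean_target - prediction)
        mae = sum(abs(t - prediction) for t in targets) / len(targets)
        history.append({"epoch": float(epoch), "val_mae": mae})
    return history


# Step 4: dispatch keyed on task type rather than on model name.
TRAINERS: dict[str, Callable[..., list[dict[str, float]]]] = {
    "node_reg": train_node_reg,
}


def run_task(task_type: str, **kwargs) -> list[dict[str, float]]:
    """Route to the trainer for task_type, failing loudly on unknown tasks."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    if task_type not in TRAINERS:
        raise NotImplementedError(f"no trainer registered for {task_type!r}")
    return TRAINERS[task_type](**kwargs)
```

Separating the "unknown task" error from the "known task, no trainer" error makes a half-finished four-step change easy to spot.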
Adding a statistical test¶
Stay inside BenchmarkReport (benchmark.py). New tests should:
- Operate on the per-seed final_metrics() table, not on training loss.
- Return a structured object (DataFrame / dict) and ship a matching LaTeX exporter.
- Prefer closed-form null distributions from scipy.stats over bootstrap simulation, unless the paired-by-seed structure makes the bootstrap clearly preferable (see the percentile-bootstrap CI helper for the pattern).
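For the case where the bootstrap is warranted, here is a minimal standard-library sketch of a paired percentile-bootstrap CI that returns a structured dict and ships a small LaTeX row formatter. It is not the project's own helper: the function names, the dict keys, and the two per-seed metric lists are assumptions standing in for columns of the `final_metrics()` table.

```python
import random
import statistics


def paired_bootstrap_ci(
    metric_a: list[float],
    metric_b: list[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict[str, float]:
    """Percentile-bootstrap CI for the mean per-seed difference (a - b).

    Differences are taken seed by seed first, so resampling them keeps
    the pairing between the two models intact.
    """
    if len(metric_a) != len(metric_b) or not metric_a:
        raise ValueError("need two non-empty, equal-length per-seed metric lists")
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    rng = random.Random(seed)  # local generator keeps the result reproducible
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * (n_resamples - 1))
    hi_idx = int((1 - alpha / 2) * (n_resamples - 1))
    return {
        "mean_diff": statistics.fmean(diffs),
        "ci_low": means[lo_idx],
        "ci_high": means[hi_idx],
        "n_seeds": float(len(diffs)),
    }


def ci_to_latex_row(label: str, result: dict[str, float]) -> str:
    """Format one result as a LaTeX table row (hypothetical exporter)."""
    return (
        f"{label} & {result['mean_diff']:.3f} & "
        f"[{result['ci_low']:.3f}, {result['ci_high']:.3f}] \\\\"
    )
```

A call such as `paired_bootstrap_ci(acc_model_a, acc_model_b)` takes two equal-length lists of final held-out metrics, one entry per seed. With only a handful of seeds the percentile interval is coarse, which is one reason a closed-form test is usually the better first choice.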
Building the docs¶
uv sync --group docs
uv run sphinx-build -W --keep-going -b html docs docs/_build/html
open docs/_build/html/index.html # macOS; on Linux use xdg-open
The -W flag treats warnings as errors; CI also runs the docs build, so
keep it warning-clean.
Code style¶
- Python 3.10+; type hints on every public function.
- ruff is the source of truth for lint and formatting; PRs must be ruff clean.
- Docstrings on public symbols only: one-line summary, optional body, no multi-paragraph essays.
- Comments explain why, not what; well-named identifiers cover the what.
- Tests under tests/. Smoke tests are fine for new loaders; full coverage is required for new statistical helpers.
Reporting issues¶
Please include:
- A minimal reproducer (python -c "..." is best).
- python --version, pip freeze | grep -E "torch|geometric|graphnetz".
- The full traceback, not just the last line.
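Where a shell pipeline is inconvenient (for example on Windows), the same version information can be gathered from Python itself. The sketch below uses only the standard library; `environment_report` is a hypothetical helper name, not something graphnetz provides.

```python
import platform
import re
from importlib import metadata


def environment_report(pattern: str = r"torch|geometric|graphnetz") -> dict[str, str]:
    """Collect the Python version plus versions of matching installed packages."""
    report = {"python": platform.python_version()}
    matcher = re.compile(pattern)
    for dist in metadata.distributions():
        # Broken installs can lack a name or version; skip or mark them.
        name = dist.metadata.get("Name") or ""
        if name and matcher.search(name):
            report[name] = dist.version or "unknown"
    return report


if __name__ == "__main__":
    for name, version in sorted(environment_report().items()):
        print(f"{name}=={version}")
```

Pasting the printed lines into the issue gives maintainers the same information as the two shell commands.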
For security-sensitive issues, please open a private issue or email the
maintainer listed in pyproject.toml rather than opening a public issue.