Contributing

Thanks for your interest in contributing. The bar for new code is correctness, statistical honesty, and clarity — in that order. New loaders and architectures are welcome; new evaluation shortcuts are not.

Ground rules

No silent baselines. Every cell in BENCHMARK_TASKS must carry a real held-out metric (test accuracy, test AUC, validation MAE). Self-supervised losses are not benchmark metrics — use train_dgi / DGIWrapper as a pre-training utility instead.
Statistics first. New evaluation paths must thread through the multi-seed pipeline so the report still produces CIs, Holm-corrected pairwise tests, and Friedman–Nemenyi diagrams without bespoke code.
Determinism. Seed every RNG. A run with the same seed list and software stack must reproduce bit-for-bit on the same hardware. The benchmark dispatcher already reseeds Python random, NumPy, Torch CPU, and Torch CUDA; new code paths must not introduce ungated stochasticity.
Small, focused PRs. One loader, one model, or one bug per PR. Keep unrelated reformatting out.
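
As an illustration of the determinism rule, here is a minimal sketch of reseeding every RNG before a stochastic code path. The helper name `seed_everything` and the stand-in `noisy_step` are hypothetical, and this is not the dispatcher's actual code; NumPy and Torch are imported lazily so the snippet also runs where they are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Reseed every RNG this process is likely to touch (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # default generator on all devices
        torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without CUDA
    except ImportError:
        pass


def noisy_step(seed: int) -> list[float]:
    # Stand-in for a stochastic code path: all randomness flows from the seed.
    seed_everything(seed)
    return [random.gauss(0.0, 1.0) for _ in range(3)]


# Same seed, same draws: the property new code paths must preserve.
assert noisy_step(7) == noisy_step(7)
```

New code that needs randomness would ideally draw from these seeded generators, or from an explicitly passed generator object, rather than creating an unseeded one of its own.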

Quick development loop

git clone https://github.com/quant-sci/graphnetz
cd graphnetz
uv sync --group dev
uv run pytest          # smoke tests
uv run ruff check      # lint (must be clean before review)

Adding a dataset loader

Pick the right category module under src/graphnetz/datasets/. Open an issue first if a new top-level category is needed — the taxonomy is intentionally small.
Write a thin loader function that returns a PyG dataset. Keep it stateless, and load one network per call. See social.py and biology.py for reference shapes.
Register it in LOADER_REGISTRY under each task it can serve. A single loader may appear under multiple task types (e.g. cora is both node_cls and link_pred).
If the loader is appropriate for the curated benchmark, add a Task(...) entry to BENCHMARK_TASKS in benchmark.py and pick an epoch budget that converges on a laptop.
Add a one-line entry in tests/test_smoke.py so the loader is exercised in CI.
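
The loader steps above can be sketched with a toy registry. Everything here is an assumption made for illustration: `ToyDataset` stands in for a PyG dataset, `load_toy_ring` is a hypothetical loader, and `TOY_REGISTRY` models the registry as a mapping from task type to named loaders, which may not match the real shape of LOADER_REGISTRY.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyDataset:
    """Stand-in for a PyG dataset: a name plus an edge list."""
    name: str
    edges: list[tuple[int, int]]


def load_toy_ring(n: int = 5) -> ToyDataset:
    """Build a ring graph; stateless, one network per call."""
    return ToyDataset("toy_ring", [(i, (i + 1) % n) for i in range(n)])


# ASSUMPTION: task type -> {loader name -> loader}. The real LOADER_REGISTRY
# may be shaped differently; check the datasets package before copying this.
TOY_REGISTRY: dict[str, dict[str, Callable[[], ToyDataset]]] = {
    "node_cls": {},
    "link_pred": {},
}

# One loader registered under every task it can serve.
for task in ("node_cls", "link_pred"):
    TOY_REGISTRY[task]["toy_ring"] = load_toy_ring


def smoke_check(registry: dict[str, dict[str, Callable[[], ToyDataset]]]) -> None:
    # The kind of check a smoke test performs: every loader runs and returns data.
    for task_name, loaders in registry.items():
        for name, loader in loaders.items():
            assert len(loader().edges) > 0, f"{task_name}/{name} returned an empty graph"


smoke_check(TOY_REGISTRY)
```

Because the loader holds no state, registering the same function object under several tasks is safe: each call builds a fresh dataset.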

Adding a model

The dispatcher routes by task, not by model name, so models declare which tasks they support up front:

import torch

from graphnetz import register_model

@register_model(task_type={"node_cls", "graph_cls"})
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()  # must run before submodules are assigned on an nn.Module
        ...

The default factory calls cls(in_channels, hidden_channels, out_channels). For non-standard signatures, pass a factory= callable. For node-level encoders that should plug into every task, prefer wrapping with graphnetz.benchmark._multi_task_factory() rather than maintaining a separate implementation per task type.
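
The relationship between the default factory and a factory= override can be illustrated with a toy registry. `toy_register_model` below is a stand-in written for this example, not graphnetz's `register_model`, and the assumption that a factory receives (in_channels, hidden_channels, out_channels) should be checked against the real decorator.

```python
from typing import Callable, Optional

TOY_MODELS: dict[str, dict[str, Callable]] = {}


def toy_register_model(task_type: set[str], factory: Optional[Callable] = None):
    """Toy decorator: file a model factory under each task it declares."""
    def decorator(cls):
        # Default factory: call the class with the standard three-argument signature.
        build = factory or (lambda i, h, o: cls(i, h, o))
        for task in task_type:
            TOY_MODELS.setdefault(task, {})[cls.__name__] = build
        return cls
    return decorator


@toy_register_model(task_type={"node_cls", "graph_cls"})
class StandardModel:
    def __init__(self, in_channels, hidden_channels, out_channels):
        self.dims = (in_channels, hidden_channels, out_channels)


class DeepModel:
    # Non-standard signature: needs an extra num_layers argument.
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
        self.dims = (in_channels, hidden_channels, out_channels)
        self.num_layers = num_layers


# Applying the decorator by hand lets the factory refer to the finished class.
toy_register_model(
    task_type={"node_cls"},
    factory=lambda i, h, o: DeepModel(i, h, o, num_layers=4),
)(DeepModel)

model = TOY_MODELS["node_cls"]["DeepModel"](16, 32, 7)
assert model.num_layers == 4
```

The design point is that the dispatcher only ever sees a three-argument callable, so extra constructor arguments are fixed inside the factory rather than threaded through the benchmark.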

Adding a task

Adding a new task (e.g. node_reg, temporal) is a four-step change:

Append the new task to TASK_TYPES in benchmark.py.
Add a training routine in training.py returning a per-epoch metric dict (the existing trainers are the template).
Add an adapter in models/_adapters.py if node-level encoders should plug into the new task via the multi-task factory.
Extend _run_task in benchmark.py with the dispatch branch.

Then document the new task in Dataset taxonomy → Tasks.
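
For the training-routine step, a minimal sketch of a trainer that reports metrics per epoch. It assumes "per-epoch metric dict" means a dict mapping each metric name to a list with one value per epoch; the existing trainers in training.py are the authoritative template. `train_toy_regression` is hypothetical and fits a one-dimensional linear model in pure Python on synthetic data.

```python
def train_toy_regression(
    train: list[tuple[float, float]],
    val: list[tuple[float, float]],
    epochs: int = 50,
    lr: float = 0.05,
) -> dict[str, list[float]]:
    """Fit y = w*x + b by gradient descent; return one metric value per epoch."""
    w, b = 0.0, 0.0
    history: dict[str, list[float]] = {"train_mae": [], "val_mae": []}

    def mae(data: list[tuple[float, float]]) -> float:
        return sum(abs(w * x + b - y) for x, y in data) / len(data)

    for _ in range(epochs):
        # Full-batch gradient of the mean squared error.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w -= lr * grad_w
        b -= lr * grad_b
        # Held-out metric recorded every epoch, alongside the training one.
        history["train_mae"].append(mae(train))
        history["val_mae"].append(mae(val))
    return history


# Synthetic data drawn from y = 2x + 1 (illustrative only).
train = [(x / 10, 2 * x / 10 + 1) for x in range(10)]
val = [(x / 10 + 0.05, 2 * (x / 10 + 0.05) + 1) for x in range(10)]
history = train_toy_regression(train, val)
assert len(history["val_mae"]) == 50
assert history["val_mae"][-1] < history["val_mae"][0]
```

Recording a held-out metric at every epoch keeps the new task compatible with the "no silent baselines" rule: the final reported number is never a training loss.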

Adding a statistical test

Stay inside BenchmarkReport (benchmark.py). New tests should:

Operate on the per-seed final_metrics() table, not on training loss.
Return a structured object (DataFrame / dict) and ship a matching LaTeX exporter.
Prefer closed-form null distributions from scipy.stats over bootstrap simulation, unless the paired-by-seed structure makes the bootstrap clearly preferable (see the percentile-bootstrap CI helper for the pattern).
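
A minimal sketch of a paired-by-seed percentile bootstrap, in the spirit of the guidelines above. It is not the BenchmarkReport helper: the function name `paired_bootstrap_ci`, its parameters, and the returned keys are hypothetical, and the accuracies are synthetic values chosen for illustration.

```python
import random
from statistics import mean


def paired_bootstrap_ci(
    scores_a: list[float],
    scores_b: list[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> dict[str, float]:
    """Percentile-bootstrap CI for the mean per-seed difference a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired by seed")
    # Resample the per-seed differences, which preserves the pairing.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # local, seeded RNG: no ungated stochasticity
    boot_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return {"mean_diff": mean(diffs), "ci_low": lo, "ci_high": hi}


# Synthetic per-seed test accuracies for two models (illustrative values only).
model_a = [0.81, 0.83, 0.80, 0.84, 0.82]
model_b = [0.78, 0.80, 0.79, 0.80, 0.79]
result = paired_bootstrap_ci(model_a, model_b)
assert result["ci_low"] <= result["mean_diff"] <= result["ci_high"]
```

The function consumes final per-seed metrics rather than training loss and returns a plain dict, so a LaTeX exporter can format it without recomputing anything.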

Building the docs

uv sync --group docs
uv run sphinx-build -W --keep-going -b html docs docs/_build/html
open docs/_build/html/index.html   # macOS; use xdg-open on Linux

The -W flag treats warnings as errors; CI also runs the docs build, so keep it warning-clean.

Code style

Python 3.10+; type hints on every public function.
ruff is the source of truth for lint and formatting — PRs must be ruff clean.
Docstrings on public symbols only: one-line summary, optional body, no multi-paragraph essays.
Comments explain why, not what — well-named identifiers cover the what.
Tests under tests/. Smoke tests are fine for new loaders; full coverage is required for new statistical helpers.
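
As a small illustration of these conventions, here is a hypothetical public helper (not part of graphnetz) with type hints, a one-line docstring, and a comment that explains why rather than what:

```python
from collections.abc import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        # Raise instead of returning 0.0: an empty split would otherwise
        # look like a perfect score in the report.
        raise ValueError("cannot compute MAE of an empty sequence")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


assert mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]) == 0.5
```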

Reporting issues

Please include:

A minimal reproducer (python -c "..." is best).
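The output of python --version and pip freeze | grep -E "torch|geometric|graphnetz" (use uv pip freeze inside a uv-managed environment, which does not install pip by default).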
The full traceback, not just the last line.
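
Where grep is unavailable (for example on Windows), the version information can also be collected from Python itself. This is a sketch using only the standard library; the helper name `relevant_versions` is hypothetical and the default pattern simply mirrors the grep expression above.

```python
import platform
import re
from importlib import metadata


def relevant_versions(pattern: str = r"torch|geometric|graphnetz") -> list[str]:
    """List installed distributions whose name matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken metadata (name may be missing).
        if name and regex.search(name):
            found.add(f"{name}=={dist.version}")
    return sorted(found)


print("python", platform.python_version())
print("\n".join(relevant_versions()) or "(no matching packages installed)")
```

Pasting the printed output into the issue covers the same ground as the shell commands.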

For security-sensitive issues, please contact the maintainers privately (a private report, or an email to the maintainer listed in pyproject.toml) rather than opening a public issue.