A forkable platform for reproducible research. Four layers, two quality gates, manifest-based provenance. Fork the repo, run a job, publish the results. One afternoon.
The repo defines what should run. The machine executes. Outputs are validated, promoted, and published as separate, explicit steps; nothing reaches the public layer by default.
Code, configs, workflow definitions, environment specs. Everything else is replaceable — this layer is the authoritative history.
GitHub Actions (hosted or self-hosted) pulls the repo, builds a container from the canonical Dockerfile, and executes the job. All outputs land in ephemeral storage.
Automated. Schema compliance, file existence, SHA-256 hash integrity. Passes → curated. Fails → stays ephemeral for diagnosis.
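The three checks above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual validator: the function names, the `REQUIRED_KEYS` set, and the problem messages are assumptions, and a real schema check would be stricter than a key-presence test.

```python
import hashlib
from pathlib import Path

# Hypothetical minimal set of required keys; the real schema is richer.
REQUIRED_KEYS = {"schema_version", "run_id", "outputs", "status"}


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_run(run_dir: Path, manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the run may be promoted."""
    # Schema compliance (reduced here to "are the required keys present?").
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(manifest))]
    for entry in manifest.get("outputs", []):
        target = run_dir / entry["filename"]
        if not target.is_file():
            # File existence.
            problems.append(f"missing file: {entry['filename']}")
        elif sha256_of(target) != entry["sha256"]:
            # Hash integrity.
            problems.append(f"hash mismatch: {entry['filename']}")
    return problems
```

Returning a list of problems rather than a bare boolean keeps a failed run diagnosable while it sits in ephemeral storage.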
Validated runs with stable identifiers and full manifests. Retained indefinitely. Each object carries a retention class and a visibility class — they are independent axes.
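One way to picture the two independent axes is as two separate enumerations on each curated object. The sketch below is illustrative only: the class names and every enum value except `public` (which appears in the sample manifest) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Retention(Enum):
    # Hypothetical retention classes; the platform's real values may differ.
    INDEFINITE = "indefinite"
    TIME_LIMITED = "time_limited"


class Visibility(Enum):
    PRIVATE = "private"  # hypothetical value
    PUBLIC = "public"    # value used in the sample manifest


@dataclass(frozen=True)
class CuratedObject:
    run_id: str
    retention: Retention
    visibility: Visibility


def publishable(obj: CuratedObject) -> bool:
    """Visibility alone decides publishability; retention plays no part."""
    return obj.visibility is Visibility.PUBLIC
```

Because the axes are independent, a run can be kept indefinitely and still stay private, or be public with a limited retention period.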
Not every valid run should be public. This gate checks visibility classification, re-verifies output hashes independently, and requires explicit approval. Append-only once promoted: a published run is never edited or deleted.
GitHub Pages. Zero server cost. Reads only from published data — never from live computation. The platform functions without this layer; it is an opt-in feature.
The included rocket_scan job simulates single-stage rocket trajectories across a launch angle sweep. Constant thrust, quadratic drag, Euler integration, flat Earth. Toy physics, real documentation, real provenance.
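To make the physics concrete, here is a minimal sketch of that kind of simulation. It is not the rocket_scan source: every parameter value is an assumption, the mass is held constant, and thrust is applied along the fixed launch direction rather than along the velocity.

```python
import math


def simulate(angle_deg, thrust=30.0, burn_time=3.0, mass=1.0,
             drag_coeff=0.01, g=9.81, dt=0.01):
    """Return the horizontal range (m) of a point-mass rocket over flat ground.

    All defaults are illustrative values, not the job's real configuration.
    """
    theta = math.radians(angle_deg)
    x, y, vx, vy, t = 0.0, 0.0, 0.0, 0.0, 0.0
    while True:
        speed = math.hypot(vx, vy)
        # Constant thrust while the motor burns, zero afterwards.
        push = thrust if t < burn_time else 0.0
        # Quadratic drag opposes the velocity: F = -k * |v| * v.
        ax = (push * math.cos(theta) - drag_coeff * speed * vx) / mass
        ay = (push * math.sin(theta) - drag_coeff * speed * vy) / mass - g
        # Explicit Euler: advance position with the old velocity, then velocity.
        x, y = x + vx * dt, y + vy * dt
        vx, vy = vx + ax * dt, vy + ay * dt
        t += dt
        if y < 0.0:  # flat Earth: the flight ends when the rocket is below y = 0
            return x


if __name__ == "__main__":
    # Launch angle sweep, one range per angle.
    for angle in range(20, 90, 10):
        print(f"{angle:2d} deg -> {simulate(angle):8.1f} m")
```

Explicit Euler is first-order accurate, so the computed range shifts slightly with `dt`; that is acceptable for a demonstration job whose purpose is to exercise the provenance pipeline.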
The manifest is the atomic unit of provenance. It records the exact commit, the container digest, the machine resources, every parameter, and the SHA-256 hash of every output file.
If two people run the same job from the same commit in the same container, the outputs should be bit-identical. The manifest lets you verify that.
Manifests are committed to the repo only when promoted. Ephemeral runs do not pollute the version history. Published runs are append-only: corrections require a supersession record, not deletion.
{
  "schema_version": "2.0.0",
  "run_id": "2026-03-10T09-39Z_rocket_scan_c6d707b_049a",
  "provenance": {
    "git_commit": "c6d707bfd887...",
    "source_mode": "local_dev",
    "workspace_dirty": false
  },
  "environment": {
    "canonical_spec": "environments/Dockerfile",
    "image_digest": "sha256:9f86d08...",
    "lockfile_hash": "sha256:afd966..."
  },
  "resources": {
    "arch": "x86_64",
    "cores_used": 2,
    "ram_gb": 9.0
  },
  "outputs": [
    { "filename": "results.csv",
      "sha256": "34da0b54..." },
    { "filename": "trajectories.svg",
      "sha256": "e700483b..." }
  ],
  "status": "success",
  "visibility": "public"
}
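Given two manifests shaped like the one above, the bit-identical claim can be checked by comparing recorded hashes. The helper names below are hypothetical, and the sketch assumes both manifests use the field names from the sample.

```python
def same_inputs(manifest_a: dict, manifest_b: dict) -> bool:
    """True when both runs used the same commit and the same container image."""
    return (
        manifest_a["provenance"]["git_commit"] == manifest_b["provenance"]["git_commit"]
        and manifest_a["environment"]["image_digest"] == manifest_b["environment"]["image_digest"]
    )


def compare_outputs(manifest_a: dict, manifest_b: dict) -> dict[str, bool]:
    """Map each output filename to whether its SHA-256 matches across both runs."""
    hashes_a = {o["filename"]: o["sha256"] for o in manifest_a["outputs"]}
    hashes_b = {o["filename"]: o["sha256"] for o in manifest_b["outputs"]}
    # A file present in only one manifest compares against None and reports False.
    return {
        name: hashes_a.get(name) == hashes_b.get(name)
        for name in sorted(hashes_a.keys() | hashes_b.keys())
    }
```

If `same_inputs` holds and every value from `compare_outputs` is `True`, the second run reproduced the first bit for bit; any `False` points at the specific file that diverged.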
No cloud accounts, no databases, no paid services. A GitHub account and Python 3.10+ are enough for Stage 0.
One click on GitHub. The full pipeline — workflows, schema, dashboard — comes with it.
python -m src.lib.runner --job rocket_scan
Executes the job, generates a manifest, validates the outputs, and, if validation passes, promotes the run to curated.
Open data/curated/*/manifest.json. Every parameter, every hash, every machine detail — recorded automatically.
Set visibility to public, then trigger the publication gate. The dashboard updates. Your results are manifest-tracked and append-only.
Replace the rocket scan with your own research code. The four-layer architecture, two-gate pipeline, and manifest schema work for any numerical workflow.
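For the inspection step, a short script can list every curated run without opening files by hand. This is a sketch that assumes the `data/curated/*/manifest.json` layout and the field names from the sample manifest; `summarize_curated` is a hypothetical helper, not part of the platform.

```python
import json
from pathlib import Path


def summarize_curated(root: Path = Path("data/curated")) -> list[tuple[str, str, str]]:
    """Return (run_id, status, visibility) for every curated manifest under root."""
    rows = []
    for manifest_path in sorted(root.glob("*/manifest.json")):
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        rows.append((
            manifest["run_id"],
            manifest["status"],
            manifest.get("visibility", "unset"),  # tolerate a missing field
        ))
    return rows


if __name__ == "__main__":
    for run_id, status, visibility in summarize_curated():
        print(f"{run_id}  {status}  {visibility}")
```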