Delay-Constrained Anomaly-Aware Consensus in Heterogeneous Clock Networks
Repository: threehouse-plus-ec/admec-clock-consensus
Date: 2026-03-31 · Work package: WP1 (IC Coastline) · Decision gate: DG-1 (σ-sensitivity criterion)
Test whether AIPP is robust under ±20% misspecification of declared uncertainties. Three perturbation conditions: random i.i.d., coherent overestimate (+20%), coherent underestimate (−20%). The pass criterion was defined before running the test.
AIPP shift < 15% relative under all three perturbation conditions at N ≥ 50. That is, AIPP must stay within approximately [1.06, 1.44] bit (relative to the 1.25-bit baseline). Additionally, the 95th-percentile threshold shift must be < 20% relative (anomaly-detection operating point stability).
For each point k, the declared σ_k is replaced by σ_k × (1 + ε_k):
Implemented as perturb_sigmas() in src/ic.py.
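The real implementation lives in src/ic.py; the sketch below is a minimal reconstruction consistent with the description above. The mode string `'systematic-'` appears in the Harbourmaster ruling; the names `'random'` and `'systematic+'` are assumed by analogy, and the i.i.d. perturbations are assumed uniform on ±magnitude.

```python
import numpy as np

def perturb_sigmas(sigmas, mode="random", magnitude=0.2, seed=None):
    """Replace each declared sigma_k by sigma_k * (1 + eps_k).

    mode='random'      : eps_k i.i.d. Uniform(-magnitude, +magnitude)
                         (distribution assumed, not confirmed by the entry)
    mode='systematic+' : eps_k = +magnitude (coherent overestimate)
    mode='systematic-' : eps_k = -magnitude (coherent underestimate)
    """
    sigmas = np.asarray(sigmas, dtype=float)
    if mode == "random":
        rng = np.random.default_rng(seed)
        eps = rng.uniform(-magnitude, magnitude, size=sigmas.shape)
    elif mode == "systematic+":
        eps = np.full(sigmas.shape, magnitude)
    elif mode == "systematic-":
        eps = np.full(sigmas.shape, -magnitude)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sigmas * (1.0 + eps)
```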
| Condition | AIPP (bit), N = 50 | AIPP (bit), N = 200 | Shift (N = 200) |
|---|---|---|---|
| Unperturbed | 1.234 ± 0.092 | 1.240 ± 0.049 | — |
| Random ±20% | 1.240 ± 0.096 | 1.255 ± 0.053 | +1.2% |
| Systematic +20% | 1.079 ± 0.081 | 1.083 ± 0.039 | −12.7% |
| Systematic −20% | 1.459 ± 0.099 | 1.479 ± 0.052 | +19.3% |
| Condition | P95 threshold, N = 50 | P95 threshold, N = 200 |
|---|---|---|
| Unperturbed | 1.392 | 1.320 |
| Random ±20% | 1.394 | 1.337 |
| Systematic +20% | 1.212 | 1.142 |
| Systematic −20% | 1.639 | 1.559 |
| Condition | AIPP shift | < 15%? | P95 shift | < 20%? |
|---|---|---|---|---|
| Random ±20% | +1.2% | ✅ PASS | +1.3% | ✅ PASS |
| Systematic +20% | −12.7% | ✅ PASS | −13.5% | ✅ PASS |
| Systematic −20% | +19.3% | ❌ FAIL | +18.1% | ✅ PASS |
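The shift percentages in the verdict table follow directly from the two N = 200 columns above; a quick recomputation against the pre-registered bounds (15% for AIPP, 20% for the P95 threshold):

```python
# Unperturbed N = 200 values from the tables above.
BASELINE_AIPP, BASELINE_P95 = 1.240, 1.320

# (AIPP, P95 threshold) at N = 200 per perturbation condition.
CONDITIONS = {
    "random ±20%":     (1.255, 1.337),
    "systematic +20%": (1.083, 1.142),
    "systematic -20%": (1.479, 1.559),
}

def verdict(aipp, p95):
    """Relative shifts and pass flags against the pre-registered bounds."""
    aipp_shift = aipp / BASELINE_AIPP - 1.0
    p95_shift = p95 / BASELINE_P95 - 1.0
    return aipp_shift, abs(aipp_shift) < 0.15, p95_shift, abs(p95_shift) < 0.20

for name, (aipp, p95) in CONDITIONS.items():
    a, a_ok, p, p_ok = verdict(aipp, p95)
    print(f"{name:16s} AIPP {a:+.1%} {'PASS' if a_ok else 'FAIL'}  "
          f"P95 {p:+.1%} {'PASS' if p_ok else 'FAIL'}")
```

Only the systematic −20% condition trips the AIPP bound (+19.3% > 15%) while still passing the looser P95 bound (+18.1% < 20%).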
DG-1 σ-sensitivity criterion against pre-registered 15% bound: FAIL. Random and systematic overestimate pass; systematic underestimate (+19.3%) exceeds the bound. See Harbourmaster ruling below.

Figure 4 (logbook/figures/fig04_sigma_sensitivity.png) shows AIPP distributions under all four conditions at N = 200. The grey band marks the ±15% zone around the baseline mean; the systematic −20% condition (red) sits clearly above the band.
17 tests in tests/test_sensitivity.py, covering perturb_sigmas (range, distribution, modes, edge cases): 15 pass, 2 fail (the systematic −20% AIPP shift tests at N = 50 and N = 200). The failures are genuine findings, not bugs.
The asymmetry is expected. Underestimating σ narrows the integration interval [x_k − σ_k, x_k + σ_k], capturing less probability mass from the mixture background. The AIPP rises because smaller intervals are less probable. Overestimating σ has the opposite effect but with diminishing returns — wider intervals asymptotically capture all the mass, so the sensitivity saturates.
The random ±20% condition is nearly invisible (+1.2% shift) because the positive and negative perturbations cancel in expectation when computing the mixture average. The IC is robust against random calibration noise — it’s only coherent bias that causes trouble.
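The interval-narrowing asymmetry can be illustrated with a toy single-Gaussian background (not the project's mixture, so the magnitudes below are larger than the measured ±20% shifts, but the direction and asymmetry carry over): the surprisal of the mass captured in [−σ′, +σ′] rises faster when σ′ shrinks than it falls when σ′ grows.

```python
from math import erf, log2, sqrt

def surprisal_bits(sigma_ratio):
    """-log2 of the probability mass a standard normal places in
    [-sigma', +sigma'], where sigma' = sigma_ratio * sigma_true."""
    mass = erf(sigma_ratio / sqrt(2.0))  # P(|X| < sigma') for X ~ N(0,1)
    return -log2(mass)

base = surprisal_bits(1.0)  # nominal interval, ~0.55 bit
for r in (0.8, 1.0, 1.2):   # understated, nominal, overstated sigma
    s = surprisal_bits(r)
    print(f"sigma x {r:.1f}: {s:.3f} bit ({s / base - 1.0:+.1%} vs nominal)")
```

The upward move at 0.8× exceeds the downward move at 1.2× in magnitude, matching the observed +19.3% versus −12.7% asymmetry in sign and shape; and since the map is locally near-linear, symmetric random ε_k average out, which is why the random ±20% condition barely registers.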
The pre-registered bound was 15% relative AIPP shift. Systematic −20% underestimation produces a 19.3% shift. This is a genuine failure, not a borderline case.
| Criterion | Status |
|---|---|
| AIPP converges to theoretical limit (±5%) at N ≥ 100 | ✅ PASS |
| 95th-percentile thresholds stable within ×1.5 across noise models | ✅ PASS |
| σ-sensitivity: random ±20% | ✅ PASS (+1.2%) |
| σ-sensitivity: systematic overestimate +20% | ✅ PASS (−12.7%) |
| σ-sensitivity: systematic underestimate −20% | ❌ FAIL (+19.3%) |
| Finite-N bias quantified | ⬜ Not yet tested |
| Power-law and 1/f nulls | ⬜ Not yet tested |
IC is robust against random calibration noise. In real clock networks, uncertainty budgets have random errors from characterisation noise — the IC handles this without issue.
Coherent underestimation is the dangerous case. If all clocks systematically understate their uncertainty (e.g., due to an unmodelled noise floor), AIPP inflates by ~19%. This could cause false anomaly detections if the threshold is calibrated under correct-σ assumptions.
Decision: The project proceeds. The mitigation is operational, not a criterion revision. It is a procedural workaround — IC remains intrinsically sensitive to the fidelity of declared uncertainties.
Rationale: The 15% pre-registered bound stands; it is not relaxed. The systematic −20% condition genuinely fails it. However, the failure does not trigger the DG-1 halt condition (“Halt project”), because an operational mitigation is available, as recorded below.
Chosen mitigation for WP2: Calibrate anomaly-detection thresholds under the worst-case σ condition (systematic −20%), not the nominal condition. This means the WP2 threshold table will be computed using perturb_sigmas(sigmas, mode='systematic-', magnitude=0.2) as the calibration null. This makes the test conservative: any anomaly that exceeds the worst-case threshold is anomalous regardless of σ-budget errors up to 20%.
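A sketch of the worst-case calibration step, under assumptions: `null_scores_fn` and `toy_null` are hypothetical stand-ins for the real null-scoring pipeline in src/ic.py, and the toy models interval mass as proportional to σ against a uniform background density, so that understated σ inflates surprisal as observed.

```python
import numpy as np

def worst_case_threshold(null_scores_fn, sigmas, q=0.95, magnitude=0.2):
    """Anomaly threshold calibrated under the worst-case sigma budget:
    every declared uncertainty coherently understated by `magnitude`,
    i.e. the systematic-minus condition used as the calibration null.

    null_scores_fn(sigmas) -> 1-D array of IC scores (bit) under the null.
    (Hypothetical interface; the real pipeline lives in src/ic.py.)
    """
    worst = np.asarray(sigmas, dtype=float) * (1.0 - magnitude)
    return float(np.quantile(null_scores_fn(worst), q))

def toy_null(sigmas, density=0.25, n=20_000, seed=1):
    """Toy null: interval mass 2*sigma*density under a uniform background,
    so surprisal (bits) rises when sigmas are understated."""
    rng = np.random.default_rng(seed)
    base_bits = -np.log2(2.0 * np.asarray(sigmas, float).mean() * density)
    return base_bits + rng.normal(0.0, 0.05, size=n)

thr_worst = worst_case_threshold(toy_null, [1.0, 1.0], magnitude=0.2)
thr_nominal = worst_case_threshold(toy_null, [1.0, 1.0], magnitude=0.0)
```

In the toy, as in the measured tables, the worst-case threshold sits above the nominal one, which is exactly what makes calibrating against it conservative.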
The verify_sigmas pre-filter (already implemented) provides a second line of defence: nodes whose declared σ deviates from observed variance by more than 2× are flagged before IC classification is applied.
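The entry says verify_sigmas is already implemented; the sketch below reconstructs only the described behaviour (flag nodes whose declared σ deviates from the observed spread by more than 2× in either direction), with the exact signature assumed.

```python
import numpy as np

def verify_sigmas(samples, declared_sigmas, factor=2.0):
    """Pre-filter before IC classification: flag nodes whose declared
    sigma disagrees with the observed sample standard deviation by more
    than `factor` in either direction.

    samples: (n_nodes, n_obs) array of per-node measurements.
    Returns a boolean mask, True = flagged.
    """
    observed = np.std(np.asarray(samples, dtype=float), axis=1, ddof=1)
    declared = np.asarray(declared_sigmas, dtype=float)
    ratio = observed / declared
    return (ratio > factor) | (ratio < 1.0 / factor)
```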
What is NOT done: The 15% criterion is not relaxed to 20% post-hoc. That would be fitting the criterion to the result. The failure is recorded, the mitigation is recorded, and the decision to proceed is recorded.
| File | Change |
|---|---|
| src/ic.py | Added perturb_sigmas() function; fixed docstring (0.55 → 1.25 bit) |
| tests/test_sensitivity.py | New: 17 tests (8 unit, 6 AIPP sensitivity, 3 threshold stability) |
| scripts/fig04_sigma_sensitivity.py | New: figure generation script |
| logbook/figures/fig04_sigma_sensitivity.png | New: σ-sensitivity box plots |
Entry by U. Warring. AI tools (Claude, Anthropic) used for code prototyping and derivation checking.