AI Orchestrator Evolution (Phase O1 → O4)
How we went from one model to a 9-signal Platt-calibrated ensemble — in four behaviour-neutral phases.
The four phases
The orchestrator was built in four phases, all behaviour-neutral on ship day. None of these flipped any allocation until shadow data proved the uplift:
- Phase O1 (May 2026): DecisionContext type + ClickHouse decision_traces dual-write from both branches of the engine. Pure foundation.
- Phase O2: Platt calibration + 7-signal ensemble registry. signal_calibration table + admin orchestrator console.
- Phase O3: Selective AI-overlay size multiplier (0.5–1.0×). Triple-gated: wf_v2_gate_open + per-user opt-in + global kill-switch.
- Phase O4: RL policy added as the 8th signal. Sampled via Python lab champion/challenger. Admin RL-policy console.
XGBoost (Phase 74) became the 9th signal. The Manus shadow signals are the 10th through 12th.
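The triple gate on the Phase O3 multiplier can be pictured with a minimal sketch. Only wf_v2_gate_open and the 0.5–1.0× range come from the text above. The field names user_opted_in and kill_switch_engaged, the function name, and the choice of 1.0 as the "no overlay" value are assumptions made for illustration, not the engine's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayGates:
    wf_v2_gate_open: bool       # walk-forward gate flag named in the text
    user_opted_in: bool         # hypothetical name for the per-user opt-in
    kill_switch_engaged: bool   # hypothetical name for the global kill-switch


def overlay_multiplier(gates: OverlayGates, ai_multiplier: float) -> float:
    """Return the size multiplier to apply on top of the baseline position.

    Assumption: 1.0 means "baseline size, overlay inactive".
    """
    all_open = (
        gates.wf_v2_gate_open
        and gates.user_opted_in
        and not gates.kill_switch_engaged
    )
    if not all_open:
        return 1.0  # any closed gate leaves baseline sizing untouched
    # Clamp the model's suggestion into the documented 0.5-1.0 range.
    return min(1.0, max(0.5, ai_multiplier))


if __name__ == "__main__":
    gates = OverlayGates(wf_v2_gate_open=True, user_opted_in=True,
                         kill_switch_engaged=False)
    print(overlay_multiplier(gates, 0.8))
```

Under these assumptions the overlay can only shrink a position, never enlarge it, and a single closed gate is enough to fall back to baseline sizing.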
How a new signal flips on
Every new signal walks the same gauntlet:
1. Ship in shadow — weight = 0, never affects trading. Score written to decision_traces.
2. Calibrate — weekly Platt fitter learns the raw-score → probability mapping per regime.
3. Validate — shadow-validation tracker counts samples (≥200 for min, ≥1000 for full) over rolling windows (≥28d / ≥56d).
4. A/B in WF — for run-based gates, 4 consecutive walk-forward weeks Overlay ≥ Baseline · 1.05 with agreement ≥ 0.5.
5. Flip weight — operator bumps the weight in orchestrator_signal_weights. Telegram + audit trail.
This is the same path the next signal will walk. No exceptions, no operator overrides for "this one I just believe in."
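Steps 2 to 4 can be sketched as three small checks. This is an illustration under stated assumptions rather than the production tracker. The Platt form used here is the standard sigmoid 1 / (1 + exp(a·s + b)), with a and b assumed to come from the weekly fitter. The pairing of ≥200 samples with ≥28d and ≥1000 samples with ≥56d is a guess at how the thresholds line up. All function and field names are hypothetical, and the comparison metric is assumed to be positive.

```python
import math
from dataclasses import dataclass


def platt_probability(raw_score: float, a: float, b: float) -> float:
    """Step 2: map a raw score to a probability with fitted Platt parameters."""
    return 1.0 / (1.0 + math.exp(a * raw_score + b))


def validation_tier(samples: int, window_days: int) -> str:
    """Step 3: classify shadow evidence (threshold pairing is an assumption)."""
    if samples >= 1000 and window_days >= 56:
        return "full"
    if samples >= 200 and window_days >= 28:
        return "min"
    return "insufficient"


@dataclass(frozen=True)
class WalkForwardWeek:
    overlay: float     # overlay result for the week (metric is an assumption)
    baseline: float    # baseline result for the same week
    agreement: float   # agreement score for the week


def wf_gate_passes(weeks: list[WalkForwardWeek], required: int = 4) -> bool:
    """Step 4: the most recent `required` weeks must all clear both bars."""
    if len(weeks) < required:
        return False
    return all(
        w.overlay >= w.baseline * 1.05 and w.agreement >= 0.5
        for w in weeks[-required:]
    )


if __name__ == "__main__":
    history = [WalkForwardWeek(1.10, 1.00, 0.6)] * 4
    print(validation_tier(250, 30), wf_gate_passes(history))
```

Checking only the trailing window means a single failing week resets the count, which is one reading of "4 consecutive".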
Why behaviour-neutral ships
Locked-baseline behaviour-neutral ships are a hard policy:
- Phase 37 baseline keeps trading regardless of orchestrator drama
- New signals can't blow up your account during their first month of telemetry
- Every flip is reversible — if a signal turns sour after live promotion, weight goes back to 0 and you're back to baseline within one tick
This is why we can ship aggressively. The blast radius of a wrong signal is bounded.
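The bounded blast radius can be shown with a toy blend. The text does not say how the ensemble combines signals, so the weighted average below is an assumption, and the names blended_score, baseline_model, and new_signal are hypothetical. The point is only that a weight of 0 leaves the output identical to baseline, and that resetting a promoted weight to 0 restores it.

```python
def blended_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of calibrated scores; signals without a weight count as 0."""
    total = sum(weights.get(name, 0.0) for name in scores)
    if total == 0:
        raise ValueError("at least one signal must carry a positive weight")
    weighted = sum(p * weights.get(name, 0.0) for name, p in scores.items())
    return weighted / total


if __name__ == "__main__":
    scores = {"baseline_model": 0.60, "new_signal": 0.90}

    # Shadow: the new signal is scored and logged but carries no weight.
    shadow = blended_score(scores, {"baseline_model": 1.0, "new_signal": 0.0})

    # Promoted: an operator bumps the weight, so the blend now moves.
    live = blended_score(scores, {"baseline_model": 1.0, "new_signal": 0.5})

    # Rolled back: weight returns to 0 and the output matches shadow again.
    rolled_back = blended_score(scores, {"baseline_model": 1.0, "new_signal": 0.0})

    print(shadow, live, rolled_back)
```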