Research Lab

Python research stack on Fly.io — walk-forward harness, XGBoost retrain, RL policy, sleeve lifecycle. Where new alpha is born and validated before it touches your capital.

What it is

The Research Lab is a separate Python service (FastAPI + MLflow + Feast + Ray RLlib) deployed on Fly.io — completely isolated from the trading hot path. It is where every new model is trained, every new sleeve is walk-forward validated, and every new strategy candidate earns its right to live.

The trading worker (Cloudflare) never trains anything in-flight. It only consumes promoted artefacts: calibrated probabilities, RL policy lookups, signal weights, sleeve sets. This separation is critical: training workloads never contend with execution for compute, so they cannot add execution latency.

The admin health view shows lab uptime and the time of the last successful artefact pull.

What runs there

Walk-Forward Harness — every Monday 05:00 UTC, runs the locked Phase 37 baseline + every candidate sleeve over rolling 180/30 train-test windows across 8 years of data. Slippage-stressed at 0/5/10/20 bps. Writes a new row to wf_snapshots so degradation is detectable within one week.
XGBoost Win-Probability — retrains every Sunday 06:00 UTC on all closed bot_orders. Outputs Platt-calibrated P(win) served via bearer-protected /ml/predict. Registered as the 8th orchestrator signal — shadow-only until it beats baseline 4 weeks running.
RL Policy (PPO) — Ray RLlib trains on real sleeve-return data exported nightly from ClickHouse (Feast sleeve_returns_v1 view → R2 parquet). Champion/challenger setup; challenger only promotes if it beats champion's 28d OOS Sharpe.
Sleeve Lifecycle DAG — genesis → backtest → walk-forward → walk-forward multi-fold → paper → canary → live → retired. Daily evaluation at 06:00 UTC auto-promotes/retires based on Sharpe + DD + sample size.
Strategy Evolution — GA-bred candidates pass three independent stat-gates (DSR, White's Reality Check, PBO) before they're even allowed into the lifecycle pipeline.
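
As an illustration of the Walk-Forward Harness item above, here is a minimal sketch of how rolling 180/30 train-test windows could be generated and how a slippage stress could be applied. It is not the lab's actual harness. The names Window, rolling_windows and net_return are hypothetical, the unit of the 180/30 windows is not stated in this document (the sketch simply counts periods), and the flat per-round-trip cost model is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Window:
    train_start: int  # inclusive index into the time-ordered sample
    train_end: int    # exclusive
    test_start: int   # inclusive; equals train_end, so train and test never overlap
    test_end: int     # exclusive


def rolling_windows(n_periods: int, train: int = 180, test: int = 30) -> Iterator[Window]:
    """Yield consecutive train/test windows, stepping forward by one test length."""
    if train <= 0 or test <= 0:
        raise ValueError("train and test must be positive")
    start = 0
    # Stop as soon as a full train+test window no longer fits in the sample.
    while start + train + test <= n_periods:
        split = start + train
        yield Window(start, split, split, split + test)
        start += test


def net_return(gross_return: float, round_trips: int, slippage_bps: float) -> float:
    """Assumed cost model: subtract a flat slippage charge per round trip (in bps)."""
    return gross_return - round_trips * slippage_bps / 10_000.0


if __name__ == "__main__":
    for w in rolling_windows(240):
        print(w)
    # Stress one hypothetical test-window result at the four slippage levels.
    for bps in (0, 5, 10, 20):
        print(bps, net_return(0.02, round_trips=4, slippage_bps=bps))
```

Because each test window starts exactly where its training window ends, no test observation is ever seen during training, which is the property a walk-forward run depends on.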

How a new model reaches production

Promotion is gated and additive, never destructive:

1. Train in the lab → log metrics to MLflow → bench against locked baseline.
2. Publish as a candidate artefact (sleeve set, model weights, or signal calibration) into production_sleeve_set.
3. Shadow-validate: the trading worker reads the artefact at weight 0, logs every decision and the counter-factual into bot_shadow_decisions / signal_calibration.
4. Gate: 28+ days of shadow telemetry, ≥200 samples, and 4 consecutive weekly WF runs where the new variant beats baseline × 1.05.
5. Promote: admin flips the runtime weight in the orchestrator console. Two-person approval for live-money changes. The Phase 37 baseline keeps trading regardless.
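
The gate in step 4 can be expressed as a small predicate. The following is a sketch under stated assumptions, not the production check: WeeklyRun and passes_gate are hypothetical names, the score being compared is left generic because this document does not say which walk-forward metric is used, and "beats" is read as a strict inequality.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the gate description above.
MIN_SHADOW_DAYS = 28
MIN_SAMPLES = 200
REQUIRED_WEEKLY_WINS = 4
BASELINE_MARGIN = 1.05


@dataclass(frozen=True)
class WeeklyRun:
    candidate_score: float  # walk-forward score of the new variant (metric is an assumption)
    baseline_score: float   # walk-forward score of the locked baseline for the same week


def passes_gate(shadow_days: int, samples: int, weekly_runs: Sequence[WeeklyRun]) -> bool:
    """Return True only if all three gate conditions hold.

    weekly_runs is expected oldest-first; only the most recent
    REQUIRED_WEEKLY_WINS runs count, so the wins must be consecutive.
    """
    if shadow_days < MIN_SHADOW_DAYS or samples < MIN_SAMPLES:
        return False
    recent = weekly_runs[-REQUIRED_WEEKLY_WINS:]
    if len(recent) < REQUIRED_WEEKLY_WINS:
        return False
    return all(r.candidate_score > r.baseline_score * BASELINE_MARGIN for r in recent)


if __name__ == "__main__":
    runs = [WeeklyRun(1.2, 1.0)] * 4
    print(passes_gate(shadow_days=30, samples=250, weekly_runs=runs))
```

One caveat with a multiplicative margin: if the baseline score is negative, baseline × 1.05 is a lower bar than the baseline itself, so a real check would need to handle that case explicitly.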

Result: an always-on R&D loop where the lab can fail freely. Capital only ever follows artefacts that survived multi-week shadow validation.

Why it lives on Fly.io (not in the Worker)

Training is heavy, episodic, and tolerant of cold starts. Execution is light, continuous, and intolerant of any cold start. We deliberately split them:

The Cloudflare Worker runs the bot — microseconds matter, no Python, no GPU.
The Fly.io Python lab runs the science — MLflow experiment tracking, Feast feature store, Ray RLlib distributed training, vectorbt research notebooks. Heavy compute when needed, sleeps otherwise.
The bridge is one thing: signed artefacts published into Supabase tables. The Worker only ever reads them. If the lab goes down, trading is completely unaffected.
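
To illustrate what reading a signed artefact might involve, here is a minimal signing and verification sketch. This document does not specify the signature scheme, so HMAC-SHA256 over a canonical JSON encoding is purely an assumption (an asymmetric scheme is equally plausible), and sign_artefact, verify_artefact and canonical_bytes are hypothetical names. The Worker does not run Python, so its verification step would be written for its own runtime; the logic is shown in Python only for consistency with the lab.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialise with sorted keys and fixed separators so equal payloads give equal bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_artefact(payload: dict, key: bytes) -> str:
    """Lab side: return a hex HMAC-SHA256 signature over the canonical payload."""
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify_artefact(payload: dict, signature: str, key: bytes) -> bool:
    """Reader side: recompute the signature and compare in constant time."""
    expected = sign_artefact(payload, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"example-shared-secret"  # hypothetical; a real key would come from a secret store
    artefact = {"kind": "sleeve_set", "version": 3, "weight": 0}
    sig = sign_artefact(artefact, key)
    print(verify_artefact(artefact, sig, key))
    print(verify_artefact({**artefact, "weight": 1}, sig, key))
```

A reader that rejects any artefact failing this check can treat the shared tables as untrusted transport: a corrupted or tampered row is ignored rather than acted on.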

Separating research from execution this way is common practice at quantitative funds. We just made it observable to you.