AI Execution Engine

DRL agent, Smart Order Router, adaptive TWAP, and learned slippage.

Deep Reinforcement Learning (DRL) Execution Agent

The DRL agent optimizes how orders are executed, not what to trade: it learns the best way to slice and time each order.

State space (6 dimensions):
- Current bid-ask spread (bps)
- Order book depth imbalance [-1, 1]
- Trade flow imbalance [-1, 1]
- Fraction of the execution window remaining [0, 1]
- Fraction of the order already filled [0, 1]
- Recent 5-min realized volatility (bps)
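A minimal sketch of how the six state features could be assembled from raw market data. The `RawMarketInputs` field names, the `buildState` helper, and the (a - b) / (a + b) imbalance formula are assumptions for illustration, not the engine's actual code.

```typescript
// Hypothetical shape of the agent's 6-dimensional observation.
interface ExecutionState {
  spreadBps: number;      // current bid-ask spread, bps
  depthImbalance: number; // order book depth imbalance, [-1, 1]
  flowImbalance: number;  // trade flow imbalance, [-1, 1]
  timeRemaining: number;  // fraction of the execution window left, [0, 1]
  filledFraction: number; // fraction of the order already filled, [0, 1]
  realizedVolBps: number; // recent 5-min realized volatility, bps
}

// Hypothetical raw inputs; field names are illustrative only.
interface RawMarketInputs {
  bestBid: number;
  bestAsk: number;
  bidDepth: number;   // resting size on the bid side (e.g. top N levels)
  askDepth: number;   // resting size on the ask side
  buyVolume: number;  // recent aggressive buy volume
  sellVolume: number; // recent aggressive sell volume
  elapsedMs: number;
  windowMs: number;
  filledQty: number;
  totalQty: number;
  realizedVolBps: number;
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Signed imbalance in [-1, 1]; 0 when both sides are empty.
function imbalance(a: number, b: number): number {
  const total = a + b;
  return total > 0 ? (a - b) / total : 0;
}

function buildState(m: RawMarketInputs): ExecutionState {
  const mid = (m.bestBid + m.bestAsk) / 2;
  return {
    spreadBps: ((m.bestAsk - m.bestBid) / mid) * 10_000,
    depthImbalance: imbalance(m.bidDepth, m.askDepth),
    flowImbalance: imbalance(m.buyVolume, m.sellVolume),
    timeRemaining: clamp(1 - m.elapsedMs / m.windowMs, 0, 1),
    filledFraction: clamp(m.filledQty / m.totalQty, 0, 1),
    realizedVolBps: m.realizedVolBps,
  };
}
```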

Action space:
- Slice percentage (0-100% of remaining quantity)
- Aggression level (passive / mid / aggressive)
- Wait time before next slice (0-5000 ms)
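One way to turn an action into a concrete child order is sketched below. The mapping of aggression levels to limit prices (bid, mid, ask for a buy) is a common convention assumed here; the helper names are hypothetical.

```typescript
type Aggression = "passive" | "mid" | "aggressive";

interface ExecutionAction {
  slicePct: number;       // 0-100, percent of the *remaining* quantity
  aggression: Aggression;
  waitMs: number;         // 0-5000 ms before the next slice
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Quantity for the next child order, as a share of what is still unfilled.
function sliceQuantity(a: ExecutionAction, remainingQty: number): number {
  return (remainingQty * clamp(a.slicePct, 0, 100)) / 100;
}

// Hypothetical mapping from aggression level to a limit price for a buy order.
function buyLimitPrice(level: Aggression, bid: number, ask: number): number {
  switch (level) {
    case "passive":
      return bid;             // join the bid and rest as a maker
    case "mid":
      return (bid + ask) / 2; // sit at the mid price
    case "aggressive":
      return ask;             // cross the spread and take liquidity
  }
}

// Keep the wait inside the documented 0-5000 ms range.
function nextDelayMs(a: ExecutionAction): number {
  return clamp(a.waitMs, 0, 5000);
}
```

Because the slice is a percentage of the remaining quantity rather than of the original order, repeated slices shrink naturally as the order fills.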

Reward: -slippage_bps - market_impact_bps + time_bonus
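The reward can be written directly from the formula above. How slippage is measured (against an arrival price) and the shape of `timeBonus` are assumptions in this sketch; the document does not define them.

```typescript
type Side = "buy" | "sell";

// Signed slippage in bps vs a reference (e.g. arrival) price; positive = worse for us.
function slippageBps(side: Side, fillPrice: number, refPrice: number): number {
  const adverse = side === "buy" ? fillPrice - refPrice : refPrice - fillPrice;
  return (adverse / refPrice) * 10_000;
}

// Hypothetical time bonus: reward finishing early, scaled by the fraction of window left.
function timeBonus(fullyFilled: boolean, timeRemaining: number, scale = 2): number {
  return fullyFilled ? scale * timeRemaining : 0;
}

// Reward from the documented formula: -slippage_bps - market_impact_bps + time_bonus.
function reward(slippage: number, marketImpact: number, bonus: number): number {
  return -slippage - marketImpact + bonus;
}
```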

The model is trained offline in the research lab (Python/Ray RLlib) and exported as a lookup table for the TypeScript runtime. Fallback: if no trained model is available, the engine uses the Predictive Spread Model (EWMA-based) instead.
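A sketch of what a lookup-table policy with an EWMA fallback might look like at runtime, assuming the exported table maps a discretized state key to an action. For brevity only two of the six features are binned; the bin edges, the key format, the smoothing factor, and the fallback heuristic are all hypothetical.

```typescript
type Aggression = "passive" | "mid" | "aggressive";
interface Action { slicePct: number; aggression: Aggression; waitMs: number; }

// Hypothetical bin edges; only two of the six state features are discretized here.
const SPREAD_EDGES_BPS = [2, 5, 10];
const TIME_EDGES = [0.25, 0.5, 0.75];

// Index of the bin that x falls into, given ascending edges.
function bin(x: number, edges: number[]): number {
  let i = 0;
  while (i < edges.length && x >= edges[i]) i++;
  return i;
}

function stateKey(spreadBps: number, timeRemaining: number): string {
  return `${bin(spreadBps, SPREAD_EDGES_BPS)}|${bin(timeRemaining, TIME_EDGES)}`;
}

// EWMA forecast of the spread, standing in for the Predictive Spread Model.
class EwmaSpreadModel {
  private value: number | null = null;
  private readonly alpha: number;
  constructor(alpha = 0.2) {
    this.alpha = alpha;
  }
  update(spreadBps: number): number {
    const next =
      this.value === null ? spreadBps : this.alpha * spreadBps + (1 - this.alpha) * this.value;
    this.value = next;
    return next;
  }
}

class LookupTablePolicy {
  private readonly table: Map<string, Action> | null;
  private readonly spreadModel: EwmaSpreadModel;
  constructor(table: Map<string, Action> | null, spreadModel: EwmaSpreadModel) {
    this.table = table;
    this.spreadModel = spreadModel;
  }

  act(spreadBps: number, timeRemaining: number): Action {
    const forecast = this.spreadModel.update(spreadBps); // keep the EWMA warm on every tick
    const learned = this.table?.get(stateKey(spreadBps, timeRemaining));
    if (learned !== undefined) return learned;
    // Fallback heuristic (assumed): if the spread is wider than its EWMA, expect it to
    // tighten and wait passively; otherwise take a moderate slice at mid.
    return spreadBps > forecast
      ? { slicePct: 10, aggression: "passive", waitMs: 2000 }
      : { slicePct: 25, aggression: "mid", waitMs: 500 };
  }
}
```

Updating the EWMA on every tick, even when the table answers, means the fallback has a warm forecast ready if the table later misses a state.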

Smart Order Router (SOR) — multi-venue splitting

For orders above $5,000 notional, the SOR splits execution across multiple exchanges simultaneously:

How it works:
- Probes top-of-book on every venue where the user has credentials
- Drops venues whose effective price is more than 8 bps worse than the best venue's
- Allocates proportionally to 1 / spread_bps (tighter spread = larger allocation)
- Drops legs below $500 and redistributes their notional to the best leg
- Caps execution at 4 parallel legs
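The allocation rules above can be sketched for a buy order as follows. Assumptions: the quoted best ask stands in for the "effective price" (fees ignored), when more than 4 venues qualify the 4 tightest spreads are kept, and "best leg" is taken to mean the leg with the largest allocation. All names are hypothetical.

```typescript
interface VenueQuote {
  venue: string;
  bestAsk: number;   // top-of-book ask (this sketch handles buys only)
  spreadBps: number; // quoted bid-ask spread on that venue
}

interface Leg {
  venue: string;
  notional: number; // USD
}

const SOR_MIN_NOTIONAL = 5_000;
const MAX_PRICE_GAP_BPS = 8;
const MIN_LEG_NOTIONAL = 500;
const MAX_LEGS = 4;

function allocateBuy(quotes: VenueQuote[], notional: number): Leg[] {
  if (quotes.length === 0) return [];
  const bestAsk = Math.min(...quotes.map((q) => q.bestAsk));

  // Small orders: no split, send everything to the best-priced venue.
  if (notional <= SOR_MIN_NOTIONAL) {
    const best = quotes.find((q) => q.bestAsk === bestAsk)!;
    return [{ venue: best.venue, notional }];
  }

  // Drop venues more than 8 bps worse than the best; keep the 4 tightest spreads.
  const eligible = quotes
    .filter((q) => ((q.bestAsk - bestAsk) / bestAsk) * 10_000 <= MAX_PRICE_GAP_BPS)
    .sort((a, b) => a.spreadBps - b.spreadBps)
    .slice(0, MAX_LEGS);

  // Allocate proportionally to 1 / spread (guarding against a zero spread).
  const weights = eligible.map((q) => 1 / Math.max(q.spreadBps, 0.01));
  const totalWeight = weights.reduce((s, w) => s + w, 0);
  const legs: Leg[] = eligible.map((q, i) => ({
    venue: q.venue,
    notional: (notional * weights[i]) / totalWeight,
  }));

  // Drop legs under $500 and hand their notional to the largest (tightest-spread) leg.
  // legs[0] has the largest allocation, so it survives whenever any leg does.
  const kept = legs.filter((l) => l.notional >= MIN_LEG_NOTIONAL);
  if (kept.length === 0) return [{ venue: legs[0].venue, notional }];
  const dropped = legs
    .filter((l) => l.notional < MIN_LEG_NOTIONAL)
    .reduce((s, l) => s + l.notional, 0);
  kept[0].notional += dropped;
  return kept;
}
```

The price filter runs before spread weighting, so a venue with a very tight spread is still excluded if its price is more than 8 bps off the best.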

Result: Better average fill price, reduced market impact, and natural diversification of exchange risk. Each leg independently uses the full TWAP / maker-first / slice-loop machinery.

Adaptive TWAP (Phase 76)

The Time-Weighted Average Price (TWAP) slicer learns from historical fills:

- Reads bot_orders.slippage_bps over the last 30 days
- Buckets fills by (symbol, hour-of-day UTC, notional size: small/mid/large)
- Returns the median slippage per bucket
- Adjusts sliceCount and delayMs based on the learned patterns
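The bucketing step might look like this in the TypeScript runtime. The `Fill` shape mirrors `bot_orders.slippage_bps`, but the size cut-offs between small, mid, and large and the key format are hypothetical, since the document does not state them.

```typescript
interface Fill {
  symbol: string;
  filledAt: Date;
  notionalUsd: number;
  slippageBps: number; // as stored in bot_orders.slippage_bps
}

type SizeBucket = "small" | "mid" | "large";

interface BucketStats {
  medianBps: number;
  samples: number;
}

const LOOKBACK_MS = 30 * 24 * 60 * 60 * 1000;

// Hypothetical size cut-offs; the real thresholds are not documented.
function sizeBucket(notionalUsd: number): SizeBucket {
  if (notionalUsd < 1_000) return "small";
  if (notionalUsd < 10_000) return "mid";
  return "large";
}

function bucketKey(symbol: string, at: Date, notionalUsd: number): string {
  return `${symbol}|${at.getUTCHours()}|${sizeBucket(notionalUsd)}`;
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function buildSlippageTable(fills: Fill[], now: Date): Map<string, BucketStats> {
  const groups = new Map<string, number[]>();
  for (const f of fills) {
    if (now.getTime() - f.filledAt.getTime() > LOOKBACK_MS) continue; // outside 30 days
    const key = bucketKey(f.symbol, f.filledAt, f.notionalUsd);
    const xs = groups.get(key) ?? [];
    xs.push(f.slippageBps);
    groups.set(key, xs);
  }
  // Keep the sample count alongside the median so callers can reject sparse buckets.
  const table = new Map<string, BucketStats>();
  groups.forEach((xs, key) => table.set(key, { medianBps: median(xs), samples: xs.length }));
  return table;
}
```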

Fail-open design: sparse buckets (fewer than 5 samples) return null and fall back to static defaults, so wherever the data is too thin to trust, execution behaves exactly like the static baseline.
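A sketch of the fail-open lookup, assuming a bucket table like the one built above. The 5-sample minimum comes from the text; the static defaults and the rule that maps learned slippage to `sliceCount` and `delayMs` are hypothetical placeholders.

```typescript
interface BucketStats {
  medianBps: number;
  samples: number;
}

interface TwapParams {
  sliceCount: number;
  delayMs: number;
}

const MIN_SAMPLES = 5;
// Hypothetical static defaults, for illustration only.
const STATIC_DEFAULTS: TwapParams = { sliceCount: 5, delayMs: 2_000 };

// Returns the learned median, or null when the bucket is missing or too sparse.
function learnedSlippage(table: Map<string, BucketStats>, key: string): number | null {
  const stats = table.get(key);
  return stats !== undefined && stats.samples >= MIN_SAMPLES ? stats.medianBps : null;
}

// Fail-open: null means "no opinion", so the static defaults are used unchanged.
// The thresholds below are an assumed adjustment rule, not the documented one.
function twapParams(expectedSlippageBps: number | null): TwapParams {
  if (expectedSlippageBps === null) return { ...STATIC_DEFAULTS };
  if (expectedSlippageBps > 10) return { sliceCount: 10, delayMs: 4_000 }; // costly bucket: smaller, slower slices
  if (expectedSlippageBps < 2) return { sliceCount: 3, delayMs: 1_000 };  // cheap bucket: finish faster
  return { ...STATIC_DEFAULTS };
}
```

Returning a fresh copy of the defaults keeps callers from mutating the shared baseline by accident.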

A guardrail (Phase 76b) continuously compares adaptive TWAP performance against the baseline and alerts admins if the adaptive path underperforms.
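A minimal version of the guardrail check, assuming it compares median slippage of orders executed with adaptive parameters against orders executed with static ones. The minimum sample size and the 1 bps tolerance are hypothetical; the document does not specify the comparison statistic.

```typescript
interface GuardrailReport {
  underperforming: boolean;
  adaptiveMedianBps: number | null;
  baselineMedianBps: number | null;
}

// Hypothetical thresholds, for illustration only.
const MIN_SAMPLES_PER_ARM = 20;
const TOLERANCE_BPS = 1;

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function checkGuardrail(adaptiveSlippage: number[], baselineSlippage: number[]): GuardrailReport {
  // Not enough evidence on one side: report no regression rather than a noisy alert.
  if (adaptiveSlippage.length < MIN_SAMPLES_PER_ARM || baselineSlippage.length < MIN_SAMPLES_PER_ARM) {
    return { underperforming: false, adaptiveMedianBps: null, baselineMedianBps: null };
  }
  const a = median(adaptiveSlippage);
  const b = median(baselineSlippage);
  // Higher slippage is worse, so flag when adaptive exceeds baseline by more than the tolerance.
  return { underperforming: a > b + TOLERANCE_BPS, adaptiveMedianBps: a, baselineMedianBps: b };
}
```

A `true` result is where the admin alert would fire; how the baseline sample is collected (for example, a holdout share of orders run with static parameters) is also an assumption here.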

Learned Slippage Venue Router (Phase 80)

Extends the Smart Order Router with a slippage-aware score per venue:

effectiveCostBps = priceWorseBps(vs best) + expectedSlippageBps(venue)
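The score translates directly into a ranking function. This sketch assumes `price` is the top-of-book price on the side being traded and that `expectedSlippageBps` has already been looked up per venue; the type and function names are hypothetical.

```typescript
type Side = "buy" | "sell";

interface VenueCandidate {
  venue: string;
  price: number;               // top-of-book price on the relevant side
  expectedSlippageBps: number; // learned per (exchange, symbol, hour_bucket)
}

interface ScoredVenue {
  venue: string;
  effectiveCostBps: number;
}

// How much worse this venue's price is than the best, in bps (0 for the best venue).
function priceWorseBps(side: Side, price: number, bestPrice: number): number {
  const diff = side === "buy" ? price - bestPrice : bestPrice - price;
  return (diff / bestPrice) * 10_000;
}

// Lowest effective cost first.
function rankVenues(side: Side, candidates: VenueCandidate[]): ScoredVenue[] {
  const prices = candidates.map((c) => c.price);
  const best = side === "buy" ? Math.min(...prices) : Math.max(...prices);
  return candidates
    .map((c) => ({
      venue: c.venue,
      effectiveCostBps: priceWorseBps(side, c.price, best) + c.expectedSlippageBps,
    }))
    .sort((a, b) => a.effectiveCostBps - b.effectiveCostBps);
}
```

For example, a venue quoting 2 bps worse than the best but expected to slip only 1 bps scores about 3 bps and outranks the best-quoted venue if that venue is expected to slip 6 bps.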

Expected slippage is the notional-weighted median of historical fills per (exchange, symbol, hour_bucket) over the last 30 days. Cold venues fall back to the global median, then to a static 8 bps default.
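A sketch of the notional-weighted median and its fallback chain. Assumptions: `fills` is already restricted to the last 30 days, `hourBucket` is the UTC hour, a venue counts as "cold" when it has no fills in the requested bucket, and the "global median" is taken over all fills in the window. None of these details is confirmed by the text.

```typescript
interface HistoricalFill {
  exchange: string;
  symbol: string;
  hourBucket: number; // UTC hour, 0-23 (assumed bucket granularity)
  notionalUsd: number;
  slippageBps: number;
}

const STATIC_DEFAULT_BPS = 8;

// Notional-weighted median: the smallest slippage value at which the cumulative
// notional reaches half of the total notional.
function weightedMedian(fills: HistoricalFill[]): number | null {
  const pts = fills
    .filter((f) => f.notionalUsd > 0)
    .map((f) => ({ value: f.slippageBps, weight: f.notionalUsd }))
    .sort((a, b) => a.value - b.value);
  const total = pts.reduce((s, p) => s + p.weight, 0);
  if (total === 0) return null;
  let cumulative = 0;
  for (const p of pts) {
    cumulative += p.weight;
    if (cumulative >= total / 2) return p.value;
  }
  return pts[pts.length - 1].value; // only reached through floating-point rounding
}

// Assumes `fills` is already restricted to the last 30 days.
function expectedSlippageBps(
  fills: HistoricalFill[],
  exchange: string,
  symbol: string,
  hourBucket: number,
): number {
  const local = fills.filter(
    (f) => f.exchange === exchange && f.symbol === symbol && f.hourBucket === hourBucket,
  );
  const localMedian = weightedMedian(local);
  if (localMedian !== null) return localMedian;
  const globalMedian = weightedMedian(fills); // cold venue: fall back to the global median
  return globalMedian ?? STATIC_DEFAULT_BPS;  // no history at all: static default
}
```

The weighting matters: in a bucket with fills at 1, 3, and 5 bps where the 5 bps fill carries most of the notional, the weighted median is 5 bps while the plain median would be 3 bps.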

This means the router learns which exchanges actually deliver better fills for specific symbols at specific times — not just which has the tightest quoted spread.