Methodology
AI Portfolio Stress Testing
This platform combines machine learning forecasting, Hidden Markov Model regime detection,
Mean-Variance Optimisation, and SHAP explainability to deliver a fully integrated
risk analytics workflow. The pipeline processes macroeconomic and market data end-to-end
— from raw ingestion through to plain-English portfolio narratives.
ElasticNet
XGBoost
HMM Regime
Ledoit-Wolf
Max-Sharpe MVO
SHAP
scikit-learn
FastAPI
pandas / numpy
Chart.js
01
Data Ingestion
Macroeconomic & Market Data Pipeline
Raw data is sourced from FRED (Federal Reserve), ECB, and Yahoo Finance.
The pipeline collects US and European macroeconomic indicators alongside
asset price series for S&P 500, Nasdaq 100, Gold, and Bitcoin.
- US macro: Fed Funds Rate, CPI YoY, industrial production, unemployment, yield spreads
- European macro: ECB policy rate, Euro-area CPI, HICP
- Market prices: SPX, NDX, GLD, BTC-USD — monthly frequency
- Output: unified CSV with forward-filled gaps and date-aligned index
pandas_datareader
yfinance
FRED API
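The alignment and forward-fill step could be sketched as follows. The series values, dates, and column names are made up for illustration; the real pipeline pulls its inputs from FRED, the ECB, and Yahoo Finance, and the download calls are omitted here.

```python
import pandas as pd

# Hypothetical monthly series with different coverage (values are illustrative only).
fed_funds = pd.Series(
    [5.25, 5.25, 5.50],
    index=pd.to_datetime(["2023-06-01", "2023-07-01", "2023-08-01"]),
    name="fed_funds",
)
spx = pd.Series(
    [4450.0, 4589.0, 4508.0, 4288.0],
    index=pd.to_datetime(["2023-06-01", "2023-07-01", "2023-08-01", "2023-09-01"]),
    name="spx",
)

# Outer-join on the date index so every month present in any source gets a row.
unified = pd.concat([fed_funds, spx], axis=1).sort_index()

# Forward-fill gaps: a month with no macro print carries the last published value.
unified = unified.ffill()

# unified.to_csv("unified.csv")  # hypothetical output path
```

Forward-filling only ever copies past values into later rows, so it does not leak future information into earlier dates.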
↓
02
Feature Engineering
Macro & Technical Feature Construction
Transforms raw series into predictive features used by the ML models.
- Yield curve spread: 10Y − 2Y (inversion indicator)
- Momentum: 1-, 3-, 6-month rolling returns for each asset
- Volatility: 3- and 6-month rolling standard deviation
- Macro lags: 1–3 month lags on CPI, Fed Funds, industrial production
- Cross-asset ratios: gold/equity relative strength
feature_t = f(macro_{t-1..t-3}, price_momentum_{1m,3m,6m}, rolling_vol_{3m,6m})
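A minimal sketch of these transformations for one asset, assuming a monthly DataFrame with hypothetical column names (`spx`, `gold`, `y10`, `y2`, `cpi_yoy`) and made-up values:

```python
import pandas as pd

# Hypothetical monthly inputs; values are illustrative only.
idx = pd.date_range("2022-01-01", periods=12, freq="MS")
raw = pd.DataFrame(
    {
        "spx": [100, 102, 101, 105, 107, 104, 108, 110, 109, 113, 115, 118],
        "gold": [50, 51, 52, 51, 53, 54, 53, 55, 56, 55, 57, 58],
        "y10": [1.8, 1.9, 2.3, 2.9, 2.8, 3.0, 2.7, 3.2, 3.8, 4.1, 3.7, 3.9],
        "y2": [1.2, 1.4, 2.3, 2.7, 2.5, 2.9, 2.9, 3.5, 4.2, 4.5, 4.4, 4.4],
        "cpi_yoy": [7.5, 7.9, 8.5, 8.3, 8.6, 9.1, 8.5, 8.3, 8.2, 7.7, 7.1, 6.5],
    },
    index=idx,
    dtype=float,
)

features = pd.DataFrame(index=raw.index)

# Yield curve spread: a negative value signals an inverted curve.
features["yield_spread"] = raw["y10"] - raw["y2"]

# Momentum: 1-, 3-, 6-month returns.
for k in (1, 3, 6):
    features[f"spx_mom_{k}m"] = raw["spx"].pct_change(periods=k)

# Volatility: rolling standard deviation of monthly returns.
monthly_ret = raw["spx"].pct_change()
for k in (3, 6):
    features[f"spx_vol_{k}m"] = monthly_ret.rolling(k).std()

# Macro lags: 1 to 3 months.
for lag in (1, 2, 3):
    features[f"cpi_yoy_lag{lag}"] = raw["cpi_yoy"].shift(lag)

# Cross-asset ratio: gold relative to equities.
features["gold_over_spx"] = raw["gold"] / raw["spx"]

# Rows without a full 6-month history are dropped before model training.
X = features.dropna()
```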
↓
03
Regime Detection
Hidden Markov Model — Market State Classification
A Gaussian Hidden Markov Model classifies monthly market states.
States are labelled by their return/volatility characteristics.
- 6 regimes: Bull Trend, Low-Vol Bull, Recovery, High-Vol, Credit Stress, Bear Market
- Observations: monthly returns + rolling volatility for SPX, NDX, Gold
- Parameters estimated via Baum-Welch EM algorithm
- Current regime influences portfolio tilt weights in Phase 7
P(regime_t | regime_{t-1}) via transition matrix A
observation prob: N(μ_k, Σ_k) per state k
hmmlearn
GaussianHMM
Baum-Welch
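To illustrate how the transition matrix A and the Gaussian emission densities combine, here is a deliberately simplified forward filter: two states and a single return series instead of the six-state multivariate model, with hand-picked parameters rather than values estimated by Baum-Welch. All numbers are assumptions for the sake of the example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def forward_filter(observations, start_prob, trans, means, stds):
    """Return P(regime_t | observations up to t) for every t."""
    n_states = len(start_prob)
    prior = list(start_prob)
    filtered = []
    for x in observations:
        # Update: weight the prior by each state's emission likelihood.
        joint = [prior[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(n_states)]
        total = sum(joint)
        posterior = [j / total for j in joint]
        filtered.append(posterior)
        # Predict: push the posterior one step through the transition matrix.
        prior = [
            sum(posterior[j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
    return filtered

# Hypothetical parameters: state 0 = calm, state 1 = stressed.
start_prob = [0.5, 0.5]
A = [[0.95, 0.05], [0.10, 0.90]]   # A[j][k] = P(regime_t = k | regime_{t-1} = j)
means = [0.01, -0.03]              # monthly mean return per state
stds = [0.03, 0.08]                # monthly volatility per state

monthly_returns = [0.02, 0.015, 0.01, -0.12, -0.09, 0.03]
probs = forward_filter(monthly_returns, start_prob, A, means, stds)
regimes = [max(range(2), key=lambda k: p[k]) for p in probs]
```

The sharp loss in the fourth month moves almost all probability mass to the stressed state, because that return is far more likely under the wide, negative-mean distribution than under the calm one.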
↓
04–05
Macro Model Training
Macro Index Construction & Asset Model Training
Two complementary steps. Phase 4 builds composite macro indices via PCA;
Phase 5 fits asset-level linear (OLS) models as a baseline for the Phase 6 ensemble.
- PCA on macro block → growth, inflation, financial-conditions indices
- OLS regressions per asset; diagnostics: Durbin-Watson, VIF, residual ACF
PCA
OLS
statsmodels
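A composite index of this kind can be sketched with plain NumPy. The example below assumes a single synthetic block of four indicators driven by one common factor; how the actual pipeline groups indicators into growth, inflation, and financial-conditions blocks is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic macro block: 120 months x 4 indicators sharing one common driver.
common = rng.normal(size=120)
macro = np.column_stack([common + 0.3 * rng.normal(size=120) for _ in range(4)])

# Standardise each column so no indicator dominates because of its units.
z = (macro - macro.mean(axis=0)) / macro.std(axis=0, ddof=1)

# PCA via SVD of the standardised data matrix.
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)   # share of variance per component

loadings = vt[0]                # weights of the first principal component
macro_index = z @ loadings      # composite index, one value per month
```

The sign of a principal component is arbitrary, so in practice the index is usually flipped, if needed, so that higher values correspond to the intuitive direction (for example, stronger growth).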
↓
06
ML Ensemble Forecasting
ElasticNet + XGBoost Return Forecasting
Two-model ensemble trained per asset on lagged macro + technical features.
A two-pass training strategy handles BTC's shorter data history.
- ElasticNet: L1 + L2 regularisation — α·ρ·‖β‖₁ + α·(1−ρ)/2·‖β‖₂²
- XGBoost: gradient-boosted trees — captures non-linear macro interactions
- Walk-forward cross-validation (5 folds): each fold trains only on data that precedes its test window, which guards against look-ahead bias
- Ensemble: simple average of ElasticNet and XGBoost predictions
- Two-pass: SPX/NDX/Gold trained on full history; BTC trained on 2010+ subset
ŷ_t = 0.5 · ŷ_EN(X_{t-1}) + 0.5 · ŷ_XGB(X_{t-1})
ElasticNet
XGBoost
walk-forward CV
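As a worked example of the two formulas above, the snippet below evaluates the ElasticNet penalty for a small coefficient vector and averages two forecasts. The coefficients, α, ρ, and the forecast values are made up.

```python
def elastic_net_penalty(beta, alpha, l1_ratio):
    """alpha*rho*||beta||_1 + alpha*(1 - rho)/2*||beta||_2^2, with rho = l1_ratio."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return alpha * l1_ratio * l1 + alpha * (1.0 - l1_ratio) / 2.0 * l2_squared

def ensemble_forecast(pred_en, pred_xgb):
    """Equal-weight average of the ElasticNet and XGBoost forecasts."""
    return 0.5 * pred_en + 0.5 * pred_xgb

# Hypothetical coefficients and hyperparameters.
beta = [0.5, -0.25, 0.0]
penalty = elastic_net_penalty(beta, alpha=0.1, l1_ratio=0.5)
# L1 part: 0.1 * 0.5 * 0.75 = 0.0375; L2 part: 0.1 * 0.25 * 0.3125 = 0.0078125

# Hypothetical next-month return forecasts from the two models.
y_hat = ensemble_forecast(pred_en=0.012, pred_xgb=0.020)
```

Here `penalty` comes to 0.0453125 and `y_hat` to 0.016, that is, a 1.6% forecast monthly return.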
↓
06.1
Hyperparameter Tuning
Grid Search + Rolling Validation
An extended hyperparameter grid search selects the best-scoring configuration for each model.
- ElasticNet: 45 combinations of α ∈ {0.001…1.0} × l1_ratio ∈ {0.1…0.9}
- XGBoost: 288 combinations of max_depth, n_estimators, learning_rate, subsample, colsample
- Rolling validation: 9 expanding-window folds (min 36 months train, 6 months test)
- Selection criterion: mean RMSE across rolling folds (lower = better)
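The rolling scheme can be sketched in pure Python. The fold generator follows the numbers above (36 months minimum training, 6 months test, 9 folds, which needs 36 + 9 × 6 = 90 observations); the target series and the two toy "configurations" are stand-ins for real models and hyperparameter settings.

```python
import math

def expanding_window_folds(n_obs, min_train=36, test_size=6, n_folds=9):
    """Yield (train_idx, test_idx); the training window grows by test_size per fold."""
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        if test_end > n_obs:
            break
        yield range(0, train_end), range(train_end, test_end)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mean_rolling_rmse(y, predict_fn):
    """Average out-of-sample RMSE across all folds for one configuration."""
    scores = []
    for train_idx, test_idx in expanding_window_folds(len(y)):
        train = [y[i] for i in train_idx]
        preds = predict_fn(train, len(test_idx))
        scores.append(rmse([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)

# Synthetic target: 90 months of a slow cycle (illustrative only).
y = [0.01 * math.sin(i / 6.0) for i in range(90)]

# Two hypothetical configurations standing in for hyperparameter settings.
candidates = {
    "last_value": lambda train, horizon: [train[-1]] * horizon,
    "train_mean": lambda train, horizon: [sum(train) / len(train)] * horizon,
}

scores = {name: mean_rolling_rmse(y, fn) for name, fn in candidates.items()}
best = min(scores, key=scores.get)   # lower mean RMSE wins
```

Every test window starts exactly where its training window ends, so no fold is ever scored on data the model has already seen.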
↓
07
Portfolio Optimisation
Mean-Variance Optimisation + Regime Tilts + Stress Testing
Uses ensemble expected returns and Ledoit-Wolf shrinkage covariance to solve
for the Maximum-Sharpe tangency portfolio, then applies regime-based tilts.
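- Ledoit-Wolf shrinkage: Σ̂ = (1−δ)·S + δ·μ̂·I, where S is the sample covariance, μ̂ = tr(S)/p is the average sample variance across the p assets (unrelated to the expected-return vector μ), and δ ∈ [0, 1] is the shrinkage intensity; reduces estimation error
- Max-Sharpe MVO: max_w (μᵀw − r_f) / √(wᵀΣ̂w) subject to Σᵢ wᵢ = 1, wᵢ ≥ 0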
- Regime tilts: ±5–15% weight adjustment based on HMM regime
- 20 stress scenarios covering GFC, COVID crash, dot-com, rate shock, etc.
- Scenario returns: weighted sum of asset-level historical episode returns
max_w (μᵀw − r_f) / √(wᵀΣ̂w)
s.t. Σᵢ wᵢ = 1, wᵢ ≥ 0
REGIME_TILT: w_adjusted = clip(w_base + tilt_Δ, 0, 1), renormalised
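The tilt rule and the scenario calculation can be illustrated with a few lines of Python. The base weights, tilt sizes, and episode returns below are invented for the example and are not the platform's calibrated values.

```python
def apply_regime_tilt(base_weights, tilt):
    """w_adjusted = clip(w_base + tilt, 0, 1), then renormalise to sum to 1."""
    clipped = {
        asset: min(max(w + tilt.get(asset, 0.0), 0.0), 1.0)
        for asset, w in base_weights.items()
    }
    total = sum(clipped.values())
    return {asset: w / total for asset, w in clipped.items()}

def scenario_return(weights, episode_returns):
    """Portfolio return in a stress scenario: weighted sum of asset returns."""
    return sum(w * episode_returns[asset] for asset, w in weights.items())

# Hypothetical base portfolio and a defensive tilt for a bearish regime.
base = {"SPX": 0.40, "NDX": 0.30, "Gold": 0.25, "BTC": 0.05}
bear_tilt = {"SPX": -0.10, "NDX": -0.10, "Gold": 0.15, "BTC": -0.10}

tilted = apply_regime_tilt(base, bear_tilt)
# Before renormalising: SPX 0.30, NDX 0.20, Gold 0.40, BTC clipped to 0 (sum 0.90).

# Hypothetical asset returns for one stress episode (not historical figures).
episode = {"SPX": -0.30, "NDX": -0.35, "Gold": 0.10, "BTC": -0.50}
stressed = scenario_return(tilted, episode)
```

Clipping before renormalising keeps the long-only constraint intact: BTC's weight cannot go negative, and the remaining weights are scaled up so the portfolio stays fully invested.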
Ledoit-Wolf
Max-Sharpe
scipy.optimize
SLSQP
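A minimal sketch of the optimisation step with `scipy.optimize.minimize` and the SLSQP method. The expected returns, volatilities, and correlations are made up, and the shrinkage intensity δ is fixed by hand here, whereas the Ledoit-Wolf estimator chooses it analytically from the data.

```python
import numpy as np
from scipy.optimize import minimize

def shrink_to_scaled_identity(S, delta):
    """(1 - delta) * S + delta * mu_hat * I, with mu_hat = trace(S) / p."""
    p = S.shape[0]
    mu_hat = np.trace(S) / p
    return (1.0 - delta) * S + delta * mu_hat * np.eye(p)

def max_sharpe_weights(mu, cov, rf=0.0):
    """Long-only, fully invested tangency portfolio via SLSQP."""
    n = len(mu)

    def neg_sharpe(w):
        # Minimising the negative Sharpe ratio maximises the Sharpe ratio.
        return -(mu @ w - rf) / np.sqrt(w @ cov @ w)

    result = minimize(
        neg_sharpe,
        x0=np.full(n, 1.0 / n),                      # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,                     # w_i >= 0
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x

# Hypothetical monthly inputs for SPX, NDX, Gold, BTC.
mu = np.array([0.008, 0.010, 0.004, 0.020])
vols = np.array([0.045, 0.060, 0.040, 0.200])
corr = np.array([
    [1.00, 0.90, 0.05, 0.30],
    [0.90, 1.00, 0.00, 0.35],
    [0.05, 0.00, 1.00, 0.10],
    [0.30, 0.35, 0.10, 1.00],
])
S = np.outer(vols, vols) * corr

cov = shrink_to_scaled_identity(S, delta=0.2)   # delta fixed for illustration
weights = max_sharpe_weights(mu, cov, rf=0.0)
```

The Sharpe ratio is not convex in w, so the result depends on the starting point in general; starting from equal weights is a common, simple choice.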
↓
08
Explainability
SHAP Feature Attribution + ElasticNet Coefficients
Decomposes model predictions into feature-level contributions for transparency.
- XGBoost: TreeSHAP, exact SHAP values in O(TLD²) time (T trees, at most L leaves per tree, maximum depth D)
- ElasticNet: LinearSHAP — analytical attribution = β_i · (x_i − E[x_i])
- Portfolio exposure: weighted factor importance, Σᵢ wᵢ · mean|SHAPᵢ,f| summed over assets i for each feature f
- Top features identified per asset; used by NarrativeEngine for plain-English text
Portfolio factor exposure_f = Σ_i weight_i · mean_k |SHAP_i(x_k, f)|
SHAP
TreeExplainer
LinearExplainer
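The linear attribution and the portfolio-level aggregation can be checked numerically without the SHAP library. The sketch below assumes independent features, synthetic background data, and hypothetical coefficients for two assets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic background data: 200 months x 3 features.
X = rng.normal(size=(200, 3))
background_mean = X.mean(axis=0)

# Hypothetical fitted linear coefficients per asset.
betas = {
    "SPX": np.array([0.8, -0.5, 0.0]),
    "Gold": np.array([-0.2, 0.6, 0.1]),
}
intercept = 0.002

# Linear attribution: phi_i = beta_i * (x_i - E[x_i]) for every row and feature.
phi_spx = betas["SPX"] * (X - background_mean)

# Additivity check: attributions sum to prediction minus average prediction.
pred_spx = X @ betas["SPX"] + intercept
additive = np.allclose(phi_spx.sum(axis=1), pred_spx - pred_spx.mean())

# Portfolio factor exposure_f = sum_i weight_i * mean_k |SHAP_i(x_k, f)|.
weights = {"SPX": 0.6, "Gold": 0.4}
mean_abs = {
    asset: np.abs(beta * (X - background_mean)).mean(axis=0)
    for asset, beta in betas.items()
}
exposure = sum(weights[asset] * mean_abs[asset] for asset in betas)
```

A feature with a zero coefficient for one asset (the third feature for SPX here) contributes to the portfolio exposure only through the assets that actually load on it.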
↓
09–10
API & Dashboard
FastAPI REST Layer + Server-Rendered Analytics Dashboard
Phase 9 exposes all pipeline outputs as REST endpoints with a NarrativeEngine
for plain-English explanations. Phase 10 delivers this full-screen dashboard.
- FastAPI with Pydantic validation + CORS middleware
- On-the-fly portfolio analysis: custom weights → Sharpe, VaR, risk contributions
- NarrativeEngine: 6 narrative blocks (composition, regime, dominant factor, stress, diversification, hedge)
- Jinja2 server-rendered templates — zero client-side framework dependency
- Chart.js 4.4 — donut, waterfall, drawdown, factor, contribution charts
FastAPI
Jinja2
Chart.js 4.4
Uvicorn
Risk Metrics Glossary
| Metric | Formula / Definition | Interpretation |
|---|---|---|
| VaR 95% | 5th percentile of return distribution | Maximum monthly loss not exceeded 95% of the time |
| CVaR 95% | E[r \| r < VaR₉₅] | Expected loss in the worst 5% of outcomes (tail risk) |
| Sharpe Ratio | (μ_p − r_f) / σ_p | Risk-adjusted return per unit of total volatility |
| Diversification Ratio | Σ(w_i · σ_i) / σ_portfolio | Weighted-average asset volatility relative to portfolio volatility; equals 1 with no diversification benefit, and higher values mean more diversification |
| Max Drawdown | min_t (cumret_t / max(cumret_{0..t}) − 1) | Worst peak-to-trough loss over the observed period |
| Ledoit-Wolf | Σ̂ = (1−δ)·S + δ·μ̂·I | Shrinkage estimator; δ chosen analytically to minimise MSE |
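
The glossary formulas translate into a few NumPy one-liners. This is a sketch on made-up numbers, using historical (empirical) estimates; the tail for CVaR is taken with ≤ rather than < so that it is never empty in a small sample.

```python
import numpy as np

def var_95(returns):
    """5th percentile of the return distribution."""
    return np.percentile(returns, 5)

def cvar_95(returns):
    """Mean return in the tail at or below the 95% VaR."""
    return returns[returns <= var_95(returns)].mean()

def sharpe_ratio(returns, rf=0.0):
    """(mean return - risk-free rate) / sample standard deviation."""
    return (returns.mean() - rf) / returns.std(ddof=1)

def diversification_ratio(weights, cov):
    """Weighted-average asset volatility divided by portfolio volatility."""
    asset_vols = np.sqrt(np.diag(cov))
    portfolio_vol = np.sqrt(weights @ cov @ weights)
    return (weights @ asset_vols) / portfolio_vol

def max_drawdown(returns):
    """Worst peak-to-trough loss of the cumulative return path."""
    cumulative = np.cumprod(1.0 + returns)
    running_peak = np.maximum.accumulate(cumulative)
    return (cumulative / running_peak - 1.0).min()

# Illustrative monthly returns (made up).
r = np.array([0.10, -0.20, 0.05, -0.10, 0.15])
var = var_95(r)        # interpolated 5th percentile: -0.18
cvar = cvar_95(r)      # mean of returns <= -0.18: -0.20
mdd = max_drawdown(r)  # path 1.10 -> 0.8316, i.e. -24.4% from the peak

# Two uncorrelated assets with equal volatility, equally weighted.
w = np.array([0.5, 0.5])
cov = np.diag([0.04, 0.04])
dr = diversification_ratio(w, cov)   # sqrt(2), roughly 1.414
```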
Scope note:
The core portfolio universe is limited to SPX, NDX, Gold and BTC.
FTSE 100 and FX series (EUR/USD, GBP/USD, DXY) are ingested as macro
conditioning variables and feature inputs only; they do not currently
enter the portfolio allocation or stress test return calculations.
Extending the investable universe to include FTSE or FX instruments
would require additional asset-level sensitivity models and covariance
matrix expansion.