Thursday, February 19, 2026

Drifting in Machine Learning

 Drift is one of the most important concepts in MLOps.

Let’s make it very simple and practical.


🧠 What is Drift?

Drift means:

The data today is different from the data used to train the model.

If data changes → model performance drops.


🚨 Why Drift Happens (House Price Example)

  • Interest rates change

  • Inflation increases

  • Government policy changes

  • New housing schemes

  • Postcodes expand

  • COVID-like events

Model trained on 2021 data
Now it's 2026.

Market is different.

That’s drift.


🎯 Types of Drift (Very Important)

There are 4 main types you should know.


1️⃣ Data Drift (Feature Drift / Covariate Drift)

Meaning:

Input features change.

Training data distribution ≠ New data distribution.


🏠 House Example

Training data:

  • Average price = £250k

  • Mostly semi-detached

  • Mostly CV area

New data:

  • Average price = £400k

  • More luxury flats

  • New postcode areas

Even if the relationship stays the same, the inputs changed.

That’s Data Drift.


📊 Example

Training:

Property_Type: D = 30%, S = 40%, F = 30%

Now:

D = 10%, S = 20%, F = 70%

Big shift.

That’s covariate drift.


2️⃣ Concept Drift

Meaning:

Relationship between input and output changes.

Model assumption becomes outdated.


🏠 House Example

Before COVID:
Location mattered most.

After COVID:
Remote work → size matters more than location.

Same features,
but the price relationship changed.

That’s Concept Drift.


🔥 Important:

Concept drift is more dangerous than data drift.

Because:
Even if the feature distribution stays the same,
the prediction logic is wrong.


3️⃣ Label Drift (Prior Probability Shift)

Meaning:

Target distribution changes.


🏠 House Example

Training:
Most houses between £200k–£300k.

Now:
Most houses between £400k–£600k.

Price distribution shifted.

That’s label drift.


4️⃣ Prediction Drift

Meaning:

Model predictions distribution changes.


Example:

Training predictions:
Mostly 200k–350k

Now predictions:
Mostly 500k–700k

Something changed.

Even if you don’t know why.


🧠 Simple Summary Table

Drift Type       | What Changed? | Example
Data Drift       | Inputs        | More flats now
Concept Drift    | Relationship  | Size matters more now
Label Drift      | Target        | Prices increased
Prediction Drift | Model outputs | Predictions unstable

🎯 Real World Monitoring

In production, we compare:

Training distribution vs Live distribution

Using:

  • Mean comparison

  • KS test

  • PSI (Population Stability Index)

  • Statistical tests


🧠 Simple Analogy

Imagine you trained a chef in India.

Now he is cooking in the UK.

The ingredients are different.
The spice levels are different.
The customers’ tastes are different.

If you don’t adjust the recipe → customers are unhappy.

Drift monitoring = tasting food daily.


🚀 Why Drift Monitoring is Included

Because:

Models degrade over time.

Without monitoring:
You don’t know the model is failing.

Drift monitoring tells you:

“Time to retrain.”


🎓 Interview Answer

In production systems, data distributions can shift due to market changes, user behavior changes, or economic factors. We monitor for data drift, concept drift, and label drift by comparing training and live distributions. If drift exceeds a threshold, we trigger retraining to maintain model performance.


Here’s drift detection in math + PSI + working code + RL drift, in a way you can teach it.


1) How to detect drift mathematically

You always compare the training distribution $P_{\text{train}}(x)$ vs the live distribution $P_{\text{live}}(x)$.

A) Numeric features (continuous)

1) KS test (Kolmogorov–Smirnov)

  • Measures the maximum distance between two CDFs.

$$D = \sup_x \left| F_{\text{train}}(x) - F_{\text{live}}(x) \right|$$

  • Output: the statistic $D$ plus a p-value

  • If the p-value is very small → distributions differ → drift.

2) Wasserstein distance (Earth Mover)

  • “How much mass to move to transform one distribution into the other.”

$$W(P,Q) = \inf_{\gamma \in \Gamma(P,Q)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x-y\rVert\right]$$

  • Higher = more drift (no p-value; you set a threshold using history).

3) Mean/Std shift (simple)

$$\Delta\mu = |\mu_{\text{train}}-\mu_{\text{live}}|,\quad \Delta\sigma = |\sigma_{\text{train}}-\sigma_{\text{live}}|$$

  • Fast baseline check (not as reliable alone).
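
The three numeric checks above can be sketched in a few lines with scipy and numpy (the price samples here are synthetic, drawn to mimic the 2021 vs 2026 example):

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
train = rng.normal(250, 40, 5000)   # synthetic "2021" prices in £k
live = rng.normal(400, 60, 5000)    # synthetic "2026" prices in £k

# 1) KS test: maximum gap between the two empirical CDFs
res = ks_2samp(train, live)

# 2) Wasserstein distance: how much "mass" must move between distributions
w = wasserstein_distance(train, live)

# 3) Simple mean/std shift baseline
d_mu = abs(train.mean() - live.mean())
d_sigma = abs(train.std() - live.std())

print(f"KS D={res.statistic:.3f}, p={res.pvalue:.1e}, W={w:.1f}, d_mu={d_mu:.1f}")
```

With a shift this large, all three checks fire at once; in practice the mean/std check is the cheap first pass and KS/Wasserstein confirm it.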


B) Categorical features

1) Chi-square test
Compare category frequencies:

$$\chi^2 = \sum_{i}\frac{(O_i - E_i)^2}{E_i}$$

  • Small p-value → drift.

2) Jensen–Shannon divergence (nice for probabilities)

$$JS(P\|Q)=\frac{1}{2} KL(P\|M)+\frac{1}{2} KL(Q\|M),\quad M=\frac{P+Q}{2}$$

  • 0 = same; higher = more drift.
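
Both categorical checks can be sketched with scipy, reusing the property-type shift from earlier (the D/S/F counts are illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.spatial.distance import jensenshannon

# Observed counts per category (D, S, F) in the train vs live windows
train_counts = np.array([300, 400, 300])   # 30% / 40% / 30%
live_counts = np.array([100, 200, 700])    # 10% / 20% / 70%

# Chi-square test on the 2 x k contingency table
chi2, p, _, _ = chi2_contingency(np.vstack([train_counts, live_counts]))

# Jensen-Shannon: scipy returns the distance, i.e. sqrt of the divergence
p_train = train_counts / train_counts.sum()
p_live = live_counts / live_counts.sum()
js_div = jensenshannon(p_train, p_live, base=np.e) ** 2

print(f"chi2={chi2:.1f}, p={p:.1e}, JS divergence={js_div:.3f}")
```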


2) PSI calculation (Population Stability Index)

PSI is the most widely used “business-friendly” drift metric.

Step-by-step

  1. Create bins using training (often quantiles).

  2. For each bin $i$:

    • $p_i$ = % of training samples in bin $i$

    • $q_i$ = % of live samples in bin $i$

  3. Compute:

$$PSI = \sum_i (q_i - p_i)\cdot \ln\left(\frac{q_i}{p_i}\right)$$

PSI interpretation (common rule of thumb)

  • < 0.10: no / minimal drift

  • 0.10 – 0.25: moderate drift (watch)

  • > 0.25: significant drift (retrain / investigate)

(These thresholds are heuristics; tune them for your domain.)
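
Once you have the per-bin proportions, the formula above is a one-liner; here is a minimal sketch with a hand-checkable two-bin example:

```python
import numpy as np

def psi(p, q, eps=1e-6):
    """PSI from per-bin proportions: p = train, q = live."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum((q - p) * np.log(q / p)))

# Two bins: train is 50/50, live has shifted to 70/30
value = psi([0.5, 0.5], [0.7, 0.3])
print(round(value, 3))  # 0.169 -> falls in the "moderate drift" band
```

The epsilon guards against log(0) when a bin is empty in one of the two windows.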


3) Real drift detection code example

This snippet detects drift for:

  • numeric columns: KS test + Wasserstein + PSI

  • categorical columns: Chi-square + PSI

  • plus prediction drift

import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance, chi2_contingency


def psi_numeric(train: pd.Series, live: pd.Series, bins=10) -> float:
    """PSI for numeric features using quantile bins from train."""
    train = train.dropna()
    live = live.dropna()

    # Quantile bin edges from training
    quantiles = np.linspace(0, 1, bins + 1)
    edges = np.unique(np.quantile(train, quantiles))

    # If not enough unique edges (constant feature), PSI is 0
    if len(edges) < 3:
        return 0.0

    # Bin counts
    train_counts, _ = np.histogram(train, bins=edges)
    live_counts, _ = np.histogram(live, bins=edges)

    # Convert to proportions, add epsilon to avoid log(0)
    eps = 1e-6
    p = (train_counts / max(train_counts.sum(), 1)) + eps
    q = (live_counts / max(live_counts.sum(), 1)) + eps
    return float(np.sum((q - p) * np.log(q / p)))


def psi_categorical(train: pd.Series, live: pd.Series) -> float:
    """PSI for categorical features using category frequencies."""
    train = train.fillna("MISSING")
    live = live.fillna("MISSING")
    cats = sorted(set(train.unique()).union(set(live.unique())))
    train_freq = train.value_counts(normalize=True).reindex(cats, fill_value=0).values
    live_freq = live.value_counts(normalize=True).reindex(cats, fill_value=0).values
    eps = 1e-6
    p = train_freq + eps
    q = live_freq + eps
    return float(np.sum((q - p) * np.log(q / p)))


def drift_report(train_df: pd.DataFrame, live_df: pd.DataFrame,
                 numeric_cols, cat_cols,
                 psi_warn=0.10, psi_alert=0.25, alpha=0.05) -> pd.DataFrame:
    rows = []

    # Numeric drift
    for col in numeric_cols:
        t = train_df[col].dropna()
        lv = live_df[col].dropna()
        ks = ks_2samp(t, lv)
        w = wasserstein_distance(t, lv)
        psi = psi_numeric(train_df[col], live_df[col], bins=10)
        status = "OK"
        if psi >= psi_alert or ks.pvalue < alpha:
            status = "ALERT"
        elif psi >= psi_warn:
            status = "WARN"
        rows.append({
            "feature": col, "type": "numeric",
            "ks_stat": ks.statistic, "ks_pvalue": ks.pvalue,
            "wasserstein": w, "psi": psi, "status": status
        })

    # Categorical drift
    for col in cat_cols:
        t = train_df[col].fillna("MISSING")
        lv = live_df[col].fillna("MISSING")
        cats = sorted(set(t.unique()).union(set(lv.unique())))
        t_counts = t.value_counts().reindex(cats, fill_value=0)
        l_counts = lv.value_counts().reindex(cats, fill_value=0)

        # Chi-square test on contingency table
        contingency = np.vstack([t_counts.values, l_counts.values])
        chi2, p, _, _ = chi2_contingency(contingency)
        psi = psi_categorical(t, lv)
        status = "OK"
        if psi >= psi_alert or p < alpha:
            status = "ALERT"
        elif psi >= psi_warn:
            status = "WARN"
        rows.append({
            "feature": col, "type": "categorical",
            "chi2": chi2, "chi2_pvalue": p,
            "psi": psi, "status": status
        })

    return pd.DataFrame(rows).sort_values(["status", "psi"], ascending=[True, False])


# --- Example usage ---
# train_df = pd.read_csv("data/processed/train_features.csv")
# live_df = pd.read_csv("data/processed/live_features.csv")
# numeric_cols = ["Year", "Month", "Rooms", "AreaSqm"]
# cat_cols = ["Postcode_Area", "Property_Type"]
# report = drift_report(train_df, live_df, numeric_cols, cat_cols)
# print(report)

# Prediction drift example:
# yhat_train = model.predict(X_train)
# yhat_live = model.predict(X_live)
# print("Prediction PSI:", psi_numeric(pd.Series(yhat_train), pd.Series(yhat_live)))

How to use it in your project

  • Save the report as reports/drift_report.csv

  • If any ALERT → trigger retraining pipeline
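
A minimal sketch of that gating step, assuming a report DataFrame with the same columns the drift_report function produces (the rows here are made up):

```python
import pandas as pd

# Hypothetical drift report with the same columns as drift_report() above
report = pd.DataFrame([
    {"feature": "AreaSqm", "type": "numeric", "psi": 0.31, "status": "ALERT"},
    {"feature": "Rooms", "type": "numeric", "psi": 0.08, "status": "OK"},
    {"feature": "Property_Type", "type": "categorical", "psi": 0.18, "status": "WARN"},
])

report.to_csv("drift_report.csv", index=False)  # e.g. save under reports/

if (report["status"] == "ALERT").any():
    print("Drift ALERT -> trigger the retraining pipeline")
```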


4) Drift in Reinforcement Learning (RL)

RL drift is trickier because the agent affects the data.

Key idea

In supervised ML, data comes from the world.
In RL, data comes from (world + policy behavior).

So you can have drift from:

A) Environment drift

The environment dynamics change:

$$P(s' \mid s, a) \text{ changes}$$

Example:

  • In trading RL: market regime changes (bull → bear)

  • In robotics: floor friction changes

  • In games: opponent strategy changes

B) Reward drift

The reward function changes:

$$R(s,a) \text{ changes}$$

Example:

  • New transaction fees in trading

  • New business KPI definitions

C) Observation drift

Sensors / features shift:

$$P(s) \text{ changes}$$

Example:

  • New input signals, missing features, scaled differently

D) Policy-induced drift (most unique to RL)

As the agent improves, it visits new states:

  • Early training: explores randomly → sees broad states

  • Later: focuses on profitable states → state distribution shifts

$$d^\pi(s) \text{ changes when } \pi \text{ changes}$$

So even without environment change, your “data distribution” changes.

Practical drift monitoring in RL

Monitor:

  1. State distribution drift: PSI/JS on key state features

  2. Action distribution drift: are chosen actions changing?

  3. Reward distribution drift: mean/variance of episodic returns

  4. Performance drift: rolling average return vs baseline

  5. Regime detection (trading): volatility / correlation shifts

When to retrain in RL

  • Live return drops below threshold for N episodes

  • Action distribution collapses (too repetitive)

  • State distribution moves far from training replay buffer stats
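
A hedged sketch of two of these monitors, action-distribution PSI and rolling return vs baseline; the counts, returns, and thresholds are all illustrative and would need tuning per project:

```python
import numpy as np

def psi_from_counts(train_counts, live_counts, eps=1e-6):
    """PSI over discrete action frequencies."""
    p = train_counts / train_counts.sum() + eps
    q = live_counts / live_counts.sum() + eps
    return float(np.sum((q - p) * np.log(q / p)))

# 2) Action distribution drift: e.g. buy / hold / sell frequencies
train_actions = np.array([300, 500, 200])
live_actions = np.array([50, 900, 50])     # policy collapsing to "hold"
action_psi = psi_from_counts(train_actions, live_actions)

# 4) Performance drift: rolling mean return vs a training-time baseline
baseline_return = 1.0
live_returns = np.array([0.9, 0.4, 0.2, 0.1, -0.3])
rolling = np.convolve(live_returns, np.ones(3) / 3, mode="valid")
degraded = bool((rolling < 0.5 * baseline_return).any())

retrain = action_psi > 0.25 or degraded
print(f"action PSI={action_psi:.2f}, degraded={degraded}, retrain={retrain}")
```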


Quick “teach others” summary

  • Drift = training data distribution ≠ live distribution

  • Detect drift using KS/Wasserstein (numeric), Chi-square/JS (categorical), and PSI (business-friendly)

  • In RL, drift can be environment, reward, observation, and policy-induced

Now we make this intuitive + unforgettable.

We will cover:

1️⃣ Drift analogy
2️⃣ What is CDF (very simple)
3️⃣ How to remember KS / PSI / RL drift
4️⃣ RL drift analogies


🌊 1️⃣ What is Drift? (Deep Analogy)

Drift = River Changed Direction

You trained your model when the river was flowing east.

Now the river flows west.

Your boat (model) still assumes east.

It crashes.

That change in river flow = drift.


🏠 House Price Analogy

Training data (2021):

  • Average house price = £250k

  • Mostly semi-detached

  • Low interest rates

Live data (2026):

  • Average price = £450k

  • More flats

  • High interest rates

The “market climate” changed.

That’s drift.


📊 2️⃣ What is CDF? (Super Simple)

CDF = Cumulative Distribution Function

Forget the scary name.

It means:

For a value X, how many data points are below X?


🧠 Analogy — Height Example

Suppose you measure heights of 100 people.

CDF at 170cm means:

What percentage of people are ≤ 170cm?

If 60 people are below 170cm:

CDF(170) = 0.60
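
The empirical CDF is literally one line of numpy; the heights below are made up so that 60 of 100 people sit at or below 170cm:

```python
import numpy as np

heights = np.array([150] * 20 + [165] * 40 + [175] * 25 + [185] * 15)

def ecdf(data, x):
    """Empirical CDF: fraction of samples <= x."""
    return float(np.mean(data <= x))

print(ecdf(heights, 170))  # 0.6 -> 60% of people are at or below 170cm
```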


📈 Visual Intuition

Imagine stacking sand from smallest to biggest values.

CDF shows how full the bucket is at any height.


🎯 Why KS Test Uses CDF

KS test compares:

Training CDF
vs
Live CDF

It checks:

What is the maximum gap between the two curves?

If gap is big → drift.


🧠 Simple Way to Remember KS

K = Kolmogorov
S = Super gap

KS = measure biggest gap between cumulative curves.


📊 3️⃣ PSI Analogy (Business Friendly)

PSI = Population Stability Index


🧠 Analogy — Exam Grades

Training year:

  • 30% A

  • 40% B

  • 30% C

This year:

  • 10% A

  • 20% B

  • 70% C

Distribution changed.

PSI measures how much shift happened.
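
Plugging the grade shares above into the PSI formula gives a concrete number; a quick sanity-check sketch:

```python
import numpy as np

train_shares = np.array([0.30, 0.40, 0.30])  # A / B / C in the training year
live_shares = np.array([0.10, 0.20, 0.70])   # A / B / C this year

psi = float(np.sum((live_shares - train_shares) * np.log(live_shares / train_shares)))
print(round(psi, 2))  # ~0.70 -> well above 0.25, a significant shift
```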


🧠 Easy Memory Rule

PSI = “Percentage Shift Indicator”

Big PSI → big shift.


๐Ÿท️ Numeric vs Categorical Drift

TypeToolEasy Analogy
NumericKSCompare cumulative heights
NumericWassersteinHow much sand to move
CategoricalChi-squareCompare category counts
BusinessPSICompare bucket percentages

🤖 4️⃣ Drift in Reinforcement Learning (Analogy)

RL is different.

Agent interacts with environment.


🎮 Analogy — Video Game Player

You trained an agent to play:

Game version 1.

Suddenly:

  • New game update.

  • Enemies stronger.

  • Rewards reduced.

Now policy fails.

That’s Environment Drift.


💰 Trading RL Example

Agent trained during bull market.

Market becomes bearish.

Patterns changed.

Reward distribution changed.

That’s drift.


🎯 Types of RL Drift (with simple analogies)

Type                 | Analogy
Environment drift    | Game update
Reward drift         | Salary structure changed
Observation drift    | Sensor broken
Policy-induced drift | Player changed strategy

🧠 Policy-Induced Drift (Most Important)

This is unique to RL.

As the agent learns:

It visits different states.

So state distribution changes.

Example:

Beginner driver:
Drives everywhere randomly.

Expert driver:
Avoids bad roads.

So the data the agent sees changes.


🧠 Memory Trick For Everything

Drift Types Memory:

D C L P

D → Data drift
C → Concept drift
L → Label drift
P → Policy drift


🧠 Simple Final Definitions

CDF:

Running percentage below a value.

KS:

Biggest gap between cumulative curves.

PSI:

Percentage distribution shift.

Drift:

Past world ≠ Present world.


🔥 One-Line Summary To Teach Others

Drift occurs when the statistical properties of live data differ from training data. We detect numeric drift using KS or Wasserstein distance by comparing CDFs, categorical drift using Chi-square or PSI, and in reinforcement learning, drift can occur in environment dynamics, rewards, observations, or due to evolving policies.

 

  1️⃣ Configure Java Environment Open the Java environment file. sudo vi /etc/profile.d/java.sh Add these lines inside the file: expor...