Thursday, February 19, 2026

Drifting in Machine Learning

 Drift is one of the most important concepts in MLOps.

Let’s make it very simple and practical.


🧠 What is Drift?

Drift means:

The data today is different from the data used to train the model.

If data changes → model performance drops.


🚨 Why Drift Happens (House Price Example)

  • Interest rates change

  • Inflation increases

  • Government policy changes

  • New housing schemes

  • Postcodes expand

  • COVID-like events

Model trained on 2021 data
Now it's 2026.

Market is different.

That’s drift.


🎯 Types of Drift (Very Important)

There are 4 main types you should know.


1️⃣ Data Drift (Feature Drift / Covariate Drift)

Meaning:

Input features change.

Training data distribution ≠ New data distribution.


🏠 House Example

Training data:

  • Average price = £250k

  • Mostly semi-detached

  • Mostly CV area

New data:

  • Average price = £400k

  • More luxury flats

  • New postcode areas

Even if the relationship stays the same, the inputs changed.

That’s Data Drift.


📊 Example

Training:

Property_Type: D = 30%, S = 40%, F = 30%

Now:

D = 10%, S = 20%, F = 70%

Big shift.

That’s covariate drift.


2️⃣ Concept Drift

Meaning:

Relationship between input and output changes.

Model assumption becomes outdated.


🏠 House Example

Before COVID:
Location mattered most.

After COVID:
Remote work → size matters more than location.

Same features,
but the price relationship changed.

That’s Concept Drift.


🔥 Important:

Concept drift is more dangerous than data drift.

Because:
Even if the feature distribution stays the same,
the prediction logic is wrong.


3️⃣ Label Drift (Prior Probability Shift)

Meaning:

Target distribution changes.


🏠 House Example

Training:
Most houses between £200k–£300k.

Now:
Most houses between £400k–£600k.

Price distribution shifted.

That’s label drift.


4️⃣ Prediction Drift

Meaning:

Model predictions distribution changes.


Example:

Training predictions:
Mostly 200k–350k

Now predictions:
Mostly 500k–700k

Something changed.

Even if you don’t know why.


🧠 Simple Summary Table

Drift Type       | What Changed? | Example
Data Drift       | Inputs        | More flats now
Concept Drift    | Relationship  | Size matters more now
Label Drift      | Target        | Prices increased
Prediction Drift | Model outputs | Predictions unstable

🎯 Real World Monitoring

In production, we compare:

Training distribution vs Live distribution

Using:

  • Mean comparison

  • KS test

  • PSI (Population Stability Index)

  • Statistical tests


🧠 Simple Analogy

Imagine you trained a chef in India.

Now he is cooking in the UK.

The ingredients are different.
The spice levels are different.
The customers’ tastes are different.

If you don’t adjust the recipe → customers are unhappy.

Drift monitoring = tasting food daily.


🚀 Why Drift Monitoring is Included

Because:

Models degrade over time.

Without monitoring:
You don’t know the model is failing.

Drift monitoring tells you:

“Time to retrain.”


🎓 Interview Answer

In production systems, data distributions can shift due to market changes, user behavior changes, or economic factors. We monitor for data drift, concept drift, and label drift by comparing training and live distributions. If drift exceeds a threshold, we trigger retraining to maintain model performance.


Here’s drift detection in math + PSI + working code + RL drift, in a way you can teach it.


1) How to detect drift mathematically

You always compare the training distribution $P_{\text{train}}(x)$ vs the live distribution $P_{\text{live}}(x)$.

A) Numeric features (continuous)

1) KS test (Kolmogorov–Smirnov)

  • Measures the maximum distance between two CDFs.

$$D = \sup_x \left| F_{\text{train}}(x) - F_{\text{live}}(x) \right|$$

  • Output: the statistic $D$ plus a p-value

  • If the p-value is very small → distributions differ → drift.

2) Wasserstein distance (Earth Mover)

  • “How much mass to move to transform one distribution into the other.”

$$W(P,Q) = \inf_{\gamma \in \Gamma(P,Q)} \mathbb{E}_{(x,y)\sim\gamma}\left[\lVert x-y\rVert\right]$$

  • Higher = more drift (no p-value; you set a threshold using history).

3) Mean/Std shift (simple)

$$\Delta\mu = |\mu_{\text{train}}-\mu_{\text{live}}|,\quad \Delta\sigma = |\sigma_{\text{train}}-\sigma_{\text{live}}|$$

  • Fast baseline check (not as reliable alone).
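
The three numeric checks above can be sketched in a few lines with scipy and numpy (the price samples here are synthetic, drawn to mimic the 2021 vs 2026 example):

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
train = rng.normal(250, 40, 5000)   # synthetic "2021" prices in £k
live = rng.normal(400, 60, 5000)    # synthetic "2026" prices in £k

# 1) KS test: maximum gap between the two empirical CDFs
res = ks_2samp(train, live)

# 2) Wasserstein distance: how much "mass" must move between distributions
w = wasserstein_distance(train, live)

# 3) Simple mean/std shift baseline
d_mu = abs(train.mean() - live.mean())
d_sigma = abs(train.std() - live.std())

print(f"KS D={res.statistic:.3f}, p={res.pvalue:.1e}, W={w:.1f}, d_mu={d_mu:.1f}")
```

With a shift this large, all three checks fire at once; in practice the mean/std check is the cheap first pass and KS/Wasserstein confirm it.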


B) Categorical features

1) Chi-square test
Compare category frequencies:

$$\chi^2 = \sum_{i}\frac{(O_i - E_i)^2}{E_i}$$

  • Small p-value → drift.

2) Jensen–Shannon divergence (nice for probabilities)

$$JS(P\|Q)=\frac{1}{2} KL(P\|M)+\frac{1}{2} KL(Q\|M),\quad M=\frac{P+Q}{2}$$

  • 0 = same; higher = more drift.
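
Both categorical checks can be sketched with scipy, reusing the property-type shift from earlier (the D/S/F counts are illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.spatial.distance import jensenshannon

# Observed counts per category (D, S, F) in the train vs live windows
train_counts = np.array([300, 400, 300])   # 30% / 40% / 30%
live_counts = np.array([100, 200, 700])    # 10% / 20% / 70%

# Chi-square test on the 2 x k contingency table
chi2, p, _, _ = chi2_contingency(np.vstack([train_counts, live_counts]))

# Jensen-Shannon: scipy returns the distance, i.e. sqrt of the divergence
p_train = train_counts / train_counts.sum()
p_live = live_counts / live_counts.sum()
js_div = jensenshannon(p_train, p_live, base=np.e) ** 2

print(f"chi2={chi2:.1f}, p={p:.1e}, JS divergence={js_div:.3f}")
```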


2) PSI calculation (Population Stability Index)

PSI is the most widely used “business-friendly” drift metric.

Step-by-step

  1. Create bins using training (often quantiles).

  2. For each bin $i$:

    • $p_i$ = % of training samples in bin $i$

    • $q_i$ = % of live samples in bin $i$

  3. Compute:

$$PSI = \sum_i (q_i - p_i)\cdot \ln\left(\frac{q_i}{p_i}\right)$$

PSI interpretation (common rule of thumb)

  • < 0.10: no / minimal drift

  • 0.10 – 0.25: moderate drift (watch)

  • > 0.25: significant drift (retrain / investigate)

(These thresholds are heuristics; tune them for your domain.)
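
Once you have the per-bin proportions, the formula above is a one-liner; here is a minimal sketch with a hand-checkable two-bin example:

```python
import numpy as np

def psi(p, q, eps=1e-6):
    """PSI from per-bin proportions: p = train, q = live."""
    p = np.asarray(p) + eps
    q = np.asarray(q) + eps
    return float(np.sum((q - p) * np.log(q / p)))

# Two bins: train is 50/50, live has shifted to 70/30
value = psi([0.5, 0.5], [0.7, 0.3])
print(round(value, 3))  # 0.169 -> falls in the "moderate drift" band
```

The epsilon guards against log(0) when a bin is empty in one of the two windows.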


3) Real drift detection code example

This snippet detects drift for:

  • numeric columns: KS test + Wasserstein + PSI

  • categorical columns: Chi-square + PSI

  • plus prediction drift

import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance, chi2_contingency


def psi_numeric(train: pd.Series, live: pd.Series, bins=10) -> float:
    """PSI for numeric features using quantile bins from train."""
    train = train.dropna()
    live = live.dropna()

    # Quantile bin edges from training
    quantiles = np.linspace(0, 1, bins + 1)
    edges = np.unique(np.quantile(train, quantiles))

    # If not enough unique edges (constant feature), PSI is 0
    if len(edges) < 3:
        return 0.0

    # Bin counts
    train_counts, _ = np.histogram(train, bins=edges)
    live_counts, _ = np.histogram(live, bins=edges)

    # Convert to proportions, add epsilon to avoid log(0)
    eps = 1e-6
    p = (train_counts / max(train_counts.sum(), 1)) + eps
    q = (live_counts / max(live_counts.sum(), 1)) + eps
    return float(np.sum((q - p) * np.log(q / p)))


def psi_categorical(train: pd.Series, live: pd.Series) -> float:
    """PSI for categorical features using category frequencies."""
    train = train.fillna("MISSING")
    live = live.fillna("MISSING")
    cats = sorted(set(train.unique()).union(set(live.unique())))
    train_freq = train.value_counts(normalize=True).reindex(cats, fill_value=0).values
    live_freq = live.value_counts(normalize=True).reindex(cats, fill_value=0).values
    eps = 1e-6
    p = train_freq + eps
    q = live_freq + eps
    return float(np.sum((q - p) * np.log(q / p)))


def drift_report(train_df: pd.DataFrame, live_df: pd.DataFrame,
                 numeric_cols, cat_cols,
                 psi_warn=0.10, psi_alert=0.25, alpha=0.05) -> pd.DataFrame:
    rows = []

    # Numeric drift
    for col in numeric_cols:
        t = train_df[col].dropna()
        lv = live_df[col].dropna()
        ks = ks_2samp(t, lv)
        w = wasserstein_distance(t, lv)
        psi = psi_numeric(train_df[col], live_df[col], bins=10)
        status = "OK"
        if psi >= psi_alert or ks.pvalue < alpha:
            status = "ALERT"
        elif psi >= psi_warn:
            status = "WARN"
        rows.append({
            "feature": col, "type": "numeric",
            "ks_stat": ks.statistic, "ks_pvalue": ks.pvalue,
            "wasserstein": w, "psi": psi, "status": status
        })

    # Categorical drift
    for col in cat_cols:
        t = train_df[col].fillna("MISSING")
        lv = live_df[col].fillna("MISSING")
        cats = sorted(set(t.unique()).union(set(lv.unique())))
        t_counts = t.value_counts().reindex(cats, fill_value=0)
        l_counts = lv.value_counts().reindex(cats, fill_value=0)

        # Chi-square test on contingency table
        contingency = np.vstack([t_counts.values, l_counts.values])
        chi2, p, _, _ = chi2_contingency(contingency)
        psi = psi_categorical(t, lv)
        status = "OK"
        if psi >= psi_alert or p < alpha:
            status = "ALERT"
        elif psi >= psi_warn:
            status = "WARN"
        rows.append({
            "feature": col, "type": "categorical",
            "chi2": chi2, "chi2_pvalue": p,
            "psi": psi, "status": status
        })

    return pd.DataFrame(rows).sort_values(["status", "psi"], ascending=[True, False])


# --- Example usage ---
# train_df = pd.read_csv("data/processed/train_features.csv")
# live_df = pd.read_csv("data/processed/live_features.csv")
# numeric_cols = ["Year", "Month", "Rooms", "AreaSqm"]
# cat_cols = ["Postcode_Area", "Property_Type"]
# report = drift_report(train_df, live_df, numeric_cols, cat_cols)
# print(report)

# Prediction drift example:
# yhat_train = model.predict(X_train)
# yhat_live = model.predict(X_live)
# print("Prediction PSI:", psi_numeric(pd.Series(yhat_train), pd.Series(yhat_live)))

How to use it in your project

  • Save the report as reports/drift_report.csv

  • If any ALERT → trigger retraining pipeline
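
A minimal sketch of that gating step, assuming a report DataFrame with the same columns the drift_report function produces (the rows here are made up):

```python
import pandas as pd

# Hypothetical drift report with the same columns as drift_report() above
report = pd.DataFrame([
    {"feature": "AreaSqm", "type": "numeric", "psi": 0.31, "status": "ALERT"},
    {"feature": "Rooms", "type": "numeric", "psi": 0.08, "status": "OK"},
    {"feature": "Property_Type", "type": "categorical", "psi": 0.18, "status": "WARN"},
])

report.to_csv("drift_report.csv", index=False)  # e.g. save under reports/

if (report["status"] == "ALERT").any():
    print("Drift ALERT -> trigger the retraining pipeline")
```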


4) Drift in Reinforcement Learning (RL)

RL drift is trickier because the agent affects the data.

Key idea

In supervised ML, data comes from the world.
In RL, data comes from (world + policy behavior).

So you can have drift from:

A) Environment drift

The environment dynamics change:

$$P(s' \mid s, a) \text{ changes}$$

Example:

  • In trading RL: market regime changes (bull → bear)

  • In robotics: floor friction changes

  • In games: opponent strategy changes

B) Reward drift

The reward function changes:

$$R(s,a) \text{ changes}$$

Example:

  • New transaction fees in trading

  • New business KPI definitions

C) Observation drift

Sensors / features shift:

$$P(s) \text{ changes}$$

Example:

  • New input signals, missing features, scaled differently

D) Policy-induced drift (most unique to RL)

As the agent improves, it visits new states:

  • Early training: explores randomly → sees broad states

  • Later: focuses on profitable states → state distribution shifts

$$d^\pi(s) \text{ changes when } \pi \text{ changes}$$

So even without environment change, your “data distribution” changes.

Practical drift monitoring in RL

Monitor:

  1. State distribution drift: PSI/JS on key state features

  2. Action distribution drift: are chosen actions changing?

  3. Reward distribution drift: mean/variance of episodic returns

  4. Performance drift: rolling average return vs baseline

  5. Regime detection (trading): volatility / correlation shifts

When to retrain in RL

  • Live return drops below threshold for N episodes

  • Action distribution collapses (too repetitive)

  • State distribution moves far from training replay buffer stats
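
A hedged sketch of two of these monitors, action-distribution PSI and rolling return vs baseline; the counts, returns, and thresholds are all illustrative and would need tuning per project:

```python
import numpy as np

def psi_from_counts(train_counts, live_counts, eps=1e-6):
    """PSI over discrete action frequencies."""
    p = train_counts / train_counts.sum() + eps
    q = live_counts / live_counts.sum() + eps
    return float(np.sum((q - p) * np.log(q / p)))

# 2) Action distribution drift: e.g. buy / hold / sell frequencies
train_actions = np.array([300, 500, 200])
live_actions = np.array([50, 900, 50])     # policy collapsing to "hold"
action_psi = psi_from_counts(train_actions, live_actions)

# 4) Performance drift: rolling mean return vs a training-time baseline
baseline_return = 1.0
live_returns = np.array([0.9, 0.4, 0.2, 0.1, -0.3])
rolling = np.convolve(live_returns, np.ones(3) / 3, mode="valid")
degraded = bool((rolling < 0.5 * baseline_return).any())

retrain = action_psi > 0.25 or degraded
print(f"action PSI={action_psi:.2f}, degraded={degraded}, retrain={retrain}")
```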


Quick “teach others” summary

  • Drift = training data distribution ≠ live distribution

  • Detect drift using KS/Wasserstein (numeric), Chi-square/JS (categorical), and PSI (business-friendly)

  • In RL, drift can be environment, reward, observation, and policy-induced

Now we make this intuitive + unforgettable.

We will cover:

1️⃣ Drift analogy
2️⃣ What is CDF (very simple)
3️⃣ How to remember KS / PSI / RL drift
4️⃣ RL drift analogies


🌊 1️⃣ What is Drift? (Deep Analogy)

Drift = River Changed Direction

You trained your model when the river was flowing east.

Now the river flows west.

Your boat (model) still assumes east.

It crashes.

That change in river flow = drift.


🏠 House Price Analogy

Training data (2021):

  • Average house price = £250k

  • Mostly semi-detached

  • Low interest rates

Live data (2026):

  • Average price = £450k

  • More flats

  • High interest rates

The “market climate” changed.

That’s drift.


📊 2️⃣ What is CDF? (Super Simple)

CDF = Cumulative Distribution Function

Forget the scary name.

It means:

For a value X, how many data points are below X?


🧠 Analogy — Height Example

Suppose you measure heights of 100 people.

CDF at 170cm means:

What percentage of people are ≤ 170cm?

If 60 people are below 170cm:

CDF(170) = 0.60
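
The empirical CDF is literally one line of numpy; the heights below are made up so that 60 of 100 people sit at or below 170cm:

```python
import numpy as np

heights = np.array([150] * 20 + [165] * 40 + [175] * 25 + [185] * 15)

def ecdf(data, x):
    """Empirical CDF: fraction of samples <= x."""
    return float(np.mean(data <= x))

print(ecdf(heights, 170))  # 0.6 -> 60% of people are at or below 170cm
```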


📈 Visual Intuition

Imagine stacking sand from smallest to biggest values.

CDF shows how full the bucket is at any height.


🎯 Why KS Test Uses CDF

KS test compares:

Training CDF
vs
Live CDF

It checks:

What is the maximum gap between the two curves?

If gap is big → drift.


🧠 Simple Way to Remember KS

K = Kolmogorov
S = Super gap

KS = measure biggest gap between cumulative curves.


📊 3️⃣ PSI Analogy (Business Friendly)

PSI = Population Stability Index


🧠 Analogy — Exam Grades

Training year:

  • 30% A

  • 40% B

  • 30% C

This year:

  • 10% A

  • 20% B

  • 70% C

Distribution changed.

PSI measures how much shift happened.
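
Plugging the grade shares above into the PSI formula gives a concrete number; a quick sanity-check sketch:

```python
import numpy as np

train_shares = np.array([0.30, 0.40, 0.30])  # A / B / C in the training year
live_shares = np.array([0.10, 0.20, 0.70])   # A / B / C this year

psi = float(np.sum((live_shares - train_shares) * np.log(live_shares / train_shares)))
print(round(psi, 2))  # ~0.70 -> well above 0.25, a significant shift
```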


🧠 Easy Memory Rule

PSI = “Percentage Shift Indicator”

Big PSI → big shift.


๐Ÿท️ Numeric vs Categorical Drift

TypeToolEasy Analogy
NumericKSCompare cumulative heights
NumericWassersteinHow much sand to move
CategoricalChi-squareCompare category counts
BusinessPSICompare bucket percentages

🤖 4️⃣ Drift in Reinforcement Learning (Analogy)

RL is different.

Agent interacts with environment.


🎮 Analogy — Video Game Player

You trained an agent to play:

Game version 1.

Suddenly:

  • New game update.

  • Enemies stronger.

  • Rewards reduced.

Now policy fails.

That’s Environment Drift.


💰 Trading RL Example

Agent trained during bull market.

Market becomes bearish.

Patterns changed.

Reward distribution changed.

That’s drift.


🎯 Types of RL Drift (with simple analogies)

Type                 | Analogy
Environment drift    | Game update
Reward drift         | Salary structure changed
Observation drift    | Sensor broken
Policy-induced drift | Player changed strategy

🧠 Policy-Induced Drift (Most Important)

This is unique to RL.

As the agent learns:

It visits different states.

So state distribution changes.

Example:

Beginner driver:
Drives everywhere randomly.

Expert driver:
Avoids bad roads.

So the data the agent sees changes.


🧠 Memory Trick For Everything

Drift Types Memory:

D C L P

D → Data drift
C → Concept drift
L → Label drift
P → Policy drift


🧠 Simple Final Definitions

CDF:

Running percentage below a value.

KS:

Biggest gap between cumulative curves.

PSI:

Percentage distribution shift.

Drift:

Past world ≠ Present world.


🔥 One-Line Summary To Teach Others

Drift occurs when the statistical properties of live data differ from training data. We detect numeric drift using KS or Wasserstein distance by comparing CDFs, categorical drift using Chi-square or PSI, and in reinforcement learning, drift can occur in environment dynamics, rewards, observations, or due to evolving policies.

 

  1️⃣ Configure Java Environment Open the Java environment file. sudo vi /etc/profile.d/java.sh Add these lines inside the file: expor...