Drift is one of the most important concepts in MLOps.
Let’s make it very simple and practical.
What is Drift?
Drift means:
The data today is different from the data used to train the model.
If data changes → model performance drops.
Why Drift Happens (House Price Example)
- Interest rates change
- Inflation increases
- Government policy changes
- New housing schemes
- Postcodes expand
- COVID-like events
Model trained on 2021 data
Now it's 2026.
Market is different.
That’s drift.
Types of Drift (Very Important)
There are 4 main types you should know.
1️⃣ Data Drift (Feature Drift / Covariate Drift)
Meaning:
Input features change.
Training data distribution ≠ New data distribution.
House Example
Training data:
- Average price = £250k
- Mostly semi-detached
- Mostly CV area
New data:
- Average price = £400k
- More luxury flats
- New postcode areas
Even if the price relationship stays the same, the inputs changed.
That’s Data Drift.
Example
Training: average price around £250k.
Now: average price around £400k.
Big shift.
That's covariate drift.
2️⃣ Concept Drift
Meaning:
Relationship between input and output changes.
Model assumption becomes outdated.
House Example
Before COVID: location mattered most.
After COVID: remote work → size matters more than location.
Same features, but the price relationship changed.
That's Concept Drift.
Important:
Concept drift is more dangerous than data drift, because even if the feature distribution stays the same, the prediction logic is wrong.
3️⃣ Label Drift (Prior Probability Shift)
Meaning:
Target distribution changes.
House Example
Training:
Most houses between £200k–£300k.
Now:
Most houses between £400k–£600k.
Price distribution shifted.
That’s label drift.
4️⃣ Prediction Drift
Meaning:
Model predictions distribution changes.
Example:
Training predictions:
Mostly 200k–350k
Now predictions:
Mostly 500k–700k
Something changed.
Even if you don’t know why.
Simple Summary Table
| Drift Type | What Changed? | Example |
|---|---|---|
| Data Drift | Inputs | More flats now |
| Concept Drift | Relationship | Size matters more now |
| Label Drift | Target | Prices increased |
| Prediction Drift | Model outputs | Predictions unstable |
Real World Monitoring
In production, we compare:
Training distribution vs live distribution
Using:
- Mean/std comparison
- KS test
- PSI (Population Stability Index)
- Other statistical tests (e.g. Chi-square)
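As a quick illustration of the simplest check in the list above (mean comparison), here is a sketch; the alert rule and the `k` threshold are my own choices, not a standard:

```python
# Flag drift when the live mean moves more than k training standard deviations.
import numpy as np

def mean_shift_alert(train, live, k=0.5):
    """True when |mean(live) - mean(train)| exceeds k * std(train)."""
    return abs(np.mean(live) - np.mean(train)) > k * np.std(train)

rng = np.random.default_rng(0)
train_prices = rng.normal(250_000, 40_000, 5_000)
print(mean_shift_alert(train_prices, rng.normal(255_000, 40_000, 5_000)))  # False
print(mean_shift_alert(train_prices, rng.normal(400_000, 60_000, 5_000)))  # True
```

A mean check alone misses shape changes (e.g. variance or bimodality), which is why the KS test and PSI below are the usual next step.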
Simple Analogy
Imagine you trained a chef in India.
Now he is cooking in the UK.
The ingredients are different.
The spice levels are different.
The customers' tastes are different.
If you don't adjust the recipe → customers are unhappy.
Drift monitoring = tasting the food daily.
Why Drift Monitoring is Included
Because models degrade over time.
Without monitoring, you don't know the model is failing.
Drift monitoring tells you:
"Time to retrain."
Interview Answer
In production systems, data distributions can shift due to market changes, user behavior changes, or economic factors. We monitor for data drift, concept drift, and label drift by comparing training and live distributions. If drift exceeds a threshold, we trigger retraining to maintain model performance.
Here's drift detection in math + PSI + working code + RL drift, in a way you can teach.
1) How to detect drift mathematically
You always compare the training distribution vs the live distribution.
A) Numeric features (continuous)
1) KS test (Kolmogorov–Smirnov)
- Measures the maximum distance between the two CDFs.
- Output: D statistic + p-value
- If the p-value is very small → distributions differ → drift.
2) Wasserstein distance (Earth Mover's)
- "How much mass must move to transform one distribution into the other."
- Higher = more drift (no p-value; you set a threshold using history).
3) Mean/Std shift (simple)
- Fast baseline check (not as reliable alone).
B) Categorical features
1) Chi-square test
Compare category frequencies:
- p-value small → drift.
2) Jensen–Shannon divergence (nice for probabilities)
- 0 = same; higher = more drift.
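A minimal sketch of these tests using SciPy on synthetic house data (all the numbers and category counts below are made up for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance, chi2_contingency
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)

# Numeric feature: training prices vs live prices (synthetic)
train_prices = rng.normal(250_000, 40_000, 5_000)
live_prices = rng.normal(400_000, 60_000, 5_000)

ks_stat, p_value = ks_2samp(train_prices, live_prices)   # D statistic + p-value
wd = wasserstein_distance(train_prices, live_prices)     # "mass to move"
print(f"KS D={ks_stat:.3f}, p={p_value:.1e}, Wasserstein={wd:,.0f}")

# Categorical feature: property-type counts (semi, terraced, flat)
train_counts = np.array([700, 200, 100])
live_counts = np.array([400, 200, 400])
chi2, p_cat, _, _ = chi2_contingency(np.vstack([train_counts, live_counts]))
js = jensenshannon(train_counts / train_counts.sum(),
                   live_counts / live_counts.sum())
print(f"Chi-square p={p_cat:.1e}, JS distance={js:.3f}")
```

Both the tiny p-values and the large Wasserstein/JS values would point the same way here: the live data has drifted.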
2) PSI calculation (Population Stability Index)
PSI is the most used "business friendly" drift metric.
Step-by-step
- Create bins using the training data (often quantiles).
- For each bin i:
  - expected_i = % of training samples in bin i
  - actual_i = % of live samples in bin i
- Compute:
PSI = Σ_i (actual_i − expected_i) × ln(actual_i / expected_i)
PSI interpretation (common rule of thumb)
- < 0.10: no / minimal drift
- 0.10 – 0.25: moderate drift (watch)
- > 0.25: significant drift (retrain / investigate)
(These thresholds are heuristics; tune them for your domain.)
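The steps above can be sketched as a small function (the function name, the bin count, and the clipping floor for empty bins are my own choices):

```python
import numpy as np

def psi(train: np.ndarray, live: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index using quantile bins from the training data."""
    # 1) Bin edges from training quantiles
    edges = np.quantile(train, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    # 2) Percentage of samples per bin
    expected = np.histogram(train, bins=edges)[0] / len(train)
    actual = np.histogram(live, bins=edges)[0] / len(live)
    # Floor empty bins to avoid log(0)
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)
    # 3) PSI = sum((actual - expected) * ln(actual / expected))
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
same = psi(rng.normal(250, 40, 5_000), rng.normal(250, 40, 5_000))
shifted = psi(rng.normal(250, 40, 5_000), rng.normal(400, 60, 5_000))
print(f"no-drift PSI={same:.3f}, drifted PSI={shifted:.3f}")
```

On the synthetic data, the no-drift case lands well under 0.10 and the shifted case far above 0.25, matching the rule of thumb.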
3) Real drift detection code example
This snippet detects drift for:
- numeric columns: KS test + Wasserstein + PSI
- categorical columns: Chi-square + PSI
- plus prediction drift
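The snippet itself isn't reproduced above, so here is a minimal sketch of what it could look like (function names, synthetic columns, and the alert rule are my own; "prediction drift" would reuse the numeric path on a predictions column):

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance, chi2_contingency

def psi_numeric(train, live, n_bins=10):
    """PSI with quantile bins taken from the training data."""
    edges = np.quantile(train, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.clip(np.histogram(train, bins=edges)[0] / len(train), 1e-6, None)
    a = np.clip(np.histogram(live, bins=edges)[0] / len(live), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def psi_categorical(train, live):
    """PSI over category frequencies."""
    cats = sorted(set(train) | set(live))
    e = np.clip(train.value_counts(normalize=True)
                .reindex(cats, fill_value=0).to_numpy(), 1e-6, None)
    a = np.clip(live.value_counts(normalize=True)
                .reindex(cats, fill_value=0).to_numpy(), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def drift_report(train_df, live_df, numeric_cols, categorical_cols, psi_alert=0.25):
    rows = []
    for col in numeric_cols:
        _, p = ks_2samp(train_df[col], live_df[col])
        score = psi_numeric(train_df[col].to_numpy(), live_df[col].to_numpy())
        rows.append({"column": col, "type": "numeric", "p_value": p,
                     "wasserstein": wasserstein_distance(train_df[col], live_df[col]),
                     "psi": score,
                     "status": "ALERT" if score > psi_alert or p < 0.01 else "OK"})
    for col in categorical_cols:
        t, l = train_df[col].value_counts(), live_df[col].value_counts()
        cats = t.index.union(l.index)
        obs = np.vstack([t.reindex(cats, fill_value=0), l.reindex(cats, fill_value=0)])
        _, p, _, _ = chi2_contingency(obs)
        score = psi_categorical(train_df[col], live_df[col])
        rows.append({"column": col, "type": "categorical", "p_value": p,
                     "wasserstein": np.nan, "psi": score,
                     "status": "ALERT" if score > psi_alert or p < 0.01 else "OK"})
    return pd.DataFrame(rows)

# Demo on synthetic house data (columns are illustrative)
rng = np.random.default_rng(0)
train = pd.DataFrame({"price": rng.normal(250_000, 40_000, 3_000),
                      "type": rng.choice(["semi", "flat"], 3_000, p=[0.7, 0.3])})
live = pd.DataFrame({"price": rng.normal(400_000, 60_000, 3_000),
                     "type": rng.choice(["semi", "flat"], 3_000, p=[0.3, 0.7])})
report = drift_report(train, live, numeric_cols=["price"], categorical_cols=["type"])
print(report)
```

In a pipeline you would then persist the report (e.g. `report.to_csv(...)`) and act on any `ALERT` rows.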
How to use it in your project
- Save the report as reports/drift_report.csv
- If any row is flagged ALERT → trigger the retraining pipeline
4) Drift in Reinforcement Learning (RL)
RL drift is trickier because the agent affects the data.
Key idea
In supervised ML, data comes from the world.
In RL, data comes from (world + policy behavior).
So you can have drift from:
A) Environment drift
The environment dynamics change.
Example:
- In trading RL: market regime changes (bull → bear)
- In robotics: floor friction changes
- In games: opponent strategy changes
B) Reward drift
The reward function changes.
Example:
- New transaction fees in trading
- New business KPI definitions
C) Observation drift
Sensors / features shift.
Example:
- New input signals, missing features, different scaling
D) Policy-induced drift (most unique to RL)
As the agent improves, it visits new states:
- Early training: explores randomly → sees broad states
- Later: focuses on profitable states → state distribution shifts
So even without environment change, your "data distribution" changes.
Practical drift monitoring in RL
Monitor:
- State distribution drift: PSI/JS on key state features
- Action distribution drift: are chosen actions changing?
- Reward distribution drift: mean/variance of episodic returns
- Performance drift: rolling average return vs baseline
- Regime detection (trading): volatility / correlation shifts
When to retrain in RL
- Live return drops below threshold for N episodes
- Action distribution collapses (too repetitive)
- State distribution moves far from training replay buffer stats
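As one concrete instance of the monitors above, here is a sketch of action-distribution drift using Jensen–Shannon distance (the action meanings, window sizes, and the 0.2 threshold are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def action_drift(baseline_actions, live_actions, n_actions, threshold=0.2):
    """Compare discrete action frequencies: training baseline vs a live window."""
    base = np.bincount(baseline_actions, minlength=n_actions) / len(baseline_actions)
    live = np.bincount(live_actions, minlength=n_actions) / len(live_actions)
    js = jensenshannon(base, live)  # 0 = identical; larger = more drift
    return js, js > threshold

rng = np.random.default_rng(0)
baseline = rng.choice(3, 10_000, p=[0.4, 0.4, 0.2])      # e.g. buy / hold / sell
collapsed = rng.choice(3, 10_000, p=[0.05, 0.9, 0.05])   # policy now mostly "hold"
js, alert = action_drift(baseline, collapsed, n_actions=3)
print(f"JS={js:.3f}, alert={alert}")
```

The same pattern works for state-feature drift (swap in PSI per feature) and reward drift (compare episodic return windows).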
Quick "teach others" summary
- Drift = training data distribution ≠ live distribution
- Detect drift using KS/Wasserstein (numeric), Chi-square/JS (categorical), and PSI (business-friendly)
- In RL, drift can be environment, reward, observation, or policy-induced
Now we make this intuitive + unforgettable.
We will cover:
1️⃣ Drift analogy
2️⃣ What is CDF (very simple)
3️⃣ How to remember KS / PSI / RL drift
4️⃣ RL drift analogies
1️⃣ What is Drift? (Deep Analogy)
Drift = River Changed Direction
You trained your model when the river was flowing east.
Now the river flows west.
Your boat (model) still assumes east.
It crashes.
That change in river flow = drift.
House Price Analogy
Training data (2021):
- Average house price = £250k
- Mostly semi-detached
- Low interest rates
Live data (2026):
- Average price = £450k
- More flats
- High interest rates
The “market climate” changed.
That’s drift.
2️⃣ What is CDF? (Super Simple)
CDF = Cumulative Distribution Function
Forget the scary name.
It means:
For a value X, what fraction of the data points are ≤ X?
Analogy — Height Example
Suppose you measure heights of 100 people.
CDF at 170cm means:
What percentage of people are ≤ 170cm?
If 60 people are below 170cm:
CDF(170) = 0.60
Visual Intuition
Imagine stacking sand from smallest to biggest values.
CDF shows how full the bucket is at any height.
Why KS Test Uses CDF
KS test compares:
Training CDF
vs
Live CDF
It checks:
What is the maximum gap between the two curves?
If gap is big → drift.
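The "maximum gap" idea can be computed directly from empirical CDFs; a sketch with synthetic data, where `ecdf` is my own helper and the grid evaluation is a simple approximation of the exact KS statistic:

```python
import numpy as np

def ecdf(sample, x):
    """Fraction of the sample that is <= x (the empirical CDF)."""
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

rng = np.random.default_rng(0)
train = rng.normal(250, 40, 2_000)   # training prices (in £k)
live = rng.normal(400, 60, 2_000)    # live prices (in £k)

grid = np.linspace(100, 600, 1_001)
gap = np.max(np.abs(ecdf(train, grid) - ecdf(live, grid)))
print(f"max CDF gap ≈ {gap:.3f}")    # this gap is (approximately) the KS statistic
```

A gap near 1 means the two curves barely overlap; a gap near 0 means the distributions look the same.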
Simple Way to Remember KS
K = Kolmogorov
S = Super gap
KS = measure biggest gap between cumulative curves.
3️⃣ PSI Analogy (Business Friendly)
PSI = Population Stability Index
Analogy — Exam Grades
Training year:
- 30% A
- 40% B
- 30% C
This year:
- 10% A
- 20% B
- 70% C
Distribution changed.
PSI measures how much shift happened.
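Running the grade example through the PSI formula from earlier makes the "big shift" concrete (bin percentages exactly as given above):

```python
import numpy as np

expected = np.array([0.30, 0.40, 0.30])  # training year: A, B, C
actual = np.array([0.10, 0.20, 0.70])    # this year
psi = float(np.sum((actual - expected) * np.log(actual / expected)))
print(f"PSI = {psi:.2f}")  # ≈ 0.70 → well above 0.25, significant drift
```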
Easy Memory Rule
PSI = “Percentage Shift Indicator”
Big PSI → big shift.
Numeric vs Categorical Drift
| Type | Tool | Easy Analogy |
|---|---|---|
| Numeric | KS | Compare cumulative heights |
| Numeric | Wasserstein | How much sand to move |
| Categorical | Chi-square | Compare category counts |
| Business | PSI | Compare bucket percentages |
4️⃣ Drift in Reinforcement Learning (Analogy)
RL is different: the agent interacts with the environment.
Analogy — Video Game Player
You trained an agent to play game version 1.
Suddenly:
- New game update.
- Enemies stronger.
- Rewards reduced.
Now the policy fails.
That's Environment Drift.
Trading RL Example
The agent was trained during a bull market.
The market becomes bearish.
Patterns changed.
Reward distribution changed.
That’s drift.
Types of RL Drift (with simple analogies)
| Type | Analogy |
|---|---|
| Environment drift | Game update |
| Reward drift | Salary structure changed |
| Observation drift | Sensor broken |
| Policy-induced drift | Player changed strategy |
Policy-Induced Drift (Most Important)
This is unique to RL.
As the agent learns, it visits different states, so the state distribution changes.
Example:
A beginner driver drives everywhere randomly.
An expert driver avoids bad roads.
So the data the agent sees changes, even if the environment doesn't.
Memory Trick For Everything
Drift Types Memory:
D C L P
D → Data drift
C → Concept drift
L → Label drift
P → Prediction drift (Policy-induced drift in RL)
Simple Final Definitions
CDF:
Running percentage below a value.
KS:
Biggest gap between cumulative curves.
PSI:
Percentage distribution shift.
Drift:
Past world ≠ Present world.
One-Line Summary To Teach Others
Drift occurs when the statistical properties of live data differ from training data. We detect numeric drift using KS or Wasserstein distance by comparing CDFs, categorical drift using Chi-square or PSI, and in reinforcement learning, drift can occur in environment dynamics, rewards, observations, or due to evolving policies.