Learn how Judea Pearl’s causal models help teams explain AI behavior, debug failures, and make clearer product decisions beyond correlations.

A team notices something “obvious” in their dashboard: users who receive more notifications come back more often. So they crank up notification volume. A week later, retention dips and churn complaints rise. What happened?
The original pattern was real—but misleading. The most engaged users naturally trigger more notifications (because they use the product more), and they also naturally return more. Notifications didn’t cause retention; engagement caused both. The team acted on correlation and accidentally created a worse experience.
Causal thinking is the habit of asking: what causes what, and how do we know? Instead of stopping at “these two things move together,” you try to separate patterns that merely co-occur from changes that would actually move the outcome if you acted on them.
It’s not about being skeptical of data—it’s about being specific about the question. “Do notifications correlate with retention?” is different from “Will sending more notifications increase retention?” The second question is causal.
This post focuses on three practical areas where pattern-spotting often fails: debugging model failures, explaining AI behavior, and making clearer product decisions.
This isn’t a math-heavy tour of causal inference. You won’t need to learn do-calculus notation to get value here. The goal is a set of mental models and a workflow your team can use to ask sharper causal questions, design cleaner tests, and act on results with more confidence.
If you’ve ever shipped a change that “looked good in the data” but didn’t work in reality, causal thinking is the missing link.
Judea Pearl is a computer scientist and philosopher of science whose work reshaped how many teams think about data, AI, and decision-making. Before his causal revolution, much of “learning from data” in computing focused on statistical associations: find patterns, fit models, predict what happens next. That approach is powerful—but it often breaks down the moment you ask a product or engineering question that contains the word because.
Pearl’s core shift was to treat causality as a first-class concept, not a vague intuition layered on top of correlations. Instead of only asking, “When X is high, is Y also high?”, causal thinking asks, “If we change X, will Y change?” That difference sounds small, but it separates prediction from decision-making.
Association answers “what tends to co-occur.” Causation aims to answer “what would happen if we intervened.” This matters in computing because many real decisions are interventions: shipping a feature, changing rankings, adding a guardrail, altering a training set, or tweaking a policy.
Pearl made causality more practical by framing it as a modeling choice plus explicit assumptions. You don’t “discover” causality automatically from data alone; you propose a causal story (often based on domain knowledge) and then use data to test, estimate, and refine it.
Causal diagrams, interventions, and counterfactuals gave teams a shared language to move from pattern-spotting to answering causal questions with clarity and discipline.
Correlation means two things move together: when one goes up, the other tends to go up (or down). It’s extremely useful—especially in data-heavy teams—because it helps with prediction and detection.
If ice cream sales spike when temperature rises, a correlated signal (temperature) can improve forecasting. In product and AI work, correlations power ranking models (“show more of what similar users clicked”), anomaly spotting (“this metric usually tracks that one”), and quick diagnostics (“errors rise when latency rises”).
The trouble starts when we treat correlation as an answer to a different question: what happens if we change something on purpose? That’s causation.
A correlated relationship may be driven by a third factor that affects both variables. Changing X doesn’t necessarily change Y—because X might not be the reason Y moved in the first place.
Imagine you plot weekly marketing spend against weekly sales and see a strong positive correlation. It’s tempting to conclude “more spend causes more sales.”
But suppose both rise during holidays. The season (a confounder) drives higher demand and also triggers bigger budgets. If you increase spend in a non-holiday week, sales might not rise much—because the underlying demand isn’t there.
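A tiny simulation makes the trap concrete. The numbers below are made up purely for illustration: season drives both spend and sales, and spend has no effect on sales at all, yet the two still correlate strongly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Confounder: 1 during holiday weeks, 0 otherwise.
holiday = rng.binomial(1, 0.3, size=n)

# Holidays push budgets up and demand up; in this toy world,
# spend has NO direct effect on sales.
spend = 10 + 20 * holiday + rng.normal(0, 2, size=n)
sales = 100 + 50 * holiday + rng.normal(0, 5, size=n)

print(np.corrcoef(spend, sales)[0, 1])  # strong positive correlation overall

# Hold the confounder fixed: within non-holiday weeks, the correlation ~vanishes.
mask = holiday == 0
print(np.corrcoef(spend[mask], sales[mask])[0, 1])
```

The point is not the specific numbers; it is that “more spend, more sales” can appear in the data without any causal link from spend to sales.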
You’re in causal territory when you hear yourself asking: What happens if we change this default? Should we launch this feature? What breaks if we remove it, or improves if we reduce it?
When the verb is change, launch, remove, or reduce, correlation is a starting clue—not the decision rule.
A causal diagram—often drawn as a DAG (Directed Acyclic Graph)—is a simple way to make a team’s assumptions visible. Instead of arguing in vague terms (“it’s probably the model” or “maybe the UI”), you put the story on paper.
The goal isn’t perfect truth; it’s a shared draft of “how we think the system works” that everyone can critique.
Suppose you’re evaluating whether a new onboarding tutorial (T) increases activation (A).
A common analytics reflex is to “control for all available variables.” In DAG terms, that can mean accidentally adjusting for mediators (steps on the very causal path you want to measure) or colliders (variables caused by both the treatment and the outcome), which distorts the estimate instead of cleaning it up.
With a DAG, you adjust for variables for a reason—typically to block confounding paths—rather than because they exist.
Start with a whiteboard and three steps: list the variables that matter, draw arrows for the causal relationships you believe exist, and mark which variables you can measure and which you can intervene on.
Even a rough DAG aligns product, data, and engineering around the same causal question before you run numbers.
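If the whiteboard photo tends to get lost, the same draft can live next to the analysis code. A minimal sketch with hypothetical variable names for the onboarding example (user_intent is the assumed confounder, tutorial_completed a mediator):

```python
# Draft DAG for "does the tutorial (T) increase activation (A)?"
# Arrows point from cause to effect; this dict IS the team's causal assumptions.
dag = {
    "user_intent": ["tutorial_started", "activation"],  # confounder: adjust for it
    "tutorial_started": ["tutorial_completed"],
    "tutorial_completed": ["activation"],                # mediator: do NOT adjust for it
    "activation": [],
}

def parents(node: str) -> list[str]:
    """Return the assumed direct causes of a node."""
    return [cause for cause, effects in dag.items() if node in effects]

print(parents("activation"))  # ['user_intent', 'tutorial_completed']
```

Even this plain dictionary makes the “adjust for this, not that” conversation explicit and reviewable.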
A big shift in Judea Pearl’s causal thinking is separating observing something from changing it.
If you observe that users who enable notifications retain better, you’ve learned a pattern. But you still don’t know whether notifications cause retention, or whether engaged users are simply more likely to turn notifications on.
An intervention is different: it means you actively set a variable to a value and ask what happens next. In product terms, that’s not “users chose X,” it’s “we shipped X.”
Pearl often labels this difference as “seeing” versus “doing”: observing that a variable happened to take a value versus setting it to that value yourself.
The “do” idea is basically a mental note that you’re breaking the usual reasons a variable takes a value. When you intervene, notifications aren’t ON because engaged users opted in; they’re ON because you forced the setting (or nudged it). That’s the point: interventions help isolate cause-and-effect.
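To make “see vs. do” concrete, here is a toy simulation of the notifications story from the introduction (all numbers invented): engagement drives both opt-in and retention, and notifications themselves do nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

engaged = rng.binomial(1, 0.4, size=n)  # hidden engagement

# "See": engaged users are far more likely to opt in to notifications,
# and retention is driven by engagement only.
opted_in = rng.binomial(1, 0.1 + 0.7 * engaged)
retained = rng.binomial(1, 0.2 + 0.5 * engaged)

see_gap = retained[opted_in == 1].mean() - retained[opted_in == 0].mean()

# "Do": force notifications on for a random half, breaking the link to engagement.
forced_on = rng.binomial(1, 0.5, size=n)
do_gap = retained[forced_on == 1].mean() - retained[forced_on == 0].mean()

print(f"see gap: {see_gap:.2f}")  # large: opted-in users retain much more
print(f"do gap:  {do_gap:.2f}")   # near zero: the intervention changes nothing here
```

The observed gap is real, but it answers “who opts in?”, not “what does turning notifications on do?”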
Most real product work is intervention-shaped: shipping a feature, changing a ranking, adding a guardrail, changing a default, rolling out a pricing policy.
These actions aim to change outcomes, not merely describe them. Causal thinking keeps the question honest: “If we do this, what will it change?”
You can’t interpret an intervention (or even design a good experiment) without assumptions about what affects what—your causal diagram, even if it’s informal.
For example, if seasonality influences both marketing spend and sign-ups, then “doing” a spend change without accounting for seasonality can still mislead you. Interventions are powerful, but they only answer causal questions when the underlying causal story is at least approximately right.
A counterfactual is a specific kind of “what if?” question: for this exact case, what would have happened if we had taken a different action (or if one input had been different)? It’s not “What happens on average?”—it’s “Would this outcome have changed for this person, this ticket, this transaction?”
Counterfactuals show up whenever someone asks for a path to a different outcome: Would this person have stayed if onboarding had gone differently? Would this ticket have been resolved faster under the old routing? Would this transaction have been flagged if the amount were smaller?
These questions are naturally user-level. They’re also concrete enough to guide product changes, policies, and explanations.
Imagine a loan model that rejects an application. A correlation-based explanation might say, “Low savings correlates with rejection.” A counterfactual asks:
If the applicant’s savings were $3,000 higher (everything else the same), would the model approve them?
If the answer is “yes,” you’ve learned something actionable: a plausible change that flips the decision. If the answer is “no,” you’ve avoided giving misleading advice like “increase savings” when the real blocker is debt-to-income or unstable employment history.
Counterfactuals depend on a causal model—a story about how variables influence each other—not just a dataset. You must decide what can realistically change, what would change as a consequence, and what must stay fixed. Without that causal structure, counterfactuals can become impossible scenarios (“increase savings without changing income or spending”) and produce unhelpful or unfair recommendations.
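A minimal sketch of that check, using a made-up scoring rule as a stand-in for a real loan model (weights and thresholds are invented): hold everything else fixed, apply one realistic change, and see whether the decision flips.

```python
# Hypothetical stand-in for a trained model's decision function.
def approve(applicant: dict) -> bool:
    score = (
        0.002 * applicant["savings"]
        - 40.0 * applicant["debt_to_income"]
        + 5.0 * applicant["years_employed"]
    )
    return score >= 20

applicant = {"savings": 4_000, "debt_to_income": 0.45, "years_employed": 3}
print(approve(applicant))  # False: rejected in this toy setup

# Counterfactual: same applicant, savings $3,000 higher, everything else unchanged.
counterfactual = {**applicant, "savings": applicant["savings"] + 3_000}
print(approve(counterfactual))  # still False: here, debt-to-income is the real blocker
```

A production version would also encode which inputs can realistically change and what else would move with them; that is exactly the causal structure the paragraph above warns you not to skip.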
When an ML model fails in production, the root cause is rarely “the algorithm got worse.” More often, something in the system changed: what data you collect, how labels are produced, or what users do. Causal thinking helps you stop guessing and start isolating which change caused the degradation.
A few repeat offenders show up across teams: changes in what data you collect, shifts in how labels are produced, changes in what users do, and features that quietly become proxies for the labeling policy rather than the underlying phenomenon.
These can look “fine” in aggregate dashboards because correlation can stay high even when the reason the model is right has changed.
A simple causal diagram (DAG) turns debugging into a map. It forces you to ask: is this feature a cause of the label, a consequence of it, or a consequence of how we measure it?
For example, if your diagram shows Labeling policy → Feature engineering → Model inputs, you may have built a pipeline where the model predicts the policy rather than the underlying phenomenon. A DAG makes that pathway visible so you can block it (remove the feature, change instrumentation, or redefine the label).
Instead of only inspecting predictions, try controlled interventions: ablate a suspect feature and re-score a held-out set, perturb inputs within realistic ranges, replay older data through the new pipeline, or roll back one component at a time behind a flag. A sketch of the first option follows below.
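Assuming a scikit-learn-style model and a 2D NumPy feature matrix (the names and shapes here are illustrative), a feature ablation is just “neutralize one input and measure how much the metric moves”:

```python
def ablation_report(model, X_valid, y_valid, feature_names, metric):
    """Re-score the model with each feature replaced by its column mean.

    A large drop for one feature means the model leans on it heavily; pair that
    with your DAG to ask whether the feature is a cause, a proxy, or leakage.
    Assumes X_valid is a 2D NumPy array and model has a .predict() method.
    """
    baseline = metric(y_valid, model.predict(X_valid))
    drops = {}
    for i, name in enumerate(feature_names):
        X_ablated = X_valid.copy()
        X_ablated[:, i] = X_valid[:, i].mean()  # "turn off" this feature
        drops[name] = baseline - metric(y_valid, model.predict(X_ablated))
    return baseline, drops

# Usage (hypothetical names):
# baseline, drops = ablation_report(churn_model, X_valid, y_valid,
#                                   ["tenure", "tickets", "plan"], accuracy_score)
```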
Many “explainability” tools answer a narrow question: Why did the model output this score? They often do this by highlighting influential inputs (feature importance, saliency maps, SHAP values). That can be useful—but it’s not the same as explaining the system the model sits inside.
A prediction explanation is local and descriptive: “This loan was declined mainly because income was low and utilization was high.”
A system explanation is causal and operational: “If we increased verified income (or reduced utilization) in a way that reflects a real intervention, would the decision change—and would downstream outcomes improve?”
The first helps you interpret model behavior. The second helps you decide what to do.
Causal thinking ties explanations to interventions. Instead of asking which variables correlate with the score, you ask which variables are valid levers and what effects they produce when changed.
A causal model forces you to be explicit about which variables are genuine levers, which are proxies or symptoms, and what you expect to change downstream if you intervene on each one.
This matters because an “important feature” might be a proxy—useful for prediction, dangerous for action.
Post‑hoc explanations can look persuasive while staying purely correlational. If “number of support tickets” strongly predicts churn, a feature-importance plot may tempt a team to “reduce tickets” by making support harder to reach. That intervention could increase churn, because tickets were a symptom of underlying product issues—not a cause.
Correlation-based explanations are also brittle during distribution shifts: once user behavior changes, the same highlighted features may no longer mean the same thing.
Causal explanations are especially valuable when decisions have consequences and accountability: lending and credit decisions, pricing, churn interventions, or any setting where someone can reasonably ask, “What would have to change for a different outcome?”
When you need to act, not just interpret, explanation needs a causal backbone.
A/B testing is causal inference in its simplest, most practical form. When you randomly assign users to variant A or B, you’re performing an intervention: you’re not just observing what people chose, you’re setting what they see. In Pearl’s terms, randomization makes “do(variant = B)” real—so differences in outcomes can credibly be attributed to the change, not to who happened to pick it.
Random assignment breaks many hidden links between user traits and exposure. Power users, new users, time of day, device type—these factors still exist, but they’re (on average) balanced across groups. That balance is what turns a metric gap into a causal claim.
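For reference, the arithmetic of a simple two-arm test is short; a minimal sketch assuming one binary outcome per user and a pooled two-proportion z-test (the counts below are invented):

```python
import numpy as np
from scipy import stats

def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Difference in conversion rates between variants, with a z-test p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical results: 1,200 of 10,000 converted on A, 1,320 of 10,000 on B.
lift, p = two_proportion_test(1_200, 10_000, 1_320, 10_000)
print(f"lift: {lift:.3f}, p-value: {p:.3f}")
```

The statistics are the easy part; the causal claim comes from the random assignment itself.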
Even great teams can’t always run clean randomized tests: the change is global (like a price increase), the sample is too small, randomization would be unethical or too slow, or users in one group affect users in the other.
In these cases, you can still think causally—you just need to be explicit about assumptions and uncertainty.
Common options include difference-in-differences (compare changes over time between groups), regression discontinuity (use a cutoff rule like “only users above score X”), instrumental variables (a natural nudge that changes exposure without directly changing the outcome), and matching/weighting to make groups more comparable. Each method trades randomization for assumptions; a causal diagram can help you state those assumptions clearly.
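As one example, difference-in-differences reduces to arithmetic on four group means. A sketch with made-up retention rates:

```python
# Difference-in-differences with illustrative numbers (not real data).
# "Treated" markets got the change; "control" markets did not.
treated_before, treated_after = 0.40, 0.46
control_before, control_after = 0.41, 0.43

treated_change = treated_after - treated_before  # 0.06
control_change = control_after - control_before  # 0.02 (the shared trend)

# DiD estimate: the extra change in treated markets beyond the shared trend.
did = treated_change - control_change
print(f"estimated effect: {did:.2f}")  # 0.04, valid only if trends would have stayed parallel
```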
Before shipping a test (or an observational study), write down: the primary metric, guardrails, target population, duration, and decision rule. Pre-registration won’t eliminate bias, but it reduces metric shopping and makes causal claims easier to trust—and easier to debate as a team.
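One lightweight way to make that write-up hard to skip is to keep it next to the code. A sketch only; the field names are a suggestion, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestPlan:
    """Pre-registered plan for an experiment or observational study."""
    hypothesis: str
    primary_metric: str
    guardrail_metrics: list[str]
    target_population: str
    duration_days: int
    decision_rule: str

plan = TestPlan(
    hypothesis="New onboarding tutorial increases 7-day activation",
    primary_metric="activation_7d",
    guardrail_metrics=["uninstall_rate", "support_tickets_per_user"],
    target_population="new users on the latest app version",
    duration_days=14,
    decision_rule="Ship if activation_7d lift >= 1pp and no guardrail regresses",
)
```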
Most product debates sound like: “Metric X moved after we shipped Y—so Y worked.” Causal thinking tightens that into a clearer question: “Did change Y cause metric X to move, and by how much?” That shift turns dashboards from proof into starting points.
Pricing change: instead of “Did revenue go up after the price increase?”, ask: “Did the price increase itself cause the revenue change once seasonality and customer mix are accounted for, and what did it do to cancellations?”
Onboarding tweak: instead of “New users complete onboarding more often now,” ask: “Did the new flow cause the lift, or are we mostly seeing a different mix of users (newer app versions, more engaged cohorts) complete it?”
Recommendation ranking change: instead of “CTR improved,” ask: “Did the new ranking cause users to find more of what they wanted, or did it just shift clicks toward clickier items without improving retention?”
Dashboards often mix “who got the change” with “who would have done well anyway.” A classic example: you ship a new onboarding flow, but it’s first shown to users on the newest app version. If newer versions are adopted by more engaged users, your chart may show a lift that’s partly (or mostly) version adoption, not onboarding.
Other frequent confounders in product analytics: seasonality, marketing campaigns, app-version adoption, platform and device mix, time of day, and self-selection into features.
A useful PRD section is literally titled “Causal Questions,” and includes: the change you’re making, the primary metric and guardrails, the target population, the confounders you suspect, and the decision rule you’ll follow when results come in.
If you’re using a rapid build loop (especially with LLM-assisted development), this section becomes even more important: it prevents “we can ship it fast” from turning into “we shipped it without knowing what it caused.” Teams building in Koder.ai often bake these causal questions into planning mode up front, then implement feature-flagged variants quickly, with snapshots/rollback to keep experimentation safe when results (or side effects) surprise you.
PMs define the decision and success criteria. Data partners translate it into measurable causal estimates and sanity checks. Engineering ensures the change is controllable (feature flags, clean exposure logging). Support shares qualitative signals—pricing changes often “work” while silently increasing cancellations or ticket volume. When everyone agrees on the causal question, shipping becomes learning—not just shipping.
Causal thinking doesn’t need a PhD-level rollout. Treat it like a team habit: write down your causal story, pressure-test it, then let data (and experiments when possible) confirm or correct it.
To make progress, collect four inputs up front: the decision you’re actually trying to make, a draft causal diagram (even a rough one), the primary and guardrail metrics, and the mechanism for controlling exposure (feature flags, clean logging).
In practice, speed matters here: the faster you can turn a causal question into a controlled change, the less time you spend arguing about ambiguous patterns. That’s one reason teams adopt platforms like Koder.ai to go from “hypothesis + plan” to a working, instrumented implementation (web, backend, or mobile) in days instead of weeks—while still keeping rigor through staged rollouts, deployments, and rollback.
If you want a refresher on experiments, see /blog/ab-testing-basics. For common traps in product metrics that mimic “effects,” see /blog/metrics-that-mislead.
Causal thinking is a shift from “what tends to move together?” to “what would change if we acted?” That shift—popularized in computing and statistics by Judea Pearl—helps teams avoid confident-sounding stories that don’t survive real-world interventions.
Correlation is a clue, not an answer.
Causal diagrams (DAGs) make assumptions visible and discussable.
Interventions (“do”) are different from observations (“see”).
Counterfactuals help explain single cases: “what if this one thing were different?”
Good causal work documents uncertainty and alternative explanations.
Causality requires care: hidden confounders, measurement errors, and selection effects can flip conclusions. The antidote is transparency—write down assumptions, show what data you used, and note what would falsify your claim.
If you want to go deeper, browse related articles on /blog and compare causal approaches with other analytics and “explainability” methods to see where each one helps—and where it can mislead.
Correlation helps you predict or detect (e.g., “when X rises, Y often rises too”). Causation answers a decision question: “If we change X on purpose, will Y change?”
Use correlation for forecasting and monitoring; use causal thinking when you’re about to ship a change, set a policy, or allocate budget.
Because the correlation may be driven by confounding. In the notifications example, highly engaged users both trigger/receive more notifications and return more.
If you increase notifications for everyone, you’ve changed the experience (an intervention) without changing the underlying engagement—so retention may not improve and can even worsen.
A DAG (Directed Acyclic Graph) is a simple diagram where nodes are variables and arrows point from causes to effects, with no cycles allowed.
It’s useful because it makes assumptions explicit, helping teams agree on what to adjust for, what not to adjust for, and what experiment would actually answer the question.
A common mistake is “control for everything,” which can accidentally adjust for mediators or colliders and bias the result.
“See” is observing what naturally happened (users opted in, a score was high). “Do” is actively setting a variable (shipping a feature, forcing a default).
The key idea: an intervention breaks the usual reasons a variable takes a value, which is why it can reveal cause-and-effect more reliably than observation alone.
A counterfactual asks: for this specific case, what would have happened if we had done something else.
It’s useful for explaining individual decisions, offering realistic recourse (“what would need to change to flip this outcome?”), and debugging single cases.
It requires a causal model so you don’t propose impossible changes.
Focus on what changed upstream and what the model might be exploiting: data collection and pipelines, labeling policies, user behavior, and features that act as proxies for the label or the policy that produced it.
A causal mindset pushes you to test targeted interventions (ablations, perturbations) instead of chasing coincident metric movements.
Not necessarily. Feature importance explains what influenced the prediction, not what you should change.
A highly “important” feature can be a proxy or symptom (e.g., support tickets predict churn). Intervening on the proxy (“reduce tickets by hiding support”) can backfire. Causal explanations tie importance to valid levers and expected outcomes under intervention.
Randomized A/B tests are best when feasible, but you may need alternatives when randomization is impractical (a global pricing change), unethical, too slow to answer the question, or contaminated by spillover between groups.
In those cases, consider quasi-experiments like difference-in-differences, regression discontinuity, instrumental variables, or matching/weighting—while being explicit about assumptions.
Add a short section that forces clarity before analysis: the causal question, the intervention, the primary and guardrail metrics, the confounders you suspect, and the decision rule.
This keeps the team aligned on a causal question rather than post-hoc dashboard storytelling.