Definition (Backwash) informal [Oxford English Dictionary]: The liquid that flows back into a bottle, glass, etc. after someone has taken a drink, assumed to contain that person’s saliva.
Suppose that Adam, our perennial gentleman-hermit, sets out to predict tomorrow’s weather. Adam is rather clever, so he intuitively understands that weather patterns are not purely random and do in fact exhibit some degree of predictability. If the atmosphere were a purely natural system, he could model it as being generated by some underlying stochastic process:
$$X_{t+1} = f(X_t) + \varepsilon_t$$
where $X_t$ represents the weather state at time $t$; the natural dynamics are described by the function $f$; and $\varepsilon_t$ is i.i.d. noise with $\mathbb{E}[\varepsilon_t] = 0$.
That’s all to say: tomorrow’s weather is a deterministic function of today’s weather, plus some unpredictable variation. Given enough observations sampled from the process $\{X_t\}$, Adam could consistently estimate or learn $f$, assuming standard regularity conditions.
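To make Adam's task concrete, here is a minimal sketch of the un-intervened case. It assumes, purely for illustration, that the natural dynamics are a stable linear map with Gaussian noise; the coefficients, the helper names `simulate_natural` and `fit_linear_dynamics`, and the sample size are all hypothetical choices rather than anything the weather would hand you.

```python
import numpy as np

def simulate_natural(f, x0, n_steps, noise_std, rng):
    """Simulate x[t+1] = f(x[t]) + eps[t] with zero-mean Gaussian noise."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = f(x[t]) + rng.normal(0.0, noise_std)
    return x

def fit_linear_dynamics(x):
    """Least-squares fit of x[t+1] on x[t]; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x[:-1], x[1:], deg=1)
    return slope, intercept

# Hypothetical natural dynamics, chosen only for illustration.
TRUE_SLOPE, TRUE_INTERCEPT = 0.8, 2.0

rng = np.random.default_rng(0)
x = simulate_natural(lambda s: TRUE_SLOPE * s + TRUE_INTERCEPT,
                     x0=10.0, n_steps=5000, noise_std=1.0, rng=rng)
slope_hat, intercept_hat = fit_linear_dynamics(x)
print(f"estimated natural dynamics: {slope_hat:.3f} * x + {intercept_hat:.3f}")
```

With nobody touching the system, a plain regression of tomorrow on today lands close to the true map. That is the luxury Adam has and his descendants lose.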
Suppose Adam now meets a lovely lady, and has many kids, and has many grandkids, and has many great-grandkids, and so on. Modern civilization thus comes to be, and people burn fuel, cut forests, release aerosols, etc. Their collective actions now feed back into the weather process. Adam Jr. (“Adam’s” great-great-…-grandson) is a climate scientist frantically doing as his ancestor had. He predicts sunny skies. A hurricane hits. The insurance company is not amused.
Looking at the meteorological data, he accepts that he no longer observes the “pure” dynamics of his ancestor; instead, he observes the intervened process:
$$X_{t+1} = g(X_t, A_t) + \varepsilon_t$$
where $A_t$ denotes the collective human actions at time $t$ and $g$ captures the joint dynamics of natural and anthropogenic influences.
The fundamental problem? Adam Jr. wants to recover $f$, but only ever observes realizations from $g$. Moreover, the actions $A_t$ are not exogenous; they are typically functionally dependent on the very states he’s trying to predict. Let $A_t = \pi(X_t, I_t)$, where $\pi$ is a policy function and $I_t$ denotes information available to decision-makers. This endogeneity makes identifying $f$ from intervened observations a fundamentally challenging causal inference problem [1]. Adam Jr. must attempt to learn $f$ in a backwashed setting.
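A small simulation shows how badly Adam Jr. can be misled. The sketch below assumes the same illustrative linear natural dynamics as before, an additive action effect, and a hypothetical feedback policy that pushes the state back toward a target. None of these choices come from real climate data; they are just the simplest setup in which the problem appears.

```python
import numpy as np

TRUE_SLOPE, TRUE_INTERCEPT = 0.8, 2.0  # illustrative natural dynamics
GAIN, TARGET = 0.5, 10.0               # hypothetical state-dependent policy

def f(x):
    """Natural dynamics (what Adam Jr. wants)."""
    return TRUE_SLOPE * x + TRUE_INTERCEPT

def g(x, a):
    """Intervened dynamics; the additive action effect is an assumption."""
    return f(x) + a

def policy(x):
    """Endogenous action: reacts to the very state being predicted."""
    return -GAIN * (x - TARGET)

rng = np.random.default_rng(1)
n_steps = 5000
x = np.empty(n_steps + 1)
a = np.empty(n_steps)
x[0] = 15.0
for t in range(n_steps):
    a[t] = policy(x[t])
    x[t + 1] = g(x[t], a[t]) + rng.normal(0.0, 1.0)

# Naive approach: pretend the data came from the natural process.
naive_slope, naive_intercept = np.polyfit(x[:-1], x[1:], deg=1)
# The action is an exact function of the state, so it adds no new variation.
state_action_corr = np.corrcoef(x[:-1], a)[0, 1]

print(f"true slope {TRUE_SLOPE}, naive slope {naive_slope:.3f}")
print(f"corr(state, action) = {state_action_corr:.3f}")
```

The naive fit recovers the closed-loop slope (natural slope minus policy gain), not the natural one. Worse, because the action here is a deterministic function of the state, even a regression that includes the logged action cannot separate the two effects without further assumptions or outside variation.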
But this is not just a weather story.
Machines and Maintenance
Reliability engineers face a similar challenge. In theory, each machine within an industrial operation has an intrinsic degradation trajectory; namely, how its health would evolve absent human intervention. Let $H_t$ denote machine health at time $t$, with higher values indicating better condition. The natural degradation process follows:
$$H_{t+1} = f(H_t) + \varepsilon_t$$
where $f$ typically satisfies $f(h) \le h$, so that expected health decreases monotonically (health deteriorates over time), and $\varepsilon_t$ captures stochastic wear.
In practice, machines are never left alone: bearings are replaced, oil is changed, loads are adjusted, lattice analysis is applied, etc. Let $M_t$ [2] represent the maintenance action taken at time $t$. What the engineer observes is not pure degradation but degradation modulated by interventions:
$$H_{t+1} = g(H_t, M_t) + \varepsilon_t$$
The critical complication is that maintenance decisions are endogenous and state-dependent: $M_t = \pi(H_t, I_t)$. Interventions are triggered by the very deterioration they aim to prevent, creating a feedback loop that confounds the identification of natural degradation patterns.
The engineer needs $f$ for predictive maintenance scheduling and lifetime estimation, but only observes $g$. Simply ignoring intervention periods discards valuable data. Treating interventions as exogenous noise misspecifies the data-generating process. The system’s response to interventions contains signal about both $f$ and the intervention mechanism $g$, but disentangling them requires careful causal reasoning.
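Here is a toy version of the engineer's problem. It is a sketch under strong assumptions: a constant wear rate, a repair that adds a fixed amount of health, and a hypothetical threshold rule that triggers the repair. Under those assumptions the logged maintenance flag is enough to separate wear from repair; with state-dependent wear, or unlogged interventions, this simple regression would no longer suffice.

```python
import numpy as np

# Hypothetical values: wear per step, health restored per repair, repair trigger.
WEAR, REPAIR, THRESHOLD = 1.0, 30.0, 60.0

rng = np.random.default_rng(2)
n_steps = 5000
h = np.empty(n_steps + 1)
m = np.zeros(n_steps)  # logged maintenance actions (1 = repair performed)
h[0] = 100.0
for t in range(n_steps):
    m[t] = 1.0 if h[t] < THRESHOLD else 0.0  # state-dependent policy
    h[t + 1] = h[t] - WEAR + REPAIR * m[t] + rng.normal(0.0, 0.5)

dh = np.diff(h)

# Naive view: average health change, ignoring that repairs happened.
naive_drift = dh.mean()

# Using the log: regress the health change on an intercept and the repair flag.
design = np.column_stack([np.ones(n_steps), m])
coef = np.linalg.lstsq(design, dh, rcond=None)[0]
wear_hat, repair_hat = -coef[0], coef[1]

print(f"naive drift per step: {naive_drift:.3f}")
print(f"estimated wear: {wear_hat:.3f}, estimated repair effect: {repair_hat:.3f}")
```

The naive drift is close to zero, which would suggest a machine that barely degrades. Once the repair flag enters the model, the estimated wear rate returns to roughly the value used in the simulation.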
Power Systems and Demand Response
Electric grid planners face the same identification challenge. Their objective is to optimize infrastructure investments (substations, feeders, transformers) to maintain reliability while minimizing costs [3]. This requires understanding the natural demand profile of their customer base.
However, what planners observe is not raw demand but demand after various interventions:
- Demand response events that curtail load during peak periods
- Time-of-use pricing that shifts consumption patterns
- Direct load control of major appliances
Let $A_t$ represent the vector of demand-side interventions. The observed load follows:
$$L_t = g(D_t, A_t)$$
where $L_t$ is observed load and $D_t$ is the underlying “natural” demand that would occur absent interventions.
Again, interventions are endogenous: $A_t = \pi(D_t, I_t)$. Demand response is activated precisely when natural demand would stress the system, creating systematic bias in the observed data.
A power system engineer colleague of mine told me that some planners address this by excluding intervention days from their training data. But this becomes infeasible as interventions become routine. More importantly, customer responses to interventions (load elasticity, participation rates) provide valuable information about underlying demand patterns and system flexibility—information that gets discarded with naive data filtering.
The planner’s challenge mirrors the previous examples: recover the natural process $D_t$ (baseline demand) while only observing the intervened process $L_t$.
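The cost of excluding intervention days can be sketched in a few lines. The numbers below are hypothetical: natural daily peaks are drawn from a normal distribution, a demand response event fires whenever a noisy forecast exceeds a trigger level, and each event curtails load by a fixed 10%. The point is only to show the direction of the bias, not its real-world size.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 20000

# Hypothetical natural daily peak demand (what the planner wants to size for).
natural_peak = rng.normal(100.0, 15.0, n_days)
# Information available to the operator: a noisy forecast of that peak.
forecast = natural_peak + rng.normal(0.0, 3.0, n_days)
# Endogenous intervention: an event fires when the forecast is high.
event = forecast > 120.0
CURTAILMENT = 0.10  # assumed fractional load reduction on event days
observed_peak = natural_peak * (1.0 - CURTAILMENT * event)

q_true = np.quantile(natural_peak, 0.99)               # unobservable in practice
q_all = np.quantile(observed_peak, 0.99)               # all days, as metered
q_filtered = np.quantile(observed_peak[~event], 0.99)  # event days excluded

print(f"99th percentile of natural peak:     {q_true:.1f}")
print(f"99th percentile, all observed days:  {q_all:.1f}")
print(f"99th percentile, event days removed: {q_filtered:.1f}")
print(f"share of days with an event:         {event.mean():.1%}")
```

Both observed quantiles sit well below the natural one, and filtering makes it worse: the days that get removed are exactly the days that define the peak. A planner sizing a transformer from the filtered data would be sizing it for a customer base that only exists when the interventions are running.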
The General Framework
A system exhibits backwash when it satisfies the following conditions:
Definition (Backwash System): A discrete-time stochastic process $\{X_t\}$ exhibits backwash if:
- Natural dynamics exist: There exists a function $f$ such that absent interventions, the system evolves as $X_{t+1} = f(X_t) + \varepsilon_t$.
- Interventions are endogenous: Actions follow $A_t = \pi(X_t, I_t)$, where $\pi$ is a potentially unknown policy function and $I_t$ represents information available to decision-makers.
- Observed dynamics are intervened: What we observe follows $X_{t+1} = g(X_t, A_t) + \varepsilon_t$, where $g \neq f$ in general.
- Identification is hard: Recovering $f$ from observations of $g$ is non-trivial due to the confounding between natural dynamics and intervention effects.
The mathematical challenge is to estimate $f$ while only observing $g$ under endogenous actions. Backwash is not an edge case; it’s the default in modern engineered systems.
If you suspect backwash, do three things:
- Treat interventions as first-class data: log $A_t$, policy versions, triggers, and context.
- Model both the mechanism and the policy: estimate $g$ and $\pi$, then recover $f$ via counterfactuals. For the grid planner, $g$ is how load shifts under time-of-use prices and demand response events; $\pi$ is the rule that fires those events on peak forecasts.
- Seek exogenous variation: randomized trials, natural experiments, instrumental variables—or design your own shocks.
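The three steps above can be sketched together. The example assumes the same illustrative linear system as earlier, a logged action, and a designed shock: a small randomized perturbation added to the policy's output, independent of the state. It also assumes that a zero action means "no intervention", so the counterfactual natural dynamics are the fitted mechanism evaluated at zero action. All names and numbers are hypothetical.

```python
import numpy as np

TRUE_SLOPE, TRUE_INTERCEPT, ACTION_EFFECT = 0.8, 2.0, 1.0  # illustrative system
GAIN, TARGET = 0.5, 10.0                                   # hypothetical policy

rng = np.random.default_rng(4)
n_steps = 5000
x = np.empty(n_steps + 1)
a = np.empty(n_steps)  # logged actions (step 1: interventions are data)
x[0] = 15.0
for t in range(n_steps):
    shock = rng.normal(0.0, 1.0)  # step 3: randomized, state-independent shock
    a[t] = -GAIN * (x[t] - TARGET) + shock
    x[t + 1] = (TRUE_SLOPE * x[t] + TRUE_INTERCEPT
                + ACTION_EFFECT * a[t] + rng.normal(0.0, 1.0))

# Step 2: estimate the mechanism by regressing x[t+1] on (x[t], a[t], 1).
design = np.column_stack([x[:-1], a, np.ones(n_steps)])
slope_hat, effect_hat, intercept_hat = np.linalg.lstsq(design, x[1:], rcond=None)[0]

def f_hat(state):
    """Counterfactual natural dynamics: the fitted mechanism at zero action."""
    return slope_hat * state + intercept_hat

print(f"estimated natural slope: {slope_hat:.3f} (true {TRUE_SLOPE})")
print(f"estimated action effect: {effect_hat:.3f} (true {ACTION_EFFECT})")
```

Without the shock, the action would be an exact function of the state and the regression would be degenerate. The randomized component is what lets the action's effect be told apart from the state's, which in turn makes the zero-action counterfactual recoverable under this linear model.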
Backwash, in the dictionary sense, is what flows back into the glass after a sip. Your data are like that glass: they carry a trace of what you just did to the system. To taste the drink itself—baseline weather, machine health, or demand—you have to account for what flowed back.