What You Will Learn
A digital twin is a computational model that mirrors a physical asset's expected behavior. For a boiler feed pump (BFP), the OEM pump curve defines what head and flow the pump should produce at a given speed. By comparing actual performance against the pump curve in real time, you can quantify how much performance the pump has lost and anticipate when maintenance will be needed. In this guide you will:
- Load OEM pump curve data and plot head-capacity and efficiency curves
- Interpolate pump curves to compute expected performance at any operating point
- Compute performance deviation (actual vs. expected) for each hour
- Build a physics-informed regression model that respects pump affinity laws
- Monitor residuals over time to track efficiency decay
- Construct a composite Pump Health Index combining multiple indicators
- Correlate health index trends with maintenance events from the operator action log
SP&L Data You Will Use
- bfp_train_hourly.parquet — 8,784 rows x 88 columns of hourly BFP and system data
- pump_curves.csv — OEM head-capacity and efficiency curves at design speed
- design_parameters.json — BFP design-point specifications (rated speed, head, flow, efficiency)
- operator_actions.csv — timestamped maintenance and operator intervention records
- heat_balance.csv — thermodynamic design data for cross-referencing expected feedwater conditions
Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.
try:
    import pandas as pd
    import numpy as np
    import json
    from scipy.interpolate import interp1d
    from sklearn.linear_model import LinearRegression

    bfp = pd.read_parquet(
        "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
    )
    curves = pd.read_csv(
        "sisyphean-power-and-light/generation/reference/pump_curves.csv"
    )
    print(f"Setup OK! BFP: {len(bfp):,} rows, Pump curves: {len(curves)} points.")
except ModuleNotFoundError as e:
    print(f"Missing library: {e}")
    print("Run: pip install -r requirements.txt")
except FileNotFoundError:
    print("Data files not found. Run from the repo root:")
    print("  cd Dynamic-Network-Model && jupyter lab")
Setup OK! BFP: 8,784 rows, Pump curves: N points.
Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.interpolate import interp1d
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

bfp = pd.read_parquet(
    "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
)
curves = pd.read_csv(
    "sisyphean-power-and-light/generation/reference/pump_curves.csv"
)
with open("sisyphean-power-and-light/generation/reference/design_parameters.json") as f:
    design = json.load(f)
ops = pd.read_csv(
    "sisyphean-power-and-light/generation/events/operator_actions.csv",
    parse_dates=["timestamp"]
)
print("Pump curve data:")
print(curves.head(10))
print("\nDesign parameters:")
for key, val in design.items():
    print(f"  {key}: {val}")
print(f"\nOperator actions: {len(ops):,} records")
The OEM pump curve defines the theoretical relationship between flow (capacity) and head (pressure rise) at design speed. The efficiency curve shows what percentage of input power converts to useful hydraulic work.
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(curves["flow_gpm"], curves["head_ft"],
         color="#5FCCDB", linewidth=2.5, label="Head (ft)")
ax1.set_xlabel("Flow (GPM)", fontsize=12)
ax1.set_ylabel("Head (ft)", fontsize=12, color="#5FCCDB")
ax1.tick_params(axis="y", labelcolor="#5FCCDB")
ax2 = ax1.twinx()
ax2.plot(curves["flow_gpm"], curves["efficiency_pct"],
         color="#D69E2E", linewidth=2.5, linestyle="--", label="Efficiency (%)")
ax2.set_ylabel("Efficiency (%)", fontsize=12, color="#D69E2E")
ax2.tick_params(axis="y", labelcolor="#D69E2E")
if "bep_flow_gpm" in design:
    ax1.axvline(x=design["bep_flow_gpm"], color="gray",
                linestyle=":", alpha=0.7, label="BEP")
ax1.set_title("OEM Pump Curve: Head & Efficiency vs Flow", fontsize=13)
fig.legend(loc="upper right", bbox_to_anchor=(0.88, 0.88))
ax1.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
What is a pump curve? Every centrifugal pump has a characteristic curve published by the OEM (Original Equipment Manufacturer). The head-capacity curve shows that as flow increases, head decreases. The Best Efficiency Point (BEP) is where the pump converts the maximum percentage of input power to hydraulic work. Operating far from BEP causes extra vibration, heat, and wear.
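To make the BEP idea concrete, here is a minimal sketch with made-up curve samples (not the SP&L data) shaped like a typical OEM sheet: the BEP is simply the flow at which the sampled efficiency curve peaks.

```python
import numpy as np

# Made-up curve samples (not the SP&L data), shaped like a typical OEM sheet.
flow = np.array([0, 500, 1000, 1500, 2000, 2500])    # GPM
head = np.array([1200, 1180, 1120, 1020, 880, 700])  # ft: head falls as flow rises
eff = np.array([0.0, 45.0, 68.0, 82.0, 78.0, 60.0])  # percent: peaks, then drops

# The BEP is the flow at which efficiency peaks.
bep_idx = int(np.argmax(eff))
bep_flow, bep_eff = flow[bep_idx], eff[bep_idx]
print(f"BEP: {bep_flow} GPM at {bep_eff}% efficiency")  # -> BEP: 1500 GPM at 82.0% efficiency
```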
Using the pump affinity laws, we can scale the OEM curve to any speed and compute the expected head and efficiency at the pump's actual operating point each hour.
head_curve = interp1d(
    curves["flow_gpm"], curves["head_ft"],
    kind="cubic", fill_value="extrapolate"
)
eff_curve = interp1d(
    curves["flow_gpm"], curves["efficiency_pct"],
    kind="cubic", fill_value="extrapolate"
)
design_speed = design.get("rated_speed_rpm", 5500)
print(f"Design speed: {design_speed} RPM")
mw_col = [c for c in bfp.columns if "MW" in c or "LOAD" in c][0]
bfpa_cols = [c for c in bfp.columns if "BFPA" in c]
flow_col = [c for c in bfpa_cols if "FLOW" in c][0]
speed_col = [c for c in bfpa_cols if "SPD" in c or "RPM" in c][0]
head_col = [c for c in bfpa_cols if "HEAD" in c or ("DISCH" in c and "PRESS" in c)][0]
running = bfp[bfp[mw_col] > 100].copy()
speed_ratio = running[speed_col] / design_speed
running["flow_at_design_speed"] = running[flow_col] / speed_ratio
running["expected_head_design"] = running["flow_at_design_speed"].apply(
    lambda q: float(head_curve(np.clip(q, curves["flow_gpm"].min(),
                                       curves["flow_gpm"].max())))
)
running["expected_head"] = running["expected_head_design"] * speed_ratio**2
running["expected_eff"] = running["flow_at_design_speed"].apply(
    lambda q: float(eff_curve(np.clip(q, curves["flow_gpm"].min(),
                                      curves["flow_gpm"].max())))
)
print("Expected performance computed for all running hours.")
print(running[[flow_col, head_col, "expected_head", "expected_eff"]].describe().round(2))
Pump affinity laws: These fundamental relationships govern centrifugal pump behavior. Flow scales linearly with speed (Q ~ N). Head scales with the square of speed (H ~ N^2). Power scales with the cube of speed (P ~ N^3). By scaling actual measurements back to design speed, we can compare any operating point against the single OEM curve.
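The affinity laws are easy to sanity-check numerically. The sketch below uses hypothetical design-point numbers (not values from design_parameters.json) to show how flow, head, and power scale when the pump slows to 90% speed:

```python
# Hypothetical design point (not from design_parameters.json).
q_d, h_d, p_d = 2000.0, 1000.0, 600.0  # GPM, ft, hp at design speed

n = 0.90         # speed ratio N / N_design
q = q_d * n      # Q ~ N:   flow scales linearly with speed
h = h_d * n**2   # H ~ N^2: head scales with speed squared
p = p_d * n**3   # P ~ N^3: power scales with speed cubed
print(f"At 90% speed: Q = {q:.0f} GPM, H = {h:.1f} ft, P = {p:.1f} hp")

# Inverting the flow law recovers the design-speed flow -- exactly what the
# guide's flow_at_design_speed = flow / speed_ratio does.
assert abs(q / n - q_d) < 1e-9
```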
running["head_deviation"] = running[head_col] - running["expected_head"]
running["head_dev_pct"] = (running["head_deviation"] / running["expected_head"]) * 100
print("Head deviation statistics:")
print(running[["head_deviation", "head_dev_pct"]].describe().round(2))
fig, ax = plt.subplots(figsize=(10, 7))
ax.plot(curves["flow_gpm"], curves["head_ft"],
        color="#2D6A7A", linewidth=3, label="OEM Curve (design speed)", zorder=3)
scatter = ax.scatter(
    running["flow_at_design_speed"],
    running[head_col] / speed_ratio**2,
    c=running.index.to_julian_date(),
    cmap="viridis", s=3, alpha=0.4, label="Actual (scaled to design speed)"
)
cbar = plt.colorbar(scatter, ax=ax, label="Time")
cbar.set_ticks([])
cbar.set_label("Time (Jan -> Dec)")
ax.set_xlabel("Flow (GPM) at Design Speed", fontsize=12)
ax.set_ylabel("Head (ft) at Design Speed", fontsize=12)
ax.set_title("Actual Operating Points vs OEM Pump Curve")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(running.index, running["head_dev_pct"],
        alpha=0.3, color="#718096", linewidth=0.5)
rolling_dev = running["head_dev_pct"].rolling(168).mean()
ax.plot(running.index, rolling_dev, color="#D69E2E",
        linewidth=2, label="7-day rolling mean")
ax.axhline(y=0, color="red", linestyle="--", alpha=0.7)
ax.set_ylabel("Head Deviation (%)")
ax.set_title("BFP A Head Deviation from OEM Curve Over Time")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
What deviation tells you: A healthy pump operates on or near the OEM curve. If actual points consistently fall below the curve (negative head deviation), the pump is producing less head than it should at that flow and speed. This typically indicates impeller wear, internal recirculation, or increased internal clearances. A 3-5% deviation is significant for a BFP.
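One way to act on the 3-5% guidance is to alert when the rolling-mean deviation drops past a threshold. A minimal sketch with a synthetic deviation series standing in for running["head_dev_pct"]:

```python
import numpy as np
import pandas as pd

# Synthetic hourly head-deviation series drifting from 0% toward -6%
# (stand-in for running["head_dev_pct"]).
idx = pd.date_range("2024-01-01", periods=24 * 60, freq="h")
dev = pd.Series(np.linspace(0.0, -6.0, len(idx)), index=idx)

# Alert when the 7-day (168-hour) rolling mean crosses -3%, mirroring the
# rolling window used in the time-series plot above.
rolling = dev.rolling(168).mean()
breach = rolling[rolling < -3.0]
if len(breach) > 0:
    print(f"First sustained breach of -3%: {breach.index[0].date()}")
```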
Instead of a purely data-driven regression, we build a physics-informed model that uses the affinity-law-scaled features. The model learns the residual deviation from the OEM curve as a function of operating conditions and time.
running["speed_ratio"] = running[speed_col] / design_speed
running["speed_ratio_sq"] = running["speed_ratio"]**2
running["flow_pct_bep"] = running[flow_col] / design.get("bep_flow_gpm", 1000)
running["days_since_start"] = (running.index - running.index[0]).total_seconds() / 86400
phys_features = ["speed_ratio", "speed_ratio_sq", "flow_pct_bep",
                 "days_since_start"]
X_phys = running[phys_features].dropna()
y_phys = running.loc[X_phys.index, "head_dev_pct"]
model_phys = LinearRegression()
model_phys.fit(X_phys, y_phys)
y_pred_phys = model_phys.predict(X_phys)
print("Physics-informed model coefficients:")
for feat, coef in zip(phys_features, model_phys.coef_):
    print(f"  {feat:25s}: {coef:+.4f}")
print(f"  {'intercept':25s}: {model_phys.intercept_:+.4f}")
print(f"\nR-squared: {r2_score(y_phys, y_pred_phys):.4f}")
deg_rate = model_phys.coef_[phys_features.index("days_since_start")]
print(f"\nEstimated degradation rate: {deg_rate:.4f}% head loss per day")
print(f"  {deg_rate * 365:.2f}% head loss per year")
Why "physics-informed"? A pure ML model treats all features as black-box inputs. By structuring our features around the pump affinity laws (speed ratio, speed ratio squared, flow relative to BEP), we embed physical knowledge into the model. The regression then only needs to learn the deviation from known physics, which requires less data and produces more interpretable coefficients. The days_since_start coefficient directly quantifies the degradation rate.
running.loc[X_phys.index, "phys_residual"] = y_phys - y_pred_phys
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
axes[0].plot(running.index, running["head_dev_pct"],
             alpha=0.3, color="#718096", linewidth=0.5, label="Actual deviation")
axes[0].plot(X_phys.index, y_pred_phys,
             color="#D69E2E", linewidth=1.5, label="Physics model fit")
axes[0].set_ylabel("Head Deviation (%)")
axes[0].set_title("Physics-Informed Model: Actual vs Predicted Degradation")
axes[0].legend()
resid = running["phys_residual"].dropna()
resid_std = resid.std()
axes[1].plot(resid.index, resid, alpha=0.5, color="#5FCCDB", linewidth=0.5)
axes[1].axhline(y=0, color="gray", linestyle="-")
axes[1].axhline(y=3 * resid_std, color="red", linestyle="--",
                label=f"+3 sigma ({3*resid_std:.2f})")
axes[1].axhline(y=-3 * resid_std, color="red", linestyle="--",
                label=f"-3 sigma ({-3*resid_std:.2f})")
axes[1].set_ylabel("Model Residual")
axes[1].set_title("Physics Model Residuals (Non-Degradation Anomalies)")
axes[1].legend()
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
Separating degradation from anomalies: The physics model captures the expected degradation trend. Its residuals represent everything else: control system changes, unusual operating conditions, measurement noise, and genuine acute faults. Points beyond 3 sigma in the residual plot warrant investigation because the physics model cannot explain them through normal degradation.
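Flagging those points is a one-liner once the residual series exists. A sketch with synthetic residuals, where three injected spikes stand in for acute faults:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic residual series: Gaussian noise plus three injected spikes that
# stand in for acute faults the physics model cannot explain.
resid = pd.Series(rng.normal(0.0, 1.0, 1000))
resid.iloc[[100, 500, 900]] = [6.0, -7.0, 8.0]

# Anything beyond 3 sigma is an hour worth investigating.
sigma = resid.std()
anomalies = resid[resid.abs() > 3 * sigma]
print(f"{len(anomalies)} points beyond 3 sigma")
```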
A single composite health index combines multiple indicators into one number that operators and planners can track. We will combine head deviation, an efficiency score, bearing temperature trends, and vibration levels into a 0-100 score.
running["head_score"] = np.clip(
    100 + running["head_dev_pct"] * 20,  # a -5% head deviation maps to a score of 0
    0, 100
)
# Note: eff_score uses the expected efficiency at the operating point (closeness
# to BEP) as a proxy. If measured pump efficiency is available, substitute an
# actual-vs-expected efficiency deviation here.
running["eff_score"] = np.clip(running["expected_eff"], 0, 100)
bear_cols = [c for c in running.columns
             if "BFPA" in c and "BEAR" in c and "TEMP" in c]
max_bear_temp = running[bear_cols].max(axis=1)
running["temp_score"] = np.clip(
    100 - (max_bear_temp - 140) * 2.5,
    0, 100
)
vib_cols = [c for c in running.columns
            if "BFPA" in c and "VIB" in c
            and "roll" not in c and "delta" not in c]
max_vib = running[vib_cols].max(axis=1)
running["vib_score"] = np.clip(
    100 - max_vib * 10,
    0, 100
)
running["health_index"] = (
    0.35 * running["head_score"] +
    0.25 * running["eff_score"] +
    0.20 * running["temp_score"] +
    0.20 * running["vib_score"]
)
print("Pump Health Index statistics:")
print(running["health_index"].describe().round(1))
fig, axes = plt.subplots(5, 1, figsize=(14, 16), sharex=True)
components = [
    ("head_score", "Head Score", "#5FCCDB"),
    ("eff_score", "Efficiency Score", "#2D6A7A"),
    ("temp_score", "Bearing Temp Score", "#D69E2E"),
    ("vib_score", "Vibration Score", "#718096"),
    ("health_index", "Composite Health Index", "#E53E3E"),
]
for i, (col, title, color) in enumerate(components):
    daily = running[col].resample("D").mean()
    axes[i].fill_between(daily.index, daily, alpha=0.4, color=color)
    axes[i].plot(daily.index, daily, color=color, linewidth=1.5)
    axes[i].set_ylabel(title, fontsize=9)
    axes[i].set_ylim(0, 105)
    axes[i].axhline(y=70, color="orange", linestyle="--", alpha=0.5)
    axes[i].axhline(y=40, color="red", linestyle="--", alpha=0.5)
axes[0].set_title("BFP A Digital Twin: Health Index Components",
                  fontsize=14, fontweight="bold")
axes[4].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
Health Index interpretation: Above 70 (green zone) = healthy operation. Between 40 and 70 (yellow zone) = degraded, plan maintenance. Below 40 (red zone) = critical, immediate action required. The weights (35% head, 25% efficiency, 20% temperature, 20% vibration) reflect the relative importance of each indicator for BFP health. Adjust these weights based on your plant's experience.
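The zone boundaries above translate directly into a small helper; health_zone here is an illustrative name, not part of the guide's pipeline:

```python
def health_zone(hi: float) -> str:
    """Map a 0-100 health index to the action zones described above."""
    if hi >= 70:
        return "green: healthy"
    if hi >= 40:
        return "yellow: plan maintenance"
    return "red: immediate action"

for score in (92.5, 55.0, 31.0):
    print(f"HI {score:5.1f} -> {health_zone(score)}")
```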
The final validation is correlating the health index trend with actual maintenance events. If the health index drops before a maintenance outage and recovers afterward, the digital twin is capturing real degradation.
bfp_maint = ops[ops["description"].str.contains("BFP|pump|bearing|vibr",
                                                case=False, na=False)]
print(f"BFP maintenance events: {len(bfp_maint)}")
if len(bfp_maint) > 0:
    print(bfp_maint[["timestamp", "description"]].to_string(index=False))
fig, ax = plt.subplots(figsize=(14, 6))
daily_hi = running["health_index"].resample("D").mean()
ax.fill_between(daily_hi.index, daily_hi, alpha=0.3, color="#5FCCDB")
ax.plot(daily_hi.index, daily_hi, color="#5FCCDB", linewidth=2,
        label="Health Index (daily avg)")
ax.axhspan(70, 105, alpha=0.05, color="green")
ax.axhspan(40, 70, alpha=0.05, color="orange")
ax.axhspan(0, 40, alpha=0.05, color="red")
for _, event in bfp_maint.iterrows():
    ax.axvline(x=event["timestamp"], color="purple",
               linestyle="-", alpha=0.6, linewidth=1.5)
    ax.annotate(event["description"][:30],
                xy=(event["timestamp"], 95),
                fontsize=7, rotation=45, ha="left", color="purple")
ax.set_ylabel("Health Index", fontsize=12)
ax.set_ylim(0, 105)
ax.set_title("BFP A Digital Twin: Health Index vs Maintenance Events",
             fontsize=13, fontweight="bold")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
WINDOW_DAYS = 7
for _, event in bfp_maint.iterrows():
    t = event["timestamp"]
    before = running[
        (running.index >= t - pd.Timedelta(days=WINDOW_DAYS)) &
        (running.index < t)
    ]["health_index"].mean()
    after = running[
        (running.index > t) &
        (running.index <= t + pd.Timedelta(days=WINDOW_DAYS))
    ]["health_index"].mean()
    print(f"{t.strftime('%Y-%m-%d')}: {event['description'][:40]}")
    print(f"  Before: {before:.1f}  After: {after:.1f}  Change: {after - before:+.1f}")
What You Learned
- Loaded OEM pump curves and design parameters for the BFP
- Plotted head-capacity and efficiency curves with the Best Efficiency Point
- Used pump affinity laws to compute expected performance at actual operating speed
- Quantified head and efficiency deviation from the OEM curve over time
- Built a physics-informed regression model that estimates the daily degradation rate
- Monitored model residuals with 3-sigma control limits for acute anomalies
- Constructed a composite Pump Health Index from four component scores
- Correlated health index trends with actual maintenance events
Ideas to Try Next
- BFP B digital twin: Repeat the entire pipeline for Pump B and compare degradation rates between pumps
- Predictive maintenance scheduling: Extrapolate the degradation trend to estimate when the health index will cross the 40-point threshold
- Nonlinear degradation model: Try polynomial or exponential degradation curves instead of linear
- Heat balance integration: Use heat_balance.csv to compute feedwater enthalpy rise and compare actual vs. design heat absorption
- Real-time dashboard: Build an interactive Plotly dashboard that updates as new data arrives
- Multi-pump fleet management: Extend to all rotating equipment in the plant and rank by health index for prioritized maintenance
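As a starting point for the predictive-maintenance idea above, a linear trend fitted to daily health-index values can be solved for the day it crosses the 40-point critical threshold. Synthetic data stands in for the real daily_hi series here:

```python
import numpy as np

# Synthetic daily health index: starts near 95 and loses ~0.12 points/day,
# standing in for the real daily health-index series.
days = np.arange(200)
rng = np.random.default_rng(0)
health = 95.0 - 0.12 * days + rng.normal(0.0, 1.0, len(days))

# Fit a linear trend and solve for the day it crosses the critical threshold.
slope, intercept = np.polyfit(days, health, 1)
if slope < 0:
    days_to_critical = (40.0 - intercept) / slope
    print(f"Trend {slope:.3f} HI/day; crosses 40 around day {days_to_critical:.0f}")
```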
Key Terms Glossary
- Digital twin — a computational model that mirrors a physical asset's expected behavior, updated with real-time data to track deviations
- Pump curve — the OEM-published relationship between flow, head, and efficiency for a centrifugal pump at rated speed
- Best Efficiency Point (BEP) — the flow rate where the pump achieves maximum hydraulic efficiency; the ideal operating point
- Pump affinity laws — Q ~ N (flow proportional to speed), H ~ N^2 (head proportional to speed squared), P ~ N^3 (power proportional to speed cubed)
- Physics-informed ML — embedding known physical equations or constraints into a machine learning model to improve accuracy and interpretability
- Head deviation — the difference between actual pump head and the head predicted by the OEM curve at the same flow and speed
- Health index — a composite score (0-100) combining multiple condition indicators into a single actionable metric
- Degradation rate — the rate at which equipment performance declines over time, typically expressed as percent loss per day or year
- Interpolation — estimating values between known data points; used here to read the OEM curve at any flow rate
Congratulations!
You have completed the full BFP analysis series. From basic health monitoring through load correlation, fault diagnosis, and digital twin modeling, you now have the tools to build comprehensive rotating equipment analytics for any generating unit.
Explore All ML Playground Guides →