← Back to All Guides
Guide 20

BFP Digital Twin & Performance Tracking

Prefer not to install anything? Click the badge above to open this guide as a runnable notebook in Google Colab. Sign in with any Google account, then use Runtime → Run all to execute every cell, or step through them one at a time.

What You Will Learn

A digital twin is a computational model that mirrors a physical asset's expected behavior. For a BFP, the OEM pump curve defines what head and flow the pump should produce at a given speed. By comparing actual performance against the pump curve in real time, you can quantify exactly how much efficiency the pump has lost and predict when maintenance is needed. In this guide you will:

  • Load OEM pump curve data and plot head-capacity and efficiency curves
  • Interpolate pump curves to compute expected performance at any operating point
  • Compute performance deviation (actual vs. expected) for each hour
  • Build a physics-informed regression model that respects pump affinity laws
  • Monitor residuals over time to track efficiency decay
  • Construct a composite Pump Health Index combining multiple indicators
  • Correlate health index trends with maintenance events from the operator action log

Prerequisites: This guide builds on concepts from Guide 17: BFP Health Monitoring and Guide 18: Feedwater Load Correlation. Familiarity with pump curves and regression analysis is assumed.

SP&L Data You Will Use

  • bfp_train_hourly.parquet — 8,784 rows x 88 columns of hourly BFP and system data
  • pump_curves.csv — OEM head-capacity and efficiency curves at design speed
  • design_parameters.json — BFP design-point specifications (rated speed, head, flow, efficiency)
  • operator_actions.csv — timestamped maintenance and operator intervention records
  • heat_balance.csv — thermodynamic design data for cross-referencing expected feedwater conditions
0

Verify Your Setup

Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.

# Step 0: Verify your setup try: import pandas as pd import numpy as np import json from scipy.interpolate import interp1d from sklearn.linear_model import LinearRegression bfp = pd.read_parquet( "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet" ) curves = pd.read_csv( "sisyphean-power-and-light/generation/reference/pump_curves.csv" ) print(f"Setup OK! BFP: {len(bfp):,} rows, Pump curves: {len(curves)} points.") except ModuleNotFoundError as e: print(f"Missing library: {e}") print("Run: pip install -r requirements.txt") except FileNotFoundError: print("Data files not found. Run from the repo root:") print(" cd Dynamic-Network-Model && jupyter lab")
Setup OK! BFP: 8,784 rows, Pump curves: N points.

Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab

Having trouble? Check our Troubleshooting Guide for solutions to common setup and data loading issues.

1

Load Pump Curves

import pandas as pd import numpy as np import json import matplotlib.pyplot as plt import matplotlib.dates as mdates from scipy.interpolate import interp1d from sklearn.linear_model import LinearRegression from sklearn.metrics import r2_score, mean_absolute_error # Load all data sources bfp = pd.read_parquet( "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet" ) curves = pd.read_csv( "sisyphean-power-and-light/generation/reference/pump_curves.csv" ) with open("sisyphean-power-and-light/generation/reference/design_parameters.json") as f: design = json.load(f) ops = pd.read_csv( "sisyphean-power-and-light/generation/events/operator_actions.csv", parse_dates=["timestamp"] ) print("Pump curve data:") print(curves.head(10)) print(f"\nDesign parameters:") for key, val in design.items(): print(f" {key}: {val}") print(f"\nOperator actions: {len(ops):,} records")
2

Plot OEM Curves

The OEM pump curve defines the theoretical relationship between flow (capacity) and head (pressure rise) at design speed. The efficiency curve shows what percentage of input power converts to useful hydraulic work.

# Plot OEM head-capacity and efficiency curves fig, ax1 = plt.subplots(figsize=(10, 6)) # Head vs Flow ax1.plot(curves["flow_gpm"], curves["head_ft"], color="#5FCCDB", linewidth=2.5, label="Head (ft)") ax1.set_xlabel("Flow (GPM)", fontsize=12) ax1.set_ylabel("Head (ft)", fontsize=12, color="#5FCCDB") ax1.tick_params(axis="y", labelcolor="#5FCCDB") # Efficiency on secondary axis ax2 = ax1.twinx() ax2.plot(curves["flow_gpm"], curves["efficiency_pct"], color="#D69E2E", linewidth=2.5, linestyle="--", label="Efficiency (%)") ax2.set_ylabel("Efficiency (%)", fontsize=12, color="#D69E2E") ax2.tick_params(axis="y", labelcolor="#D69E2E") # Mark design point (Best Efficiency Point) if "bep_flow_gpm" in design: ax1.axvline(x=design["bep_flow_gpm"], color="gray", linestyle=":", alpha=0.7, label="BEP") ax1.set_title("OEM Pump Curve: Head & Efficiency vs Flow", fontsize=13) fig.legend(loc="upper right", bbox_to_anchor=(0.88, 0.88)) ax1.grid(True, alpha=0.3) plt.tight_layout() plt.show()

What is a pump curve? Every centrifugal pump has a characteristic curve published by the OEM (Original Equipment Manufacturer). The head-capacity curve shows that as flow increases, head decreases. The Best Efficiency Point (BEP) is where the pump converts the maximum percentage of input power to hydraulic work. Operating far from BEP causes extra vibration, heat, and wear.

3

Compute Expected Performance

Using the pump affinity laws, we can scale the OEM curve to any speed and compute the expected head and efficiency at the pump's actual operating point each hour.

# Build interpolation functions from OEM curves head_curve = interp1d( curves["flow_gpm"], curves["head_ft"], kind="cubic", fill_value="extrapolate" ) eff_curve = interp1d( curves["flow_gpm"], curves["efficiency_pct"], kind="cubic", fill_value="extrapolate" ) # Get design speed from parameters design_speed = design.get("rated_speed_rpm", 5500) print(f"Design speed: {design_speed} RPM")
# Identify actual operating columns mw_col = [c for c in bfp.columns if "MW" in c or "LOAD" in c][0] bfpa_cols = [c for c in bfp.columns if "BFPA" in c] flow_col = [c for c in bfpa_cols if "FLOW" in c][0] speed_col = [c for c in bfpa_cols if "SPD" in c or "RPM" in c][0] head_col = [c for c in bfpa_cols if "HEAD" in c or "DISCH" in c and "PRESS" in c][0] # Filter to running conditions running = bfp[bfp[mw_col] > 100].copy() # Pump affinity laws: Q2/Q1 = N2/N1, H2/H1 = (N2/N1)^2 speed_ratio = running[speed_col] / design_speed # Scale actual flow back to design-speed equivalent running["flow_at_design_speed"] = running[flow_col] / speed_ratio # Expected head at design speed for this flow running["expected_head_design"] = running["flow_at_design_speed"].apply( lambda q: float(head_curve(np.clip(q, curves["flow_gpm"].min(), curves["flow_gpm"].max()))) ) # Scale expected head back to actual speed running["expected_head"] = running["expected_head_design"] * speed_ratio**2 # Expected efficiency at this operating point running["expected_eff"] = running["flow_at_design_speed"].apply( lambda q: float(eff_curve(np.clip(q, curves["flow_gpm"].min(), curves["flow_gpm"].max()))) ) print("Expected performance computed for all running hours.") print(running[[flow_col, head_col, "expected_head", "expected_eff"]].describe().round(2))

Pump affinity laws: These fundamental relationships govern centrifugal pump behavior. Flow scales linearly with speed (Q ~ N). Head scales with the square of speed (H ~ N^2). Power scales with the cube of speed (P ~ N^3). By scaling actual measurements back to design speed, we can compare any operating point against the single OEM curve.

4

Performance Deviation

# Compute deviation: actual vs expected running["head_deviation"] = running[head_col] - running["expected_head"] running["head_dev_pct"] = (running["head_deviation"] / running["expected_head"]) * 100 print("Head deviation statistics:") print(running[["head_deviation", "head_dev_pct"]].describe().round(2))
# Overlay actual vs expected on the pump curve fig, ax = plt.subplots(figsize=(10, 7)) # OEM curve ax.plot(curves["flow_gpm"], curves["head_ft"], color="#2D6A7A", linewidth=3, label="OEM Curve (design speed)", zorder=3) # Actual operating points (scaled to design speed) scatter = ax.scatter( running["flow_at_design_speed"], running[head_col] / speed_ratio**2, # scale actual head back to design speed c=running.index.to_julian_date(), cmap="viridis", s=3, alpha=0.4, label="Actual (scaled to design speed)" ) cbar = plt.colorbar(scatter, ax=ax, label="Time") cbar.set_ticks([]) cbar.set_label("Time (Jan -> Dec)") ax.set_xlabel("Flow (GPM) at Design Speed", fontsize=12) ax.set_ylabel("Head (ft) at Design Speed", fontsize=12) ax.set_title("Actual Operating Points vs OEM Pump Curve") ax.legend() ax.grid(True, alpha=0.3) plt.tight_layout() plt.show()
# Head deviation trend over time fig, ax = plt.subplots(figsize=(14, 5)) ax.plot(running.index, running["head_dev_pct"], alpha=0.3, color="#718096", linewidth=0.5) # 7-day rolling mean rolling_dev = running["head_dev_pct"].rolling(168).mean() ax.plot(running.index, rolling_dev, color="#D69E2E", linewidth=2, label="7-day rolling mean") ax.axhline(y=0, color="red", linestyle="--", alpha=0.7) ax.set_ylabel("Head Deviation (%)") ax.set_title("BFP A Head Deviation from OEM Curve Over Time") ax.legend() ax.xaxis.set_major_formatter(mdates.DateFormatter("%b")) plt.tight_layout() plt.show()

What deviation tells you: A healthy pump operates on or near the OEM curve. If actual points consistently fall below the curve (negative head deviation), the pump is producing less head than it should at that flow and speed. This typically indicates impeller wear, internal recirculation, or increased internal clearances. A 3-5% deviation is significant for a BFP.

5

Physics-Informed Model

Instead of a purely data-driven regression, we build a physics-informed model that uses the affinity-law-scaled features. The model learns the residual deviation from the OEM curve as a function of operating conditions and time.

# Physics-informed features running["speed_ratio"] = running[speed_col] / design_speed running["speed_ratio_sq"] = running["speed_ratio"]**2 running["flow_pct_bep"] = running[flow_col] / design.get("bep_flow_gpm", 1000) running["days_since_start"] = (running.index - running.index[0]).total_seconds() / 86400 # Feature matrix for the physics-informed model phys_features = ["speed_ratio", "speed_ratio_sq", "flow_pct_bep", "days_since_start"] X_phys = running[phys_features].dropna() y_phys = running.loc[X_phys.index, "head_dev_pct"] model_phys = LinearRegression() model_phys.fit(X_phys, y_phys) y_pred_phys = model_phys.predict(X_phys) print("Physics-informed model coefficients:") for feat, coef in zip(phys_features, model_phys.coef_): print(f" {feat:25s}: {coef:+.4f}") print(f" {'intercept':25s}: {model_phys.intercept_:+.4f}") print(f"\n R-squared: {r2_score(y_phys, y_pred_phys):.4f}")
# The days_since_start coefficient tells us the daily degradation rate deg_rate = model_phys.coef_[phys_features.index("days_since_start")] print(f"\nEstimated degradation rate: {deg_rate:.4f}% head loss per day") print(f" {deg_rate * 365:.2f}% head loss per year")

Why "physics-informed"? A pure ML model treats all features as black-box inputs. By structuring our features around the pump affinity laws (speed ratio, speed ratio squared, flow relative to BEP), we embed physical knowledge into the model. The regression then only needs to learn the deviation from known physics, which requires less data and produces more interpretable coefficients. The days_since_start coefficient directly quantifies the degradation rate.

6

Residual Monitoring

# Model residuals (what the physics model can't explain) running.loc[X_phys.index, "phys_residual"] = y_phys - y_pred_phys # These residuals represent non-degradation anomalies fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True) # Panel 1: Model fit vs actual axes[0].plot(running.index, running["head_dev_pct"], alpha=0.3, color="#718096", linewidth=0.5, label="Actual deviation") axes[0].plot(X_phys.index, y_pred_phys, color="#D69E2E", linewidth=1.5, label="Physics model fit") axes[0].set_ylabel("Head Deviation (%)") axes[0].set_title("Physics-Informed Model: Actual vs Predicted Degradation") axes[0].legend() # Panel 2: Residuals with control limits resid = running["phys_residual"].dropna() resid_std = resid.std() axes[1].plot(resid.index, resid, alpha=0.5, color="#5FCCDB", linewidth=0.5) axes[1].axhline(y=0, color="gray", linestyle="-") axes[1].axhline(y=3 * resid_std, color="red", linestyle="--", label=f"+3 sigma ({3*resid_std:.2f})") axes[1].axhline(y=-3 * resid_std, color="red", linestyle="--", label=f"-3 sigma ({-3*resid_std:.2f})") axes[1].set_ylabel("Model Residual") axes[1].set_title("Physics Model Residuals (Non-Degradation Anomalies)") axes[1].legend() axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b")) plt.tight_layout() plt.show()

Separating degradation from anomalies: The physics model captures the expected degradation trend. Its residuals represent everything else: control system changes, unusual operating conditions, measurement noise, and genuine acute faults. Points beyond 3 sigma in the residual plot warrant investigation because the physics model cannot explain them through normal degradation.

7

Pump Health Index

A single composite health index combines multiple indicators into one number that operators and planners can track. We will combine head deviation, efficiency deviation, bearing temperature trends, and vibration levels into a 0-100 score.

# Component scores (each normalized to 0-100, higher = healthier) # 1. Head deviation score: 0% deviation = 100, -5% = 0 running["head_score"] = np.clip( 100 + running["head_dev_pct"] * 20, # 5% loss = score of 0 0, 100 ) # 2. Efficiency score running["eff_score"] = np.clip(running["expected_eff"], 0, 100) # 3. Bearing temperature score (lower is healthier) bear_cols = [c for c in running.columns if "BFPA" in c and "BEAR" in c and "TEMP" in c] max_bear_temp = running[bear_cols].max(axis=1) running["temp_score"] = np.clip( 100 - (max_bear_temp - 140) * 2.5, # 140F=100, 180F=0 0, 100 ) # 4. Vibration score (lower is healthier) vib_cols = [c for c in running.columns if "BFPA" in c and "VIB" in c and "roll" not in c and "delta" not in c] max_vib = running[vib_cols].max(axis=1) running["vib_score"] = np.clip( 100 - max_vib * 10, # 0 mils=100, 10 mils=0 0, 100 ) # Composite Health Index (weighted average) running["health_index"] = ( 0.35 * running["head_score"] + 0.25 * running["eff_score"] + 0.20 * running["temp_score"] + 0.20 * running["vib_score"] ) print("Pump Health Index statistics:") print(running["health_index"].describe().round(1))
# Health Index dashboard fig, axes = plt.subplots(5, 1, figsize=(14, 16), sharex=True) components = [ ("head_score", "Head Score", "#5FCCDB"), ("eff_score", "Efficiency Score", "#2D6A7A"), ("temp_score", "Bearing Temp Score", "#D69E2E"), ("vib_score", "Vibration Score", "#718096"), ("health_index", "Composite Health Index", "#E53E3E"), ] for i, (col, title, color) in enumerate(components): daily = running[col].resample("D").mean() axes[i].fill_between(daily.index, daily, alpha=0.4, color=color) axes[i].plot(daily.index, daily, color=color, linewidth=1.5) axes[i].set_ylabel(title, fontsize=9) axes[i].set_ylim(0, 105) axes[i].axhline(y=70, color="orange", linestyle="--", alpha=0.5) axes[i].axhline(y=40, color="red", linestyle="--", alpha=0.5) axes[0].set_title("BFP A Digital Twin: Health Index Components", fontsize=14, fontweight="bold") axes[4].xaxis.set_major_formatter(mdates.DateFormatter("%b")) plt.tight_layout() plt.show()

Health Index interpretation: Above 70 (green zone) = healthy operation. Between 40 and 70 (yellow zone) = degraded, plan maintenance. Below 40 (red zone) = critical, immediate action required. The weights (35% head, 25% efficiency, 20% temperature, 20% vibration) reflect the relative importance of each indicator for BFP health. Adjust these weights based on your plant's experience.

8

Correlate with Maintenance Events

The final validation is correlating the health index trend with actual maintenance events. If the health index drops before a maintenance outage and recovers afterward, the digital twin is capturing real degradation.

# Filter operator actions to BFP-related maintenance bfp_maint = ops[ops["description"].str.contains("BFP|pump|bearing|vibr", case=False, na=False)] print(f"BFP maintenance events: {len(bfp_maint)}") if len(bfp_maint) > 0: print(bfp_maint[["timestamp", "description"]].to_string(index=False))
# Overlay maintenance events on health index fig, ax = plt.subplots(figsize=(14, 6)) daily_hi = running["health_index"].resample("D").mean() ax.fill_between(daily_hi.index, daily_hi, alpha=0.3, color="#5FCCDB") ax.plot(daily_hi.index, daily_hi, color="#5FCCDB", linewidth=2, label="Health Index (daily avg)") # Zone shading ax.axhspan(70, 105, alpha=0.05, color="green") ax.axhspan(40, 70, alpha=0.05, color="orange") ax.axhspan(0, 40, alpha=0.05, color="red") # Maintenance event markers for _, event in bfp_maint.iterrows(): ax.axvline(x=event["timestamp"], color="purple", linestyle="-", alpha=0.6, linewidth=1.5) ax.annotate(event["description"][:30], xy=(event["timestamp"], 95), fontsize=7, rotation=45, ha="left", color="purple") ax.set_ylabel("Health Index", fontsize=12) ax.set_ylim(0, 105) ax.set_title("BFP A Digital Twin: Health Index vs Maintenance Events", fontsize=13, fontweight="bold") ax.legend() ax.xaxis.set_major_formatter(mdates.DateFormatter("%b")) plt.tight_layout() plt.show()
# Compute health index change around maintenance events WINDOW_DAYS = 7 for _, event in bfp_maint.iterrows(): t = event["timestamp"] before = running[ (running.index >= t - pd.Timedelta(days=WINDOW_DAYS)) & (running.index < t) ]["health_index"].mean() after = running[ (running.index > t) & (running.index <= t + pd.Timedelta(days=WINDOW_DAYS)) ]["health_index"].mean() print(f"{t.strftime('%Y-%m-%d')}: {event['description'][:40]}") print(f" Before: {before:.1f} After: {after:.1f} Change: {after - before:+.1f}")

What You Built and Next Steps

  1. Loaded OEM pump curves and design parameters for the BFP
  2. Plotted head-capacity and efficiency curves with the Best Efficiency Point
  3. Used pump affinity laws to compute expected performance at actual operating speed
  4. Quantified head and efficiency deviation from the OEM curve over time
  5. Built a physics-informed regression model that estimates the daily degradation rate
  6. Monitored model residuals with 3-sigma control limits for acute anomalies
  7. Constructed a composite Pump Health Index from four component scores
  8. Correlated health index trends with actual maintenance events

Ideas to Try Next

  • BFP B digital twin: Repeat the entire pipeline for Pump B and compare degradation rates between pumps
  • Predictive maintenance scheduling: Extrapolate the degradation trend to estimate when the health index will cross the 40-point threshold
  • Nonlinear degradation model: Try polynomial or exponential degradation curves instead of linear
  • Heat balance integration: Use heat_balance.csv to compute feedwater enthalpy rise and compare actual vs. design heat absorption
  • Real-time dashboard: Build an interactive Plotly dashboard that updates as new data arrives
  • Multi-pump fleet management: Extend to all rotating equipment in the plant and rank by health index for prioritized maintenance

Key Terms Glossary

  • Digital twin — a computational model that mirrors a physical asset's expected behavior, updated with real-time data to track deviations
  • Pump curve — the OEM-published relationship between flow, head, and efficiency for a centrifugal pump at rated speed
  • Best Efficiency Point (BEP) — the flow rate where the pump achieves maximum hydraulic efficiency; the ideal operating point
  • Pump affinity laws — Q ~ N (flow proportional to speed), H ~ N^2 (head proportional to speed squared), P ~ N^3 (power proportional to speed cubed)
  • Physics-informed ML — embedding known physical equations or constraints into a machine learning model to improve accuracy and interpretability
  • Head deviation — the difference between actual pump head and the head predicted by the OEM curve at the same flow and speed
  • Health index — a composite score (0-100) combining multiple condition indicators into a single actionable metric
  • Degradation rate — the rate at which equipment performance declines over time, typically expressed as percent loss per day or year
  • Interpolation — estimating values between known data points; used here to read the OEM curve at any flow rate

Congratulations!

You have completed the full BFP analysis series. From basic health monitoring through load correlation, fault diagnosis, and digital twin modeling, you now have the tools to build comprehensive rotating equipment analytics for any generating unit.

Explore All ML Playground Guides →