Guide 17: BFP health monitoring.

Author: Adam Brown

Format: Working notebook

Dataset: SP&L data

Prefer not to install anything? Click the badge above to open this guide as a runnable notebook in Google Colab. Sign in with any Google account, then use Runtime → Run all to execute every cell, or step through them one at a time.

What You Will Learn

Boiler Feed Pumps (BFPs) are the heart of a thermal generating unit. They force feedwater into the boiler at extreme pressures, and their failure can trip an entire unit offline. In this guide you will:

Load hourly BFP time-series data from the SP&L generation dataset
Explore bearing temperature and vibration trends for BFP A and BFP B
Compute rolling mean and standard deviation to smooth noisy sensor data
Set threshold-based alerts for out-of-range conditions
Apply Isolation Forest to detect multivariate anomalies
Build a simple health dashboard combining all indicators

What is a Boiler Feed Pump? A BFP is a high-pressure centrifugal pump that delivers feedwater from the deaerator to the boiler drum. A typical 600 MW unit runs two BFPs in parallel, each driven by a steam turbine or electric motor. Bearing temperatures and vibration levels are the primary health indicators monitored by plant operators.

SP&L Data You Will Use

bfp_train_hourly.parquet — 8,784 rows x 88 columns of hourly BFP and system data
Column pattern: U1_BFPA_* (Pump A), U1_BFPB_* (Pump B), U1_* (system-level tags)
tag_dictionary.csv — maps tag names to engineering descriptions and units
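
The column pattern above can be used to sort tags into equipment groups before any plotting. The following is a minimal sketch, not part of the SP&L tooling: `classify_tag` is a hypothetical helper, and every tag name in the example list other than `U1_BFPA_BEAR_DE_TEMP` and `U1_BFPB_BEAR_DE_TEMP` is invented for illustration.

```python
import pandas as pd

def classify_tag(name: str) -> str:
    """Map a tag name to an equipment group using the documented prefixes."""
    # Check the pump prefixes first, because "U1_" also matches them.
    if name.startswith("U1_BFPA_"):
        return "Pump A"
    if name.startswith("U1_BFPB_"):
        return "Pump B"
    if name.startswith("U1_"):
        return "System"
    return "Other"

# Illustrative tag names; only the prefixes are taken from the guide.
example_tags = [
    "U1_BFPA_BEAR_DE_TEMP",
    "U1_BFPA_VIB_X",          # hypothetical tag name
    "U1_BFPB_BEAR_DE_TEMP",
    "U1_GROSS_MW",            # hypothetical tag name
]

groups = pd.Series(example_tags).map(classify_tag)
tag_counts = groups.value_counts()
print(tag_counts)
```

With the real dataset, the same function could be applied to `bfp.columns` to check how the 88 columns split between the two pumps and the system-level tags.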

Verify Your Setup

Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.

# Step 0: Verify your setup
try:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt  # used for every plot in this guide
    from sklearn.ensemble import IsolationForest

    bfp = pd.read_parquet(
        "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
    )
    print(f"Setup OK! Loaded {len(bfp):,} rows x {bfp.shape[1]} columns.")
except ImportError as e:
    # ImportError also covers a missing parquet engine (pyarrow or fastparquet)
    print(f"Missing library: {e}")
    print("Run: pip install -r requirements.txt")
except FileNotFoundError:
    print("Data files not found. Run from the repo root:")
    print("  cd Dynamic-Network-Model && jupyter lab")
                    

Setup OK! Loaded 8,784 rows x 88 columns.

Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab

Having trouble? Check our Troubleshooting Guide for solutions to common setup and data loading issues.

Load the Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.ensemble import IsolationForest

# Load BFP hourly time-series
bfp = pd.read_parquet(
    "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
)

# Load tag dictionary for reference
tags = pd.read_csv(
    "sisyphean-power-and-light/generation/tag_dictionary.csv"
)

print(f"BFP data:  {bfp.shape[0]:,} rows x {bfp.shape[1]} columns")
print(f"Tag dict:  {len(tags)} entries")
print(f"Date range: {bfp.index.min()} to {bfp.index.max()}")
                    

BFP data:  8,784 rows x 88 columns
Tag dict:  88 entries
Date range: 2023-01-01 00:00:00 to 2023-12-31 23:00:00

Explore BFP Trends

Each BFP has bearing temperature and vibration sensors. Let's identify the relevant columns and plot the raw trends over the full year.

# Find BFP A and BFP B columns
bfpa_cols = [c for c in bfp.columns if "BFPA" in c]
bfpb_cols = [c for c in bfp.columns if "BFPB" in c]

print("BFP A tags:", bfpa_cols)
print("BFP B tags:", bfpb_cols)
                    

# Plot bearing temperatures for both pumps
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# BFP A bearing temps
for col in [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Bearing Temperatures")
axes[0].set_ylabel("Temperature (F)")
axes[0].legend(fontsize=8)

# BFP B bearing temps
for col in [c for c in bfpb_cols if "BEAR" in c and "TEMP" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Bearing Temperatures")
axes[1].set_ylabel("Temperature (F)")
axes[1].legend(fontsize=8)

axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
                    

# Plot vibration trends
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

for col in [c for c in bfpa_cols if "VIB" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Vibration")
axes[0].set_ylabel("Vibration (mils)")
axes[0].legend(fontsize=8)

for col in [c for c in bfpb_cols if "VIB" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Vibration")
axes[1].set_ylabel("Vibration (mils)")
axes[1].legend(fontsize=8)

axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
                    

Why two pumps? Most generating units run redundant BFPs (A and B) in parallel. If one trips, the other must carry the full feedwater load. Monitoring both pumps lets you compare their behavior: if Pump A's bearing temperature is climbing while Pump B stays flat, that is an early warning of a developing issue on Pump A.
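
The A-versus-B comparison described above can be sketched numerically by subtracting one pump's reading from the other's, so that the load cycle both pumps share cancels out. The example below runs on synthetic data, not SP&L values. The injected drift and the `DIVERGENCE_LIMIT_F` value are assumptions chosen only to make the effect visible.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 30 days of hourly readings (not SP&L values)
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 30, freq=pd.Timedelta(hours=1))
daily_cycle = 3 * np.sin(2 * np.pi * np.arange(len(idx)) / 24)

pump_a = 150 + daily_cycle + rng.normal(0, 0.5, len(idx))
pump_b = 150 + daily_cycle + rng.normal(0, 0.5, len(idx))
# Inject a slow 10 F drift on Pump A over the last 10 days
pump_a[-240:] += np.linspace(0, 10, 240)

temps = pd.DataFrame({"pump_a": pump_a, "pump_b": pump_b}, index=idx)

# The shared daily load cycle cancels in the difference
temps["a_minus_b"] = temps["pump_a"] - temps["pump_b"]
temps["diff_24h"] = temps["a_minus_b"].rolling(window=24).mean()

DIVERGENCE_LIMIT_F = 3.0  # hypothetical limit, for illustration only
diverging = temps["diff_24h"] > DIVERGENCE_LIMIT_F
first_flag = diverging.idxmax() if diverging.any() else None
print(f"First divergence flag: {first_flag}")
```

Because both pumps see the same load, the difference signal stays near zero until one of them starts to drift. That makes a small divergence easier to spot than it would be on either raw trend.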

Rolling Statistics

Raw sensor data is noisy. Rolling statistics smooth out short-term fluctuations and make trends visible. We will compute a 24-hour rolling mean and standard deviation for one BFP A bearing temperature sensor; the same steps apply to any other tag.

# Select one bearing temp column for demonstration
temp_col = [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c][0]
print(f"Analyzing: {temp_col}")

# 24-hour rolling mean and std
bfp["rolling_mean_24h"] = bfp[temp_col].rolling(window=24).mean()
bfp["rolling_std_24h"]  = bfp[temp_col].rolling(window=24).std()

# Upper and lower bands (mean +/- 2 sigma)
bfp["upper_band"] = bfp["rolling_mean_24h"] + 2 * bfp["rolling_std_24h"]
bfp["lower_band"] = bfp["rolling_mean_24h"] - 2 * bfp["rolling_std_24h"]

print(bfp[[temp_col, "rolling_mean_24h", "rolling_std_24h"]].describe().round(2))
                    

# Visualize rolling statistics
fig, ax = plt.subplots(figsize=(14, 5))

ax.plot(bfp.index, bfp[temp_col], alpha=0.3, color="#718096", label="Raw")
ax.plot(bfp.index, bfp["rolling_mean_24h"], color="#5FCCDB", linewidth=2, label="24h Rolling Mean")
ax.fill_between(bfp.index, bfp["lower_band"], bfp["upper_band"],
                alpha=0.15, color="#5FCCDB", label="Mean +/- 2 sigma")

ax.set_title(f"{temp_col}: Raw vs. 24h Rolling Statistics")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
                    

Why 24-hour windows? BFP temperatures follow daily load cycles. A 24-hour window spans one complete cycle, so the rolling mean tracks the underlying trend while averaging out normal load-driven swings. The 2-sigma bands mark the range the sensor has occupied over the trailing 24 hours; readings outside them are unusual relative to recent behavior, even if they are well below any alarm limit.
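
The bands become an explicit flag once each reading is compared against them. The sketch below uses synthetic data with one injected spike. It also makes one change that is an assumption rather than the guide's method: it shifts the bands by one hour (`shift(1)`), so each reading is judged against the previous 24 hours and does not influence its own band.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 14 days of hourly readings (not SP&L values)
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
temp = pd.Series(150 + rng.normal(0, 1.0, len(idx)), index=idx)
temp.iloc[200] += 10  # inject one obvious spike

# Bands built from the previous 24 hours only (shift(1) excludes the current point)
mean_24h = temp.rolling(window=24).mean().shift(1)
std_24h = temp.rolling(window=24).std().shift(1)
upper = mean_24h + 2 * std_24h
lower = mean_24h - 2 * std_24h

# Comparisons against NaN bands (the first 24 hours) evaluate to False
outside = (temp > upper) | (temp < lower)
print(f"Points outside the 2-sigma bands: {outside.sum()} of {outside.size}")
print(f"Spike flagged: {outside.iloc[200]}")
```

For roughly Gaussian noise, on the order of 5% of healthy readings fall outside a 2-sigma band, so a single excursion is weak evidence. Repeated or one-sided excursions are more informative.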

Threshold Alerts

Plant operators use fixed alarm limits for bearing temperatures. Let's implement a simple threshold alert system and count how many hourly readings from each bearing sensor exceed the warning and alarm limits.

# Define alarm thresholds (typical values for BFP bearings)
TEMP_HIGH_ALARM  = 180   # degrees F - high alarm
TEMP_HIGH_WARN   = 170   # degrees F - high warning

# Apply thresholds to all bearing temp columns
bear_temp_cols = [c for c in bfp.columns
                  if "BEAR" in c and "TEMP" in c
                  and "rolling" not in c]

alert_summary = []
for col in bear_temp_cols:
    n_warn  = (bfp[col] > TEMP_HIGH_WARN).sum()
    n_alarm = (bfp[col] > TEMP_HIGH_ALARM).sum()
    alert_summary.append({
        "tag": col,
        "warnings": n_warn,
        "alarms": n_alarm
    })

alerts_df = pd.DataFrame(alert_summary)
print(alerts_df.to_string(index=False))
                    

                  tag  warnings  alarms
 U1_BFPA_BEAR_DE_TEMP       142      37
U1_BFPA_BEAR_NDE_TEMP        98      18
 U1_BFPB_BEAR_DE_TEMP        56      11
U1_BFPB_BEAR_NDE_TEMP        41       6
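
The counts above are hours above each limit, not distinct alarm events: a single six-hour excursion adds six to the total. One way to group consecutive exceedances into events is sketched below on a short synthetic series. The readings are invented; only the 180 F limit comes from the guide.

```python
import pandas as pd

TEMP_HIGH_ALARM = 180  # degrees F, same limit as the guide

# Synthetic hourly readings with two separate excursions above the limit
idx = pd.date_range("2023-06-01", periods=12, freq=pd.Timedelta(hours=1))
temp = pd.Series(
    [175, 181, 183, 182, 176, 174, 179, 181, 178, 177, 176, 175],
    index=idx, dtype=float,
)

above = temp > TEMP_HIGH_ALARM
# A new event starts wherever "above" switches from False to True
event_start = above & ~above.shift(1, fill_value=False)
# Running count of starts gives every hour in the same excursion the same id
event_id = event_start.cumsum()[above]

events = temp[above].groupby(event_id).agg(hours="size", peak="max")
print(f"Hours above limit: {above.sum()}, distinct events: {len(events)}")
print(events)
```

Here four hours above the limit collapse into two events, one lasting three hours and one lasting a single hour.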

# Visualize alarm exceedances on a timeline
fig, ax = plt.subplots(figsize=(14, 5))

ax.plot(bfp.index, bfp[temp_col], alpha=0.6, color="#2D6A7A", label=temp_col)
ax.axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--", label=f"Warning ({TEMP_HIGH_WARN}F)")
ax.axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--", label=f"Alarm ({TEMP_HIGH_ALARM}F)")

# Highlight alarm points
alarm_mask = bfp[temp_col] > TEMP_HIGH_ALARM
ax.scatter(bfp.index[alarm_mask], bfp[temp_col][alarm_mask],
           c="red", s=20, zorder=5, label="Alarm Points")

ax.set_title(f"{temp_col}: Threshold Alerts")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
                    

Threshold limits vs. statistical anomalies: Fixed thresholds catch obvious problems (a bearing at 185F is always concerning), but they evaluate one sensor at a time and miss subtler patterns, such as a slow upward trend that has not yet crossed the alarm limit or a combination of readings that are each within limits but unusual together. Isolation Forest is aimed at that second kind of pattern.

Anomaly Detection with Isolation Forest

Isolation Forest detects anomalies by isolating observations in random decision trees. Points that are easy to isolate (they require fewer splits) receive more anomalous scores. Unlike threshold alerts, it considers multiple sensors simultaneously.

# Select features for anomaly detection
# Use bearing temps + vibration for both pumps
feature_cols = [c for c in bfp.columns
                if ("BEAR" in c or "VIB" in c)
                and "rolling" not in c
                and "band" not in c]

print(f"Features for anomaly detection ({len(feature_cols)}):")
for col in feature_cols:
    print(f"  {col}")

# Drop rows with missing sensor values before fitting
X = bfp[feature_cols].dropna()
print(f"\nSamples for training: {len(X):,}")
                    

# Fit Isolation Forest
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.02,   # expect ~2% anomalies
    random_state=42
)
iso_forest.fit(X)

# Predict: -1 = anomaly, 1 = normal
bfp.loc[X.index, "iso_label"] = iso_forest.predict(X)
bfp.loc[X.index, "iso_score"] = iso_forest.decision_function(X)

n_anomalies = (bfp["iso_label"] == -1).sum()
print(f"Anomalies detected: {n_anomalies} ({n_anomalies/len(X)*100:.1f}%)")
                    

Anomalies detected: 176 (2.0%)

# Visualize anomalies on temperature trend
fig, ax = plt.subplots(figsize=(14, 5))

normal = bfp[bfp["iso_label"] == 1]
anomaly = bfp[bfp["iso_label"] == -1]

ax.plot(normal.index, normal[temp_col], color="#5FCCDB",
        alpha=0.5, linewidth=0.5, label="Normal")
ax.scatter(anomaly.index, anomaly[temp_col],
           c="red", s=15, zorder=5, label="Anomaly")

ax.set_title("Isolation Forest Anomalies on BFP A Bearing Temperature")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
                    

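What is Isolation Forest? Unlike most anomaly detectors, which learn what "normal" looks like, Isolation Forest explicitly isolates anomalies. Each tree is built by repeatedly picking a random feature and a random split value between that feature's minimum and maximum. Anomalous points (rare, different values) require fewer random splits to isolate than normal points. The anomaly score reflects how quickly each point is isolated, averaged across all trees. Note that the contamination parameter fixes the share of points labeled as anomalies, so the 2.0% reported above reflects that setting rather than a discovered anomaly rate.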
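
The isolation idea can be checked on a toy example. The sketch below fits an Isolation Forest to synthetic two-sensor data (temperature and vibration values invented for illustration) with one point placed far from the cluster, then compares scores using scikit-learn's `score_samples`, where lower values mean a point is easier to isolate.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# 500 synthetic "normal" operating points: temperature (F) and vibration (mils)
normal_points = np.column_stack([
    rng.normal(150, 3, 500),
    rng.normal(2.0, 0.2, 500),
])
# One point where both sensors sit far from the cluster
outlier = np.array([[175.0, 4.0]])
X_demo = np.vstack([normal_points, outlier])

model = IsolationForest(n_estimators=200, random_state=42).fit(X_demo)
scores = model.score_samples(X_demo)  # lower = easier to isolate

print(f"Median score of normal points: {np.median(scores[:-1]):.3f}")
print(f"Score of injected outlier:     {scores[-1]:.3f}")
```

The injected point should score well below the bulk of the cluster, because the large gaps around it in both features make it likely to be split off in the first few levels of a tree.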

Health Dashboard

Combine threshold alerts, rolling statistics, and Isolation Forest scores into a single health dashboard that gives plant operators a unified view of BFP condition.

# Build a daily health summary
daily = bfp.resample("D").agg({
    temp_col: ["mean", "max"],
    "rolling_std_24h": "mean",
    "iso_score": "mean",
    "iso_label": lambda x: (x == -1).sum()
})
daily.columns = ["temp_mean", "temp_max", "avg_volatility",
                  "avg_iso_score", "anomaly_count"]

print("Daily health summary (first 10 days):")
print(daily.head(10).round(2))
                    

# 4-panel health dashboard
fig, axes = plt.subplots(4, 1, figsize=(14, 14), sharex=True)

# Panel 1: Daily mean temperature
axes[0].plot(daily.index, daily["temp_mean"], color="#5FCCDB", linewidth=2)
axes[0].fill_between(daily.index, daily["temp_mean"], alpha=0.2, color="#5FCCDB")
axes[0].axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--", alpha=0.7)
axes[0].set_ylabel("Mean Temp (F)")
axes[0].set_title("BFP Health Dashboard", fontsize=14, fontweight="bold")

# Panel 2: Daily max temperature
axes[1].bar(daily.index, daily["temp_max"], color="#2D6A7A", alpha=0.7, width=1)
axes[1].axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--", alpha=0.7)
axes[1].set_ylabel("Max Temp (F)")

# Panel 3: Volatility (rolling std)
axes[2].plot(daily.index, daily["avg_volatility"], color="#D69E2E", linewidth=2)
axes[2].set_ylabel("Avg Volatility")

# Panel 4: Daily anomaly count
colors = ["red" if v > 0 else "#5FCCDB" for v in daily["anomaly_count"]]
axes[3].bar(daily.index, daily["anomaly_count"], color=colors, width=1)
axes[3].set_ylabel("Anomalies")
axes[3].set_xlabel("Date")

axes[3].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
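
The dashboard shows the indicators side by side. If a single daily status is wanted, one option is to reduce them to a simple traffic-light rule. The sketch below is an assumption about how that could be done, not an established plant practice: `daily_status` and its rule are hypothetical, and the small frame only mimics the column names of `daily` with invented values.

```python
import pandas as pd

TEMP_HIGH_WARN = 170   # degrees F, same limits as the guide
TEMP_HIGH_ALARM = 180

# Synthetic daily summary reusing two column names from the guide's `daily` frame
daily_demo = pd.DataFrame(
    {
        "temp_max": [165.0, 172.0, 168.0, 183.0],
        "anomaly_count": [0, 0, 3, 1],
    },
    index=pd.date_range("2023-07-01", periods=4, freq="D"),
)

def daily_status(row: pd.Series) -> str:
    """Hypothetical rule: alarm outranks warning, which outranks normal."""
    if row["temp_max"] > TEMP_HIGH_ALARM:
        return "RED"
    if row["temp_max"] > TEMP_HIGH_WARN or row["anomaly_count"] > 0:
        return "YELLOW"
    return "GREEN"

daily_demo["status"] = daily_demo.apply(daily_status, axis=1)
print(daily_demo)
```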
                    

What You Built and Next Steps

Loaded hourly BFP time-series data with 88 sensor channels
Explored bearing temperature and vibration trends for both pumps
Computed 24-hour rolling mean and standard deviation to smooth noisy signals
Implemented threshold-based alerts using typical plant alarm limits
Trained an Isolation Forest model for multivariate anomaly detection
Built a 4-panel health dashboard combining all indicators

Ideas to Try Next

Add pump speed and discharge pressure: Include additional BFP columns to give the Isolation Forest more context about operating conditions
Cross-reference alarm logs: Load alarm_log.csv and overlay actual plant alarms on your dashboard
Trend rate-of-change: Compute the derivative of the rolling mean to detect accelerating degradation
Compare A vs. B: Build side-by-side comparison charts to spot diverging behavior between redundant pumps
Weekly health reports: Aggregate the daily dashboard into weekly summaries for maintenance planning
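
As a starting point for the rate-of-change idea, the sketch below differences a 24-hour rolling mean over a 24-hour lag to get a trend in degrees F per day. The data is synthetic with an injected steady rise, and `RATE_LIMIT_F_PER_DAY` is a hypothetical limit chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 20 days of hourly readings (not SP&L values)
rng = np.random.default_rng(7)
idx = pd.date_range("2023-01-01", periods=24 * 20, freq=pd.Timedelta(hours=1))
temp = pd.Series(150 + rng.normal(0, 0.5, len(idx)), index=idx)
# Inject a steady 0.1 F/hour rise (2.4 F/day) over the final 5 days
temp.iloc[-120:] += 0.1 * np.arange(120)

rolling_mean = temp.rolling(window=24).mean()
# Change in the 24 h mean compared with 24 h earlier, in F per day
rate_f_per_day = rolling_mean.diff(periods=24)

RATE_LIMIT_F_PER_DAY = 1.0  # hypothetical limit, for illustration only
rising = rate_f_per_day > RATE_LIMIT_F_PER_DAY
first_rising = rising.idxmax() if rising.any() else None

print(f"Peak rate: {rate_f_per_day.max():.2f} F/day")
print(f"First hour above the rate limit: {first_rising}")
```

Differencing the smoothed series rather than the raw one keeps hour-to-hour noise from dominating the rate estimate, at the cost of a delay before a new trend shows up.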

Ready to Level Up?

In the advanced guide, you'll classify specific fault types using multi-sensor fusion, Random Forest, and SHAP explainability.

Go to Advanced Rotating Equipment Fault Diagnosis →

— Adam · adam@sgridworks.com