What You Will Learn
Boiler Feed Pumps (BFPs) are the heart of a thermal generating unit. They force feedwater into the boiler at extreme pressures, and their failure can trip an entire unit offline. In this guide you will:
- Load hourly BFP time-series data from the SP&L generation dataset
- Explore bearing temperature and vibration trends for BFP A and BFP B
- Compute rolling mean and standard deviation to smooth noisy sensor data
- Set threshold-based alerts for out-of-range conditions
- Apply Isolation Forest to detect multivariate anomalies
- Build a simple health dashboard combining all indicators
What is a Boiler Feed Pump? A BFP is a high-pressure centrifugal pump that delivers feedwater from the deaerator to the boiler drum. A typical 600 MW unit runs two BFPs in parallel, each driven by a steam turbine or electric motor. Bearing temperatures and vibration levels are the primary health indicators monitored by plant operators.
SP&L Data You Will Use
- bfp_train_hourly.parquet — 8,784 rows x 88 columns of hourly BFP and system data
- Column pattern: U1_BFPA_* (Pump A), U1_BFPB_* (Pump B), U1_* (system-level tags)
- tag_dictionary.csv — maps tag names to engineering descriptions and units
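The column pattern above is enough to sort tags by equipment. Here is a minimal sketch, assuming every tag follows the U1_BFPA_*, U1_BFPB_*, or U1_* convention. The helper name group_tags and the sample system-level tag are hypothetical and only illustrate the idea.

```python
def group_tags(columns):
    """Split tag names into Pump A, Pump B, and system-level groups."""
    groups = {"pump_a": [], "pump_b": [], "system": []}
    for name in columns:
        # Check the pump prefixes first, since they also start with "U1_".
        if name.startswith("U1_BFPA_"):
            groups["pump_a"].append(name)
        elif name.startswith("U1_BFPB_"):
            groups["pump_b"].append(name)
        elif name.startswith("U1_"):
            groups["system"].append(name)
    return groups


# Small illustrative sample; the real file has 88 columns.
sample_columns = [
    "U1_BFPA_BEAR_DE_TEMP",
    "U1_BFPB_BEAR_DE_TEMP",
    "U1_GROSS_MW",  # hypothetical system-level tag name (assumption)
]
groups = group_tags(sample_columns)
print(groups)
```

With the real data you could pass bfp.columns to the same helper once the file is loaded.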
Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.
try:
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import IsolationForest
    bfp = pd.read_parquet(
        "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
    )
    print(f"Setup OK! Loaded {len(bfp):,} rows x {bfp.shape[1]} columns.")
except ImportError as e:
    # ImportError also covers ModuleNotFoundError and a missing parquet engine
    print(f"Missing library: {e}")
    print("Run: pip install -r requirements.txt")
except FileNotFoundError:
    print("Data files not found. Run from the repo root:")
    print("  cd Dynamic-Network-Model && jupyter lab")
Setup OK! Loaded 8,784 rows x 88 columns.
Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.ensemble import IsolationForest
bfp = pd.read_parquet(
    "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
)
tags = pd.read_csv(
    "sisyphean-power-and-light/generation/tag_dictionary.csv"
)
print(f"BFP data: {bfp.shape[0]:,} rows x {bfp.shape[1]} columns")
print(f"Tag dict: {len(tags)} entries")
print(f"Date range: {bfp.index.min()} to {bfp.index.max()}")
BFP data: 8,784 rows x 88 columns
Tag dict: 88 entries
Date range: 2023-01-01 00:00:00 to 2023-12-31 23:00:00
Each BFP has bearing temperature and vibration sensors. Let's identify the relevant columns and plot the raw trends over the full year.
bfpa_cols = [c for c in bfp.columns if "BFPA" in c]
bfpb_cols = [c for c in bfp.columns if "BFPB" in c]
print("BFP A tags:", bfpa_cols)
print("BFP B tags:", bfpb_cols)
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
for col in [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Bearing Temperatures")
axes[0].set_ylabel("Temperature (F)")
axes[0].legend(fontsize=8)
for col in [c for c in bfpb_cols if "BEAR" in c and "TEMP" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Bearing Temperatures")
axes[1].set_ylabel("Temperature (F)")
axes[1].legend(fontsize=8)
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
for col in [c for c in bfpa_cols if "VIB" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Vibration")
axes[0].set_ylabel("Vibration (mils)")
axes[0].legend(fontsize=8)
for col in [c for c in bfpb_cols if "VIB" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Vibration")
axes[1].set_ylabel("Vibration (mils)")
axes[1].legend(fontsize=8)
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
Why two pumps? Most generating units run redundant BFPs (A and B) in parallel. If one trips, the other must carry the full feedwater load. Monitoring both pumps lets you compare their behavior: if Pump A's bearing temperature is climbing while Pump B stays flat, that is an early warning of a developing issue on Pump A.
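The comparison described above can be sketched on synthetic data. This example assumes both pumps share the same daily load cycle, so subtracting B from A cancels that cycle and leaves only the divergence. The 3 F limit and the injected drift are made up for illustration and are not plant settings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = 24 * 30
idx = pd.date_range("2023-01-01", periods=hours, freq=pd.Timedelta(hours=1))

# Both pumps share the same daily load cycle plus independent sensor noise.
load_cycle = 5 * np.sin(2 * np.pi * np.arange(hours) / 24)
pump_a = 150 + load_cycle + rng.normal(0, 1, hours)
pump_b = 150 + load_cycle + rng.normal(0, 1, hours)

# Inject a slow 10 F drift on Pump A over the final 10 days (synthetic fault).
drift = np.zeros(hours)
drift[-240:] = np.linspace(0, 10, 240)
pump_a = pump_a + drift

temps = pd.DataFrame({"A": pump_a, "B": pump_b}, index=idx)

# The A-minus-B difference cancels the shared load cycle.
temps["diff"] = temps["A"] - temps["B"]
temps["diff_24h"] = temps["diff"].rolling(window=24).mean()

DIVERGENCE_LIMIT_F = 3.0  # hypothetical limit, not a plant setting
diverging = temps["diff_24h"] > DIVERGENCE_LIMIT_F
first_flag = diverging.idxmax() if diverging.any() else None
print(f"First divergence flag: {first_flag}")
```

Working on the difference rather than on each pump separately means a unit-wide load change, which moves both pumps together, does not raise a flag.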
Raw sensor data is noisy. Rolling statistics smooth out short-term fluctuations and make trends visible. We will compute a 24-hour rolling mean and standard deviation for the first BFP A bearing temperature sensor; the same steps apply to any other sensor column.
temp_col = [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c][0]
print(f"Analyzing: {temp_col}")
bfp["rolling_mean_24h"] = bfp[temp_col].rolling(window=24).mean()
bfp["rolling_std_24h"] = bfp[temp_col].rolling(window=24).std()
bfp["upper_band"] = bfp["rolling_mean_24h"] + 2 * bfp["rolling_std_24h"]
bfp["lower_band"] = bfp["rolling_mean_24h"] - 2 * bfp["rolling_std_24h"]
print(bfp[[temp_col, "rolling_mean_24h", "rolling_std_24h"]].describe().round(2))
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(bfp.index, bfp[temp_col], alpha=0.3, color="#718096", label="Raw")
ax.plot(bfp.index, bfp["rolling_mean_24h"], color="#5FCCDB", linewidth=2, label="24h Rolling Mean")
ax.fill_between(bfp.index, bfp["lower_band"], bfp["upper_band"],
                alpha=0.15, color="#5FCCDB", label="Mean +/- 2 sigma")
ax.set_title(f"{temp_col}: Raw vs. 24h Rolling Statistics")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
Why 24-hour windows? BFP temperatures follow daily load cycles. A 24-hour window captures one complete cycle, so the rolling mean reflects the underlying trend while smoothing out normal load-driven fluctuations. The 2-sigma bands flag points that are statistically unusual even after accounting for daily variation. Note that window=24 counts rows, so it spans 24 hours only when the hourly data has no gaps, and the first 23 rows have no rolling value (NaN).
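The bands are plotted above but never turned into flags. Here is one way to do that, sketched on a synthetic sensor. As a variation on the guide's calculation, this version shifts the window by one hour so each reading is compared with the preceding 24 hours and a spike cannot widen its own band. The helper flag_outside_band and the injected spike are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def flag_outside_band(series, window=24, n_sigma=2.0):
    """Flag readings outside mean +/- n_sigma * std of the previous `window` readings."""
    # shift(1) keeps the current reading out of its own baseline.
    baseline = series.shift(1).rolling(window=window)
    mean = baseline.mean()
    std = baseline.std()
    # Comparisons against NaN (the warm-up period) evaluate to False.
    return (series > mean + n_sigma * std) | (series < mean - n_sigma * std)


rng = np.random.default_rng(1)
hours = 24 * 14
idx = pd.date_range("2023-01-01", periods=hours, freq=pd.Timedelta(hours=1))
temp = pd.Series(150 + rng.normal(0, 1, hours), index=idx)  # synthetic sensor
temp.iloc[200] += 15  # injected spike

flags = flag_outside_band(temp)
print(f"Flagged {int(flags.sum())} of {len(temp)} readings")
print(f"Spike flagged: {bool(flags.iloc[200])}")
```

Expect some ordinary readings to be flagged as well: with a 2-sigma band, a few percent of normal points fall outside it by chance.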
Plant operators use fixed alarm limits for bearing temperatures. Let's implement a simple threshold alert system and count how many hourly readings from each bearing sensor exceed the warning and alarm limits.
TEMP_HIGH_ALARM = 180
TEMP_HIGH_WARN = 170
bear_temp_cols = [c for c in bfp.columns
                  if "BEAR" in c and "TEMP" in c
                  and "rolling" not in c]
alert_summary = []
for col in bear_temp_cols:
    n_warn = (bfp[col] > TEMP_HIGH_WARN).sum()
    n_alarm = (bfp[col] > TEMP_HIGH_ALARM).sum()
    alert_summary.append({
        "tag": col,
        "warnings": n_warn,
        "alarms": n_alarm
    })
alerts_df = pd.DataFrame(alert_summary)
print(alerts_df.to_string(index=False))
                  tag  warnings  alarms
 U1_BFPA_BEAR_DE_TEMP       142      37
U1_BFPA_BEAR_NDE_TEMP        98      18
 U1_BFPB_BEAR_DE_TEMP        56      11
U1_BFPB_BEAR_NDE_TEMP        41       6
fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(bfp.index, bfp[temp_col], alpha=0.6, color="#2D6A7A", label=temp_col)
ax.axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--", label=f"Warning ({TEMP_HIGH_WARN}F)")
ax.axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--", label=f"Alarm ({TEMP_HIGH_ALARM}F)")
alarm_mask = bfp[temp_col] > TEMP_HIGH_ALARM
ax.scatter(bfp.index[alarm_mask], bfp[temp_col][alarm_mask],
           c="red", s=20, zorder=5, label="Alarm Points")
ax.set_title(f"{temp_col}: Threshold Alerts")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
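The counts in the alert table are hours above each limit, so one long excursion and many short ones can produce the same number. A possible refinement is to group consecutive exceedances into episodes. The sketch below uses a hypothetical helper, count_episodes, on ten made-up hourly readings.

```python
import pandas as pd


def count_episodes(series, limit):
    """Return (hours above limit, number of separate exceedance episodes)."""
    above = series > limit
    # An episode starts wherever `above` switches from False to True.
    starts = above & ~above.shift(1, fill_value=False)
    return int(above.sum()), int(starts.sum())


# Synthetic hourly readings with two excursions above 180 F.
idx = pd.date_range("2023-01-01", periods=10, freq=pd.Timedelta(hours=1))
temp = pd.Series([175, 181, 183, 182, 176, 174, 181, 179, 178, 177], index=idx)

hours_above, episodes = count_episodes(temp, limit=180)
print(hours_above, episodes)  # 4 hours above the limit, in 2 episodes
```

Episode counts tend to be more useful for maintenance planning, because each episode corresponds to one event an operator would have responded to.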
Threshold limits vs. statistical anomalies: Fixed thresholds catch obvious problems (a bearing at 185F is always concerning), but they miss subtle degradation patterns like a slow upward trend that has not yet crossed the alarm limit. That is where Isolation Forest comes in.
Isolation Forest detects anomalies by isolating observations with random decision trees. Points that are easy to isolate (they require fewer splits) are scored as anomalies. Unlike the threshold alerts, which check one sensor at a time, it evaluates all bearing and vibration sensors together.
feature_cols = [c for c in bfp.columns
                if ("BEAR" in c or "VIB" in c)
                and "rolling" not in c
                and "band" not in c]
print(f"Features for anomaly detection ({len(feature_cols)}):")
for col in feature_cols:
    print(f"  {col}")
X = bfp[feature_cols].dropna()
print(f"\nSamples for training: {len(X):,}")
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.02,
    random_state=42
)
iso_forest.fit(X)
bfp.loc[X.index, "iso_label"] = iso_forest.predict(X)
bfp.loc[X.index, "iso_score"] = iso_forest.decision_function(X)
n_anomalies = (bfp["iso_label"] == -1).sum()
print(f"Anomalies detected: {n_anomalies} ({n_anomalies/len(X)*100:.1f}%)")
Anomalies detected: 176 (2.0%)
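The 2.0% figure follows directly from contamination=0.02: the model flags roughly that fraction of the training rows whether or not real faults are present. The sketch below shows this on synthetic data with no injected faults. The data and the contamination values tried are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 4))  # synthetic data with no injected faults

fractions = {}
for contamination in (0.01, 0.02, 0.05):
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=42
    )
    # fit_predict returns -1 for anomalies and 1 for normal points.
    labels = model.fit_predict(X)
    fractions[contamination] = float((labels == -1).mean())
    print(f"contamination={contamination:.2f} -> "
          f"flagged {fractions[contamination]:.1%}")
```

Treat the anomaly count as a setting you chose, not as a finding. The useful information is which rows were flagged and how they cluster in time.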
fig, ax = plt.subplots(figsize=(14, 5))
normal = bfp[bfp["iso_label"] == 1]
anomaly = bfp[bfp["iso_label"] == -1]
ax.plot(normal.index, normal[temp_col], color="#5FCCDB",
        alpha=0.5, linewidth=0.5, label="Normal")
ax.scatter(anomaly.index, anomaly[temp_col],
           c="red", s=15, zorder=5, label="Anomaly")
ax.set_title("Isolation Forest Anomalies on BFP A Bearing Temperature")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
What is Isolation Forest? Unlike most anomaly detectors that learn what "normal" looks like, Isolation Forest explicitly isolates anomalies. It builds random trees by randomly selecting a feature and a random split value. Anomalous points (rare, different values) require fewer random splits to isolate than normal points. The anomaly score reflects how quickly each point is isolated across all trees. In scikit-learn, decision_function returns lower scores for more anomalous points (negative for those labeled as anomalies), and predict returns -1 for anomalies and 1 for normal points.
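The "fewer splits" idea can be seen with a small hand-written simulation. This is a simplified illustration, not scikit-learn's implementation (which also subsamples the data and normalizes path lengths). The function isolation_depth and the synthetic points are assumptions made for this sketch.

```python
import numpy as np


def isolation_depth(data, target, rng):
    """Count random splits needed until `target` is alone in its partition."""
    depth = 0
    points = data
    while len(points) > 1:
        feature = rng.integers(points.shape[1])
        lo, hi = points[:, feature].min(), points[:, feature].max()
        if lo == hi:
            break  # all remaining values are identical on this feature
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target.
        if target[feature] < split:
            points = points[points[:, feature] < split]
        else:
            points = points[points[:, feature] >= split]
        depth += 1
    return depth


rng = np.random.default_rng(0)
normal_points = rng.normal(0, 1, size=(256, 2))
outlier = np.array([6.0, 6.0])
data = np.vstack([normal_points, outlier])

# Use the point closest to the origin as a typical "normal" reading.
inlier = normal_points[np.argmin(np.linalg.norm(normal_points, axis=1))]

trials = 200
outlier_depth = np.mean([isolation_depth(data, outlier, rng) for _ in range(trials)])
inlier_depth = np.mean([isolation_depth(data, inlier, rng) for _ in range(trials)])
print(f"Average splits to isolate outlier: {outlier_depth:.1f}")
print(f"Average splits to isolate inlier:  {inlier_depth:.1f}")
```

The outlier sits alone at the edge of the data, so a single well-placed split often separates it. The central point is surrounded by neighbors and needs many more.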
Combine threshold alerts, rolling statistics, and Isolation Forest scores into a single health dashboard that gives plant operators a unified view of BFP condition.
daily = bfp.resample("D").agg({
    temp_col: ["mean", "max"],
    "rolling_std_24h": "mean",
    "iso_score": "mean",
    "iso_label": lambda x: (x == -1).sum()
})
daily.columns = ["temp_mean", "temp_max", "avg_volatility",
                 "avg_iso_score", "anomaly_count"]
print("Daily health summary (first 10 days):")
print(daily.head(10).round(2))
fig, axes = plt.subplots(4, 1, figsize=(14, 14), sharex=True)
axes[0].plot(daily.index, daily["temp_mean"], color="#5FCCDB", linewidth=2)
axes[0].fill_between(daily.index, daily["temp_mean"], alpha=0.2, color="#5FCCDB")
axes[0].axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--", alpha=0.7)
axes[0].set_ylabel("Mean Temp (F)")
axes[0].set_title("BFP Health Dashboard", fontsize=14, fontweight="bold")
axes[1].bar(daily.index, daily["temp_max"], color="#2D6A7A", alpha=0.7, width=1)
axes[1].axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--", alpha=0.7)
axes[1].set_ylabel("Max Temp (F)")
axes[2].plot(daily.index, daily["avg_volatility"], color="#D69E2E", linewidth=2)
axes[2].set_ylabel("Avg Volatility")
colors = ["red" if v > 0 else "#5FCCDB" for v in daily["anomaly_count"]]
axes[3].bar(daily.index, daily["anomaly_count"], color=colors, width=1)
axes[3].set_ylabel("Anomalies")
axes[3].set_xlabel("Date")
axes[3].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
plt.tight_layout()
plt.show()
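The daily summary above renames its columns by position, which silently breaks if the aggregation dictionary is reordered. One alternative is pandas named aggregation, where each output column is named next to its source column and function. The sketch below uses synthetic stand-ins for the guide's columns, and the column names temp and is_anomaly are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
hours = 24 * 3
idx = pd.date_range("2023-01-01", periods=hours, freq=pd.Timedelta(hours=1))

# Synthetic stand-ins for the columns built earlier in the guide.
hourly = pd.DataFrame(
    {
        "temp": 150 + rng.normal(0, 2, hours),
        "iso_label": np.where(rng.random(hours) < 0.05, -1, 1),
    },
    index=idx,
)
# A boolean helper column lets us count anomalies with a plain "sum".
hourly["is_anomaly"] = hourly["iso_label"] == -1

daily = hourly.groupby(pd.Grouper(freq="D")).agg(
    temp_mean=("temp", "mean"),
    temp_max=("temp", "max"),
    anomaly_count=("is_anomaly", "sum"),
)
print(daily.round(2))
```

Grouping with pd.Grouper(freq="D") produces the same daily bins as resample("D") for a datetime index.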
In this guide, you:
- Loaded hourly BFP time-series data with 88 sensor channels
- Explored bearing temperature and vibration trends for both pumps
- Computed 24-hour rolling mean and standard deviation to smooth noisy signals
- Implemented threshold-based alerts using typical plant alarm limits
- Trained an Isolation Forest model for multivariate anomaly detection
- Built a 4-panel health dashboard combining all indicators
Ideas to Try Next
- Add pump speed and discharge pressure: Include additional BFP columns to give the Isolation Forest more context about operating conditions
- Cross-reference alarm logs: Load alarm_log.csv and overlay actual plant alarms on your dashboard
- Trend rate-of-change: Compute the derivative of the rolling mean to detect accelerating degradation
- Compare A vs. B: Build side-by-side comparison charts to spot diverging behavior between redundant pumps
- Weekly health reports: Aggregate the daily dashboard into weekly summaries for maintenance planning
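As a starting point for the rate-of-change idea, here is a minimal sketch on a synthetic bearing temperature that rises slowly without ever reaching the 170 F warning limit. The drift rate, the noise level, and the 0.3 F per day limit are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

hours = 24 * 20
idx = pd.date_range("2023-01-01", periods=hours, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(3)

# Synthetic bearing temperature: flat for 10 days, then rising 0.5 F per day.
trend = np.concatenate([np.zeros(240), np.arange(240) * (0.5 / 24)])
temp = pd.Series(155 + trend + rng.normal(0, 0.2, hours), index=idx)

rolling_mean = temp.rolling(window=24).mean()
# Change in the 24h rolling mean compared with 24 hours earlier (F per day).
rate_per_day = rolling_mean.diff(periods=24)

RATE_LIMIT_F_PER_DAY = 0.3  # hypothetical limit, not a plant setting
rising = rate_per_day > RATE_LIMIT_F_PER_DAY
print(f"Max temperature: {temp.max():.1f} F")
print(f"Hours flagged as rising: {int(rising.sum())}")
```

Differencing over a full 24 hours compares the same point in the daily cycle, so normal load swings do not show up as a trend.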
Key Terms Glossary
- BFP (Boiler Feed Pump) — a high-pressure pump that delivers feedwater from the deaerator to the boiler drum
- Rolling mean — the average of a fixed-size sliding window across a time series; smooths short-term noise
- Rolling standard deviation — the variability within the same sliding window; measures signal volatility
- Isolation Forest — an unsupervised anomaly detection algorithm that isolates outliers using random partitioning trees
- Contamination — the expected fraction of anomalies in the dataset; controls the Isolation Forest threshold
- Bearing DE/NDE — Drive End and Non-Drive End bearings; the two main bearing positions on a rotating machine
- Threshold alert — a fixed-value alarm that fires when a sensor exceeds a predefined limit