Guide 17

BFP Health Monitoring with Trend Analysis

Prefer not to install anything? Click the badge above to open this guide as a runnable notebook in Google Colab. Sign in with any Google account, then use Runtime → Run all to execute every cell, or step through them one at a time.

What You Will Learn

Boiler Feed Pumps (BFPs) are the heart of a thermal generating unit. They force feedwater into the boiler at extreme pressures, and their failure can trip an entire unit offline. In this guide you will:

  • Load hourly BFP time-series data from the SP&L generation dataset
  • Explore bearing temperature and vibration trends for BFP A and BFP B
  • Compute rolling mean and standard deviation to smooth noisy sensor data
  • Set threshold-based alerts for out-of-range conditions
  • Apply Isolation Forest to detect multivariate anomalies
  • Build a simple health dashboard combining all indicators

What is a Boiler Feed Pump? A BFP is a high-pressure centrifugal pump that delivers feedwater from the deaerator to the boiler drum. A typical 600 MW unit runs two BFPs in parallel, each driven by a steam turbine or electric motor. Bearing temperatures and vibration levels are the primary health indicators monitored by plant operators.

SP&L Data You Will Use

  • bfp_train_hourly.parquet — 8,784 rows x 88 columns of hourly BFP and system data
  • Column pattern: U1_BFPA_* (Pump A), U1_BFPB_* (Pump B), U1_* (system-level tags)
  • tag_dictionary.csv — maps tag names to engineering descriptions and units
0

Verify Your Setup

Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.

# Step 0: Verify your setup
try:
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import IsolationForest

    bfp = pd.read_parquet(
        "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
    )
    print(f"Setup OK! Loaded {len(bfp):,} rows x {bfp.shape[1]} columns.")
except ImportError as e:
    # ImportError also covers ModuleNotFoundError and a missing parquet engine
    print(f"Missing library: {e}")
    print("Run: pip install -r requirements.txt")
except FileNotFoundError:
    print("Data files not found. Run from the repo root:")
    print(" cd Dynamic-Network-Model && jupyter lab")
Setup OK! Loaded 8,784 rows x 88 columns.

Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab
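If you are not sure whether Jupyter was started from the repository root, a quick check along these lines can help. This is a minimal sketch: the helper name data_file_available is made up for illustration, and the relative path is simply the one used in the setup cell.

```python
from pathlib import Path

# Relative path used by this guide's setup cell.
BFP_PATH = Path(
    "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
)


def data_file_available(root: Path = Path(".")) -> bool:
    """Return True if the BFP parquet file exists under `root` (hypothetical helper)."""
    return (root / BFP_PATH).is_file()


print(f"Current directory: {Path.cwd()}")
print(f"BFP data found:    {data_file_available()}")
```

If the second line prints False, the notebook was most likely started from a different folder than the repository root.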

Having trouble? Check our Troubleshooting Guide for solutions to common setup and data loading issues.

1

Load the Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.ensemble import IsolationForest

# Load BFP hourly time-series
bfp = pd.read_parquet(
    "sisyphean-power-and-light/generation/timeseries/bfp_train_hourly.parquet"
)

# Load tag dictionary for reference
tags = pd.read_csv(
    "sisyphean-power-and-light/generation/tag_dictionary.csv"
)

print(f"BFP data: {bfp.shape[0]:,} rows x {bfp.shape[1]} columns")
print(f"Tag dict: {len(tags)} entries")
print(f"Date range: {bfp.index.min()} to {bfp.index.max()}")

BFP data: 8,784 rows x 88 columns
Tag dict: 88 entries
Date range: 2023-01-01 00:00:00 to 2023-12-31 23:00:00
2

Explore BFP Trends

Each BFP has bearing temperature and vibration sensors. Let's identify the relevant columns and plot the raw trends over the full year.

# Find BFP A and BFP B columns
bfpa_cols = [c for c in bfp.columns if "BFPA" in c]
bfpb_cols = [c for c in bfp.columns if "BFPB" in c]

print("BFP A tags:", bfpa_cols)
print("BFP B tags:", bfpb_cols)

# Plot bearing temperatures for both pumps
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# BFP A bearing temps
for col in [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Bearing Temperatures")
axes[0].set_ylabel("Temperature (F)")
axes[0].legend(fontsize=8)

# BFP B bearing temps
for col in [c for c in bfpb_cols if "BEAR" in c and "TEMP" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Bearing Temperatures")
axes[1].set_ylabel("Temperature (F)")
axes[1].legend(fontsize=8)
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()

# Plot vibration trends
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

for col in [c for c in bfpa_cols if "VIB" in c]:
    axes[0].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[0].set_title("BFP A Vibration")
axes[0].set_ylabel("Vibration (mils)")
axes[0].legend(fontsize=8)

for col in [c for c in bfpb_cols if "VIB" in c]:
    axes[1].plot(bfp.index, bfp[col], alpha=0.7, label=col)
axes[1].set_title("BFP B Vibration")
axes[1].set_ylabel("Vibration (mils)")
axes[1].legend(fontsize=8)
axes[1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()

Why two pumps? Most generating units run redundant BFPs (A and B) in parallel. If one trips, the other must carry the full feedwater load. Monitoring both pumps lets you compare their behavior: if Pump A's bearing temperature is climbing while Pump B stays flat, that is an early warning of a developing issue on Pump A.
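To make the A-versus-B comparison concrete, here is a minimal sketch on synthetic data (not the SP&L dataset). It assumes both pumps share the same daily load cycle and that Pump A starts drifting upward after day 30; the column names pump_a and pump_b are placeholders for real bearing temperature tags.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two bearing temperature tags (assumed values).
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")
load_cycle = 5 * np.sin(2 * np.pi * np.arange(len(idx)) / 24)  # shared daily cycle

pump_a = 150 + load_cycle + rng.normal(0, 1, len(idx))
pump_b = 150 + load_cycle + rng.normal(0, 1, len(idx))
pump_a[24 * 30:] += np.linspace(0, 8, len(idx) - 24 * 30)  # A drifts up after day 30

temps = pd.DataFrame({"pump_a": pump_a, "pump_b": pump_b}, index=idx)

# Subtracting B from A cancels the shared load cycle and leaves the divergence.
temps["a_minus_b"] = temps["pump_a"] - temps["pump_b"]
temps["a_minus_b_24h"] = temps["a_minus_b"].rolling(window=24).mean()

first_month = temps["a_minus_b_24h"].iloc[: 24 * 30].mean()
last_week = temps["a_minus_b_24h"].iloc[-24 * 7:].mean()
print(f"Mean A-B difference, first 30 days: {first_month:.2f} F")
print(f"Mean A-B difference, last 7 days:   {last_week:.2f} F")
```

Because both pumps see the same load, the difference stays near zero while they behave alike and moves away from zero only when one of them changes.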

3

Rolling Statistics

Raw sensor data is noisy. Rolling statistics smooth out short-term fluctuations and make trends visible. We will compute a 24-hour rolling mean and standard deviation for one BFP A bearing temperature sensor; the same steps apply to any other tag.

# Select one bearing temp column for demonstration
temp_col = [c for c in bfpa_cols if "BEAR" in c and "TEMP" in c][0]
print(f"Analyzing: {temp_col}")

# 24-hour rolling mean and std (24 rows = 24 hours for hourly data)
bfp["rolling_mean_24h"] = bfp[temp_col].rolling(window=24).mean()
bfp["rolling_std_24h"] = bfp[temp_col].rolling(window=24).std()

# Upper and lower bands (mean +/- 2 sigma)
bfp["upper_band"] = bfp["rolling_mean_24h"] + 2 * bfp["rolling_std_24h"]
bfp["lower_band"] = bfp["rolling_mean_24h"] - 2 * bfp["rolling_std_24h"]

print(bfp[[temp_col, "rolling_mean_24h", "rolling_std_24h"]].describe().round(2))

# Visualize rolling statistics
fig, ax = plt.subplots(figsize=(14, 5))

ax.plot(bfp.index, bfp[temp_col], alpha=0.3, color="#718096", label="Raw")
ax.plot(bfp.index, bfp["rolling_mean_24h"], color="#5FCCDB",
        linewidth=2, label="24h Rolling Mean")
ax.fill_between(bfp.index, bfp["lower_band"], bfp["upper_band"],
                alpha=0.15, color="#5FCCDB", label="Mean +/- 2 sigma")

ax.set_title(f"{temp_col}: Raw vs. 24h Rolling Statistics")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()

Why 24-hour windows? BFP temperatures follow daily load cycles. A 24-hour window captures one complete cycle, so the rolling mean reflects the underlying trend while smoothing out normal load-driven fluctuations. The 2-sigma bands flag points that are statistically unusual even after accounting for daily variation.
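The band logic can be tried in isolation on a synthetic signal. The sketch below assumes a temperature with a clean daily cycle plus noise, and injects one artificial 20 F spike; none of these numbers come from the SP&L data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperature: daily cycle + noise (assumed values).
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", periods=24 * 30, freq="h")
hours = np.arange(len(idx))
temp = pd.Series(
    150 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, len(idx)),
    index=idx,
)

# Inject one artificial spike so there is something to find.
spike_time = idx[486]
temp.loc[spike_time] += 20

# Same recipe as above: 24-hour rolling mean +/- 2 rolling standard deviations.
roll = temp.rolling(window=24)
mean_24h, std_24h = roll.mean(), roll.std()
outside = (temp > mean_24h + 2 * std_24h) | (temp < mean_24h - 2 * std_24h)

print(f"Points outside the band: {int(outside.sum())} of {len(temp)}")
print(f"Injected spike flagged:  {bool(outside.loc[spike_time])}")
```

Two details are worth noticing. The first 23 rows have no rolling value yet, so they can never be flagged. And the window includes the current reading, so a large spike widens its own band slightly; the spike here is big enough to stand out anyway.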

4

Threshold Alerts

Plant operators use fixed alarm limits for bearing temperatures. Let's implement a simple threshold alert system and count how many hourly readings from each bearing temperature sensor exceed the warning and alarm limits.

# Define alarm thresholds (typical values for BFP bearings)
TEMP_HIGH_ALARM = 180  # degrees F - high alarm
TEMP_HIGH_WARN = 170   # degrees F - high warning

# Apply thresholds to all bearing temp columns
bear_temp_cols = [c for c in bfp.columns
                  if "BEAR" in c and "TEMP" in c and "rolling" not in c]

alert_summary = []
for col in bear_temp_cols:
    n_warn = (bfp[col] > TEMP_HIGH_WARN).sum()
    n_alarm = (bfp[col] > TEMP_HIGH_ALARM).sum()
    alert_summary.append({
        "tag": col,
        "warnings": n_warn,
        "alarms": n_alarm
    })

alerts_df = pd.DataFrame(alert_summary)
print(alerts_df.to_string(index=False))

                  tag  warnings  alarms
 U1_BFPA_BEAR_DE_TEMP       142      37
U1_BFPA_BEAR_NDE_TEMP        98      18
 U1_BFPB_BEAR_DE_TEMP        56      11
U1_BFPB_BEAR_NDE_TEMP        41       6

# Visualize alarm exceedances on a timeline
fig, ax = plt.subplots(figsize=(14, 5))

ax.plot(bfp.index, bfp[temp_col], alpha=0.6, color="#2D6A7A", label=temp_col)
ax.axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--",
           label=f"Warning ({TEMP_HIGH_WARN}F)")
ax.axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--",
           label=f"Alarm ({TEMP_HIGH_ALARM}F)")

# Highlight alarm points
alarm_mask = bfp[temp_col] > TEMP_HIGH_ALARM
ax.scatter(bfp.index[alarm_mask], bfp.loc[alarm_mask, temp_col],
           c="red", s=20, zorder=5, label="Alarm Points")

ax.set_title(f"{temp_col}: Threshold Alerts")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()
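The counts above are hours spent over a limit, not separate alarm events: a bearing that stays hot for three hours contributes three to the total. If you want distinct episodes instead, one possible approach is to count the transitions from "below" to "above". The sketch uses a tiny hand-made series so the result is easy to check by eye; the values are invented.

```python
import pandas as pd

TEMP_HIGH_ALARM = 180  # same alarm limit as in the threshold cell

# Twelve invented hourly readings with three separate excursions over 180 F.
idx = pd.date_range("2023-06-01", periods=12, freq="h")
temp = pd.Series(
    [175, 181, 183, 182, 176, 174, 181, 179, 178, 184, 185, 177], index=idx
)

above = temp > TEMP_HIGH_ALARM

# An episode starts where this hour is above the limit and the previous one was not.
episode_start = above & ~above.shift(1, fill_value=False)

hours_in_alarm = int(above.sum())
n_episodes = int(episode_start.sum())
print(f"Hours above alarm:       {hours_in_alarm}")   # 6
print(f"Distinct alarm episodes: {n_episodes}")       # 3
```

Both numbers are useful: hours in alarm describe exposure, while the episode count is closer to how many times an operator would have been alerted.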

Threshold limits vs. statistical anomalies: Fixed thresholds catch obvious problems (a bearing at 185F is always concerning), but they miss subtle degradation patterns like a slow upward trend that has not yet crossed the alarm limit. That is where Isolation Forest comes in.
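Before moving on, the "slow upward trend" case can be illustrated with a rate-of-change check. This is a sketch on synthetic data, assuming a bearing that warms by 12 F over 90 days without ever reaching the 170 F warning limit; the one-week comparison span is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

TEMP_HIGH_WARN = 170  # same warning limit as in the threshold cell

# Synthetic bearing temperature: slow drift + daily cycle + noise (assumed values).
rng = np.random.default_rng(2)
idx = pd.date_range("2023-01-01", periods=24 * 90, freq="h")
hours = np.arange(len(idx))
drift = np.linspace(0, 12, len(idx))
temp = pd.Series(
    150 + drift + 2 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, len(idx)),
    index=idx,
)

# Smooth with the 24-hour rolling mean, then compare each value with one week earlier.
rolling_mean_24h = temp.rolling(window=24).mean()
change_per_week = rolling_mean_24h.diff(24 * 7)

print(f"Readings above warning limit: {int((temp > TEMP_HIGH_WARN).sum())}")
print(f"Median weekly change:         {change_per_week.median():.2f} F/week")
```

The threshold check reports nothing, yet the weekly change is consistently positive, which is the kind of pattern a fixed limit cannot see.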

5

Anomaly Detection with Isolation Forest

Isolation Forest detects anomalies by isolating observations in random decision trees. Points that are easy to isolate (they require fewer splits) are scored as more anomalous. Unlike threshold alerts, it considers multiple sensors simultaneously.

# Select features for anomaly detection
# Use bearing temps + vibration for both pumps
feature_cols = [c for c in bfp.columns
                if ("BEAR" in c or "VIB" in c)
                and "rolling" not in c and "band" not in c]

print(f"Features for anomaly detection ({len(feature_cols)}):")
for col in feature_cols:
    print(f" {col}")

# Drop rows with any missing reading so the model trains on complete rows
X = bfp[feature_cols].dropna()
print(f"\nSamples for training: {len(X):,}")

# Fit Isolation Forest
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.02,  # expect ~2% anomalies
    random_state=42
)
iso_forest.fit(X)

# Predict: -1 = anomaly, 1 = normal
bfp.loc[X.index, "iso_label"] = iso_forest.predict(X)
# Lower decision_function values mean more anomalous; negative values are flagged
bfp.loc[X.index, "iso_score"] = iso_forest.decision_function(X)

n_anomalies = (bfp["iso_label"] == -1).sum()
print(f"Anomalies detected: {n_anomalies} ({n_anomalies/len(X)*100:.1f}%)")

Anomalies detected: 176 (2.0%)

# Visualize anomalies on temperature trend
fig, ax = plt.subplots(figsize=(14, 5))

normal = bfp[bfp["iso_label"] == 1]
anomaly = bfp[bfp["iso_label"] == -1]

ax.plot(normal.index, normal[temp_col], color="#5FCCDB",
        alpha=0.5, linewidth=0.5, label="Normal")
ax.scatter(anomaly.index, anomaly[temp_col], c="red", s=15,
           zorder=5, label="Anomaly")

ax.set_title("Isolation Forest Anomalies on BFP A Bearing Temperature")
ax.set_ylabel("Temperature (F)")
ax.legend()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()

What is Isolation Forest? Unlike most anomaly detectors that learn what "normal" looks like, Isolation Forest explicitly isolates anomalies. It builds random trees by randomly selecting a feature and a random split value. Anomalous points (rare, different values) require fewer random splits to isolate than normal points. The anomaly score reflects how quickly each point is isolated across all trees.
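A small synthetic experiment can make this tangible. The sketch below assumes two features (a temperature and a vibration reading, with invented means and spreads), adds one clearly unusual point, and fits the same kind of model with the same contamination setting.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# 500 "normal" readings: temperature around 150 F, vibration around 2 mils (assumed).
rng = np.random.default_rng(42)
normal = rng.normal(loc=[150.0, 2.0], scale=[3.0, 0.3], size=(500, 2))

# One point that is unusual on both features at once.
odd = np.array([[175.0, 4.5]])
X = np.vstack([normal, odd])

model = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
model.fit(X)

scores = model.decision_function(X)  # lower = easier to isolate = more anomalous
labels = model.predict(X)            # -1 = anomaly, 1 = normal

print(f"Label of the odd point:  {labels[-1]}")
print(f"Score of the odd point:  {scores[-1]:.3f}")
print(f"Median score overall:    {np.median(scores):.3f}")
print(f"Fraction flagged:        {(labels == -1).mean():.3f}")
```

Note that the fraction flagged lands close to the contamination value by construction, because contamination sets the score cutoff. A result such as "2.0% anomalies" therefore reflects the setting you chose rather than a measured anomaly rate; the scores themselves carry the ranking information.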

6

Health Dashboard

Combine threshold alerts, rolling statistics, and Isolation Forest scores into a single health dashboard that gives plant operators a unified view of BFP condition.

# Build a daily health summary
daily = bfp.resample("D").agg({
    temp_col: ["mean", "max"],
    "rolling_std_24h": "mean",
    "iso_score": "mean",
    "iso_label": lambda x: (x == -1).sum()  # anomalies per day
})
# Flatten the two-level column index into readable names
daily.columns = ["temp_mean", "temp_max", "avg_volatility",
                 "avg_iso_score", "anomaly_count"]

print("Daily health summary (first 10 days):")
print(daily.head(10).round(2))

# 4-panel health dashboard
fig, axes = plt.subplots(4, 1, figsize=(14, 14), sharex=True)

# Panel 1: Daily mean temperature
axes[0].plot(daily.index, daily["temp_mean"], color="#5FCCDB", linewidth=2)
axes[0].fill_between(daily.index, daily["temp_mean"], alpha=0.2, color="#5FCCDB")
axes[0].axhline(y=TEMP_HIGH_WARN, color="orange", linestyle="--", alpha=0.7)
axes[0].set_ylabel("Mean Temp (F)")
axes[0].set_title("BFP Health Dashboard", fontsize=14, fontweight="bold")

# Panel 2: Daily max temperature
axes[1].bar(daily.index, daily["temp_max"], color="#2D6A7A", alpha=0.7, width=1)
axes[1].axhline(y=TEMP_HIGH_ALARM, color="red", linestyle="--", alpha=0.7)
axes[1].set_ylabel("Max Temp (F)")

# Panel 3: Volatility (rolling std)
axes[2].plot(daily.index, daily["avg_volatility"], color="#D69E2E", linewidth=2)
axes[2].set_ylabel("Avg Volatility")

# Panel 4: Daily anomaly count
colors = ["red" if v > 0 else "#5FCCDB" for v in daily["anomaly_count"]]
axes[3].bar(daily.index, daily["anomaly_count"], color=colors, width=1)
axes[3].set_ylabel("Anomalies")
axes[3].set_xlabel("Date")
axes[3].xaxis.set_major_formatter(mdates.DateFormatter("%b"))

plt.tight_layout()
plt.show()

What You Built and Next Steps

  1. Loaded hourly BFP time-series data with 88 sensor channels
  2. Explored bearing temperature and vibration trends for both pumps
  3. Computed 24-hour rolling mean and standard deviation to smooth noisy signals
  4. Implemented threshold-based alerts using typical plant alarm limits
  5. Trained an Isolation Forest model for multivariate anomaly detection
  6. Built a 4-panel health dashboard combining all indicators

Ideas to Try Next

  • Add pump speed and discharge pressure: Include additional BFP columns to give the Isolation Forest more context about operating conditions
  • Cross-reference alarm logs: Load alarm_log.csv and overlay actual plant alarms on your dashboard
  • Trend rate-of-change: Compute the derivative of the rolling mean to detect accelerating degradation
  • Compare A vs. B: Build side-by-side comparison charts to spot diverging behavior between redundant pumps
  • Weekly health reports: Aggregate the daily dashboard into weekly summaries for maintenance planning
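As a starting point for the weekly report idea, the sketch below rolls a daily summary up to weeks. It builds a small synthetic daily table with three of the column names used in the dashboard step (temp_mean, temp_max, anomaly_count); the values are random placeholders, and the choice of aggregation per column is one reasonable option, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic four-week daily summary (placeholder values, starts on a Monday).
rng = np.random.default_rng(3)
days = pd.date_range("2023-01-02", periods=28, freq="D")
daily = pd.DataFrame(
    {
        "temp_mean": 155 + rng.normal(0, 2, len(days)),
        "temp_max": 165 + rng.normal(0, 3, len(days)),
        "anomaly_count": rng.integers(0, 3, len(days)),
    },
    index=days,
)

# Weekly roll-up: average the means, keep the worst-case max, total the anomalies.
weekly = daily.resample("W").agg(
    {"temp_mean": "mean", "temp_max": "max", "anomaly_count": "sum"}
)
print(weekly.round(2))
```

With the default "W" rule, each row is labeled with the Sunday that ends the week, so the four rows here cover Monday through Sunday.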

Key Terms Glossary

  • BFP (Boiler Feed Pump) — a high-pressure pump that delivers feedwater from the deaerator to the boiler drum
  • Rolling mean — the average of a fixed-size sliding window across a time series; smooths short-term noise
  • Rolling standard deviation — the variability within the same sliding window; measures signal volatility
  • Isolation Forest — an unsupervised anomaly detection algorithm that isolates outliers using random partitioning trees
  • Contamination — the expected fraction of anomalies in the dataset; controls the Isolation Forest threshold
  • Bearing DE/NDE — Drive End and Non-Drive End bearings; the two main bearing positions on a rotating machine
  • Threshold alert — a fixed-value alarm that fires when a sensor exceeds a predefined limit

Ready to Level Up?

In the advanced guide, you'll classify specific fault types using multi-sensor fusion, Random Forest, and SHAP explainability.

Go to Advanced Rotating Equipment Fault Diagnosis →