Prerequisite: Complete Guide 04: Predictive Asset Maintenance first. This guide extends the binary "will it fail?" classifier into a survival model that answers "when will it fail?" and estimates remaining useful life.
What You Will Learn
In Guide 04 you built an XGBoost classifier that labels each transformer as "likely to fail" or "not." That is useful for ranking, but it throws away a crucial dimension: time. Survival analysis models the full time-to-failure distribution, which lets you estimate when an asset will fail—not just whether it will. In this guide you will:
- Prepare survival data with time-to-event and censoring indicators
- Fit Kaplan–Meier survival curves and compare groups of transformers
- Run log-rank tests to determine whether survival differences are statistically significant
- Build a Cox Proportional Hazards regression model with asset and environmental covariates
- Interpret hazard ratios to understand which factors accelerate failure
- Predict individual survival curves and remaining useful life for every transformer
- Create a risk-weighted replacement priority schedule that accounts for consequence
Why survival analysis? Traditional classification discards information about transformers that are still running. Survival analysis treats these as "censored" observations—we know they survived at least this long, even if we don't know when they will eventually fail. This makes much better use of your data, especially when failures are rare.
SP&L Data You Will Use
- assets/transformers.csv — 86 transformers with kVA rating, installation year, manufacturer, type, and health index (1–5)
- assets/maintenance_log.csv — inspection dates, work orders, and replacement records
- outages/outage_events.csv — historical outage events linked to equipment failures
- weather/hourly_observations.csv — weather exposure history for environmental stress analysis
Additional Libraries
pip install lifelines
We start by loading the same datasets from Guide 04, plus weather data that we will use later to build environmental covariates for the Cox model.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from lifelines.utils import median_survival_times
DATA_DIR = "sisyphean-power-and-light/"
transformers = pd.read_csv(DATA_DIR + "assets/transformers.csv")
maintenance = pd.read_csv(DATA_DIR + "assets/maintenance_log.csv",
parse_dates=["inspection_date"])
outages = pd.read_csv(DATA_DIR + "outages/outage_events.csv",
parse_dates=["fault_detected"])
weather = pd.read_csv(DATA_DIR + "weather/hourly_observations.csv",
parse_dates=["timestamp"])
print(f"Transformers: {len(transformers)}")
print(f"Maintenance logs: {len(maintenance)}")
print(f"Outage events: {len(outages)}")
print(f"Weather records: {len(weather):,}")
Transformers: 86
Maintenance logs: 1,247
Outage events: 3,247
Weather records: 43,824
Before we write more code, let's understand why survival analysis is different from the binary classifier you built in Guide 04.
In Guide 04 you asked: "Will this transformer fail?" The answer was 0 or 1. But consider two transformers that have never failed:
- Transformer A: Installed in 2020 (5 years old, no failure)
- Transformer B: Installed in 1985 (40 years old, no failure)
A binary classifier treats both the same: has_failed = 0. But Transformer B surviving 40 years is much more informative than Transformer A surviving 5 years. Survival analysis captures this through a concept called censoring.
What is censoring? A transformer is "right-censored" if it has not failed by the end of our observation period. We know it survived at least this long, but we don't know its true failure time. Instead of throwing away this data (as binary classification effectively does), survival analysis uses it to estimate the survival function more accurately.
Survival analysis gives us three outputs that classification cannot:
- Survival curve: The probability of surviving beyond any given age
- Hazard function: The instantaneous failure rate at a given age
- Remaining useful life: A per-asset estimate of expected time until failure
Survival analysis requires two columns for each subject: the duration (time from installation to failure or current date) and an event indicator (1 = failure observed, 0 = still running / censored).
equip_failures = outages[outages["cause_code"] == "equipment_failure"]
first_failure = equip_failures.groupby("transformer_id").agg(
first_failure_date=("fault_detected", "min")
).reset_index()
df = transformers.merge(first_failure, on="transformer_id", how="left")
df["event_observed"] = df["first_failure_date"].notna().astype(int)
observation_end = pd.Timestamp("2025-01-01")
df["install_date"] = pd.to_datetime(df["install_year"], format="%Y")
df["duration_years"] = np.where(
df["event_observed"] == 1,
(df["first_failure_date"] - df["install_date"]).dt.days / 365.25,
(observation_end - df["install_date"]).dt.days / 365.25
)
df["duration_years"] = df["duration_years"].clip(lower=0.5)
print(f"Total transformers: {len(df)}")
print(f"Failed (event=1): {df['event_observed'].sum()}")
print(f"Censored (event=0): {(df['event_observed'] == 0).sum()}")
print(f"\nDuration range: {df['duration_years'].min():.1f} – {df['duration_years'].max():.1f} years")
Total transformers: 86
Failed (event=1): 23
Censored (event=0): 63
Duration range: 0.5 – 42.0 years
Important: Using only the first failure per transformer is a simplification. In reality, a transformer can be repaired and fail again. More advanced models (recurrent event models) handle this, but the single-event approach is the right starting point and is standard practice in asset management.
Sample size consideration: We have 86 transformers with only 23 observed failures. This is a small dataset for survival analysis. The standard rule of thumb for Cox PH is 10–15 events per covariate; with 6 covariates, you would ideally want 60–90 failure events. With 23 events, the hazard ratio estimates will have wide confidence intervals, and some effects may not reach statistical significance even if they are real. The small ridge penalty we add when fitting the Cox model below (penalizer=0.01) helps stabilize the estimates, but treat the specific numerical results with appropriate caution. To strengthen this analysis, you could expand the dataset by including transformers from neighboring utilities, extending the observation window, or incorporating failure records from additional asset types (e.g., reclosers, regulators) with similar failure mechanisms.
The Kaplan–Meier estimator is the foundation of survival analysis. It produces a non-parametric estimate of the survival function: the probability of surviving beyond a given time. No assumptions about the shape of the curve are needed.
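Formally, if $t_1 < t_2 < \dots$ are the distinct observed failure ages, $d_i$ is the number of failures at age $t_i$, and $n_i$ is the number of transformers still at risk just before $t_i$, the Kaplan–Meier estimate is

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

Censored transformers never contribute to any $d_i$, but they remain in the risk sets $n_i$ up to their censoring age, which is exactly how the estimator extracts information from assets that have not yet failed.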
kmf = KaplanMeierFitter()
kmf.fit(
durations=df["duration_years"],
event_observed=df["event_observed"],
label="All Transformers"
)
fig, ax = plt.subplots(figsize=(10, 6))
kmf.plot_survival_function(ax=ax, color="#2D6A7A", linewidth=2)
ax.set_title("Kaplan–Meier Survival Curve: SP&L Transformer Fleet")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.axhline(y=0.5, color="gray", linestyle="--", alpha=0.5, label="50% survival")
ax.legend()
plt.tight_layout()
plt.show()
median = kmf.median_survival_time_
print(f"Median survival time: {median:.1f} years")
The curve starts at 1.0 (all transformers alive at year 0) and drops as failures occur. The steps down represent observed failures; the flat segments between steps include censored observations. The shaded band shows the 95% confidence interval.
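You can also read point estimates off the fitted curve programmatically. A quick check using lifelines' survival_function_at_times (the ages chosen here are illustrative):

# Probability that a transformer survives past selected ages
for age in [10, 20, 30, 40]:
    p = kmf.survival_function_at_times(age).iloc[0]
    print(f"P(survive past {age} years) = {p:.2f}")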
Compare Survival by Manufacturer
Do transformers from different manufacturers fail at different rates? Plot separate Kaplan–Meier curves to find out.
fig, ax = plt.subplots(figsize=(10, 6))
colors = ["#2D6A7A", "#5FCCDB", "#D69E2E", "#E53E3E"]
for i, (mfr, group) in enumerate(df.groupby("manufacturer")):
kmf_mfr = KaplanMeierFitter()
kmf_mfr.fit(
durations=group["duration_years"],
event_observed=group["event_observed"],
label=mfr
)
kmf_mfr.plot_survival_function(
ax=ax, color=colors[i % len(colors)], linewidth=2
)
ax.set_title("Survival Curves by Manufacturer")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="Manufacturer")
plt.tight_layout()
plt.show()
Compare Survival by kVA Rating Group
df["kva_group"] = pd.cut(
df["kva_rating"],
bins=[0, 50, 100, 500, 5000],
labels=["Small (≤50)", "Medium (51-100)",
"Large (101-500)", "Very Large (500+)"]
)
fig, ax = plt.subplots(figsize=(10, 6))
for i, (group_name, group) in enumerate(df.groupby("kva_group")):
if len(group) < 3:
continue
kmf_kva = KaplanMeierFitter()
kmf_kva.fit(
durations=group["duration_years"],
event_observed=group["event_observed"],
label=group_name
)
kmf_kva.plot_survival_function(
ax=ax, color=colors[i % len(colors)], linewidth=2
)
ax.set_title("Survival Curves by kVA Rating Group")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="kVA Group")
plt.tight_layout()
plt.show()
Eyeballing curves is useful, but the log-rank test gives a formal statistical answer to the question: "Do these two groups have significantly different survival experiences?"
manufacturers = df["manufacturer"].unique()
mfr_a = df[df["manufacturer"] == manufacturers[0]]
mfr_b = df[df["manufacturer"] == manufacturers[1]]
result = logrank_test(
mfr_a["duration_years"], mfr_b["duration_years"],
event_observed_A=mfr_a["event_observed"],
event_observed_B=mfr_b["event_observed"]
)
print(f"Log-rank test: {manufacturers[0]} vs {manufacturers[1]}")
print(f" Test statistic: {result.test_statistic:.3f}")
print(f" p-value: {result.p_value:.4f}")
if result.p_value < 0.05:
print(" → Statistically significant difference (p < 0.05)")
else:
print(" → No statistically significant difference (p ≥ 0.05)")
from itertools import combinations
print("Pairwise log-rank tests (all manufacturers):")
print(f"{'Group A':<20} {'Group B':<20} {'p-value':>10} {'Significant?':>14}")
print("-" * 66)
for a, b in combinations(manufacturers, 2):
grp_a = df[df["manufacturer"] == a]
grp_b = df[df["manufacturer"] == b]
res = logrank_test(
grp_a["duration_years"], grp_b["duration_years"],
event_observed_A=grp_a["event_observed"],
event_observed_B=grp_b["event_observed"]
)
sig = "Yes" if res.p_value < 0.05 else "No"
print(f"{a:<20} {b:<20} {res.p_value:>10.4f} {sig:>14}")
Interpreting the log-rank test: A p-value below 0.05 means that, if the two groups truly had identical survival distributions, a difference at least this extreme would arise by chance less than 5% of the time. This helps you decide whether manufacturer selection (or feeder assignment, or kVA rating) genuinely affects transformer longevity, or whether apparent differences are just noise in a small dataset. One caution: running every pairwise test inflates the chance of a false positive; with k manufacturers there are k(k−1)/2 comparisons, so consider a stricter threshold (Bonferroni would use 0.05 divided by the number of comparisons) or a single omnibus test, as shown below.
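The omnibus version tests all manufacturers at once with a single p-value. A minimal sketch using lifelines' multivariate_logrank_test:

from lifelines.statistics import multivariate_logrank_test

# One test: do ANY of the manufacturer groups differ in survival?
omnibus = multivariate_logrank_test(
    df["duration_years"], df["manufacturer"], df["event_observed"]
)
print(f"Omnibus log-rank p-value: {omnibus.p_value:.4f}")

If the omnibus test is not significant, be skeptical of any single pairwise result that is.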
Kaplan–Meier curves compare groups, but they don't handle multiple continuous covariates simultaneously. The Cox Proportional Hazards (PH) model is the regression equivalent for survival data. It estimates how each covariate affects the hazard (failure rate) while controlling for the others.
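Concretely, the model assumes each transformer's hazard is a shared baseline hazard scaled up or down by its covariates:

$$h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)$$

Here $h_0(t)$ is the baseline hazard, which is left unspecified (that is what makes the model semi-parametric), and $e^{\beta_j}$ is the hazard ratio for covariate $j$. The "proportional" in the name is the key assumption: each covariate multiplies the hazard by a constant factor at every age.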
maint_feats = maintenance.groupby("transformer_id").agg(
work_orders=("work_order_id", "count"),
last_inspection=("inspection_date", "max")
).reset_index()
maint_feats["days_since_inspection"] = (
pd.Timestamp("2025-01-01") - maint_feats["last_inspection"]
).dt.days
df = df.merge(
maint_feats[["transformer_id", "work_orders", "days_since_inspection"]],
on="transformer_id", how="left"
)
df["work_orders"] = df["work_orders"].fillna(0)
df["days_since_inspection"] = df["days_since_inspection"].fillna(9999)
weather["extreme_day"] = (
(weather["wind_speed"] > 35) |
(weather["temperature"] > 100) |
(weather["temperature"] < 10)
).astype(int)
weather["year"] = weather["timestamp"].dt.year
annual_extreme = weather.groupby("year")["extreme_day"].sum()
def compute_lifetime_exposure(row):
"""Sum extreme weather days from install year to present."""
years_active = annual_extreme[
(annual_extreme.index >= row["install_year"]) &
(annual_extreme.index <= 2024)
]
return years_active.sum()
df["lifetime_extreme_days"] = df.apply(compute_lifetime_exposure, axis=1)
df["avg_extreme_weather_days"] = df["lifetime_extreme_days"] / df["duration_years"].clip(lower=1)
df["type_code"] = df["type"].map({"oil": 0, "dry": 1}).fillna(0)
print("Covariates prepared.")
print(df[["transformer_id", "health_index", "kva_rating",
"work_orders", "days_since_inspection",
"duration_years", "event_observed"]].head())
cox_cols = [
"duration_years", "event_observed",
"health_index", "kva_rating", "type_code",
"work_orders", "days_since_inspection",
"avg_extreme_weather_days"
]
cox_df = df[cox_cols].dropna()
cox_df["kva_rating_scaled"] = cox_df["kva_rating"] / 100
cox_df["days_since_insp_scaled"] = cox_df["days_since_inspection"] / 365
cox_df = cox_df.drop(columns=["kva_rating", "days_since_inspection"])
cph = CoxPHFitter(penalizer=0.01)
cph.fit(
cox_df,
duration_col="duration_years",
event_col="event_observed"
)
cph.print_summary()
print(f"\nConcordance index: {cph.concordance_index_:.3f}")
Concordance index: This is the survival analysis equivalent of AUC. It measures the model’s ability to correctly rank pairs of transformers by their failure time. A concordance of 0.5 means the model is no better than random guessing; 1.0 means perfect discrimination. Values above 0.65 are considered useful for clinical/engineering applications.
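Because that concordance is computed on the same 86 rows the model was fit on, it is optimistic. A quick out-of-sample check is worth running; this is a sketch, and the scores will vary with the random fold assignment:

from lifelines.utils import k_fold_cross_validation

# 5-fold cross-validated concordance; each fold refits the model
cv_scores = k_fold_cross_validation(
    CoxPHFitter(penalizer=0.01), cox_df,
    duration_col="duration_years", event_col="event_observed",
    k=5, scoring_method="concordance_index"
)
print(f"CV concordance: {np.mean(cv_scores):.3f} ± {np.std(cv_scores):.3f}")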
Why a penalizer? With only 86 transformers and several covariates, we add a small L2 penalty (penalizer=0.01) to prevent overfitting. This is equivalent to ridge regularization and stabilizes coefficient estimates when the sample size is modest relative to the number of features.
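The proportional-hazards assumption itself is testable. lifelines ships a diagnostic based on scaled Schoenfeld residuals; with only 23 events the test has limited power, so treat it as a sanity check rather than proof:

# Tests each covariate for non-proportional hazards and prints advice
# if any appears to violate the assumption
cph.check_assumptions(cox_df, p_value_threshold=0.05)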
The Cox model output includes hazard ratios for each covariate. A hazard ratio greater than 1 means the factor increases failure risk; less than 1 means it is protective.
summary = cph.summary
fig, ax = plt.subplots(figsize=(9, 5))
hr = summary["exp(coef)"]
hr_lower = summary["exp(coef) lower 95%"]
hr_upper = summary["exp(coef) upper 95%"]
y_pos = range(len(hr))
ax.barh(y_pos, hr.values, color="#5FCCDB", height=0.6, alpha=0.8)
ax.errorbar(hr.values, y_pos,
xerr=[hr.values - hr_lower.values, hr_upper.values - hr.values],
fmt="none", color="#2D6A7A", capsize=4)
ax.axvline(x=1.0, color="red", linestyle="--", alpha=0.7, label="HR = 1 (no effect)")
ax.set_yticks(y_pos)
ax.set_yticklabels(hr.index)
ax.set_xlabel("Hazard Ratio (95% CI)")
ax.set_title("Cox PH Hazard Ratios: What Accelerates Transformer Failure?")
ax.legend()
plt.tight_layout()
plt.show()
print("\nHazard ratio interpretation:")
for covariate in hr.index:
ratio = hr[covariate]
p_val = summary.loc[covariate, "p"]
if ratio > 1:
direction = "increases"
pct = (ratio - 1) * 100
else:
direction = "decreases"
pct = (1 - ratio) * 100
sig = "*" if p_val < 0.05 else ""
print(f" {covariate}: HR={ratio:.3f} → {direction} hazard by {pct:.1f}% {sig}")
Reading hazard ratios: If health_index has an HR of 0.65, it means each one-point increase in health index reduces the failure hazard by 35%. Conversely, if days_since_insp_scaled has an HR of 1.42, each additional year without inspection increases hazard by 42%. These are powerful, interpretable insights for asset managers.
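To see a hazard ratio as curves rather than a single number, you can plot the model's predicted survival at several values of one covariate while holding the others at their baseline. A sketch using lifelines' plot_partial_effects_on_outcome:

# Predicted survival for health index 1-5, other covariates at baseline
fig, ax = plt.subplots(figsize=(10, 6))
cph.plot_partial_effects_on_outcome(
    covariates="health_index", values=[1, 2, 3, 4, 5], ax=ax
)
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
plt.tight_layout()
plt.show()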
The Cox model can produce a personalized survival curve for each transformer, conditioned on its specific covariates. This gives you remaining useful life (RUL) estimates for every asset in the fleet.
surv_funcs = cph.predict_survival_function(cox_df)
sample_ids = df["transformer_id"].iloc[:5].values
fig, ax = plt.subplots(figsize=(10, 6))
for i, tid in enumerate(sample_ids):
idx = df[df["transformer_id"] == tid].index[0]
if idx in surv_funcs.columns:
ax.plot(surv_funcs.index, surv_funcs[idx],
label=f"{tid} (age={df.loc[idx, 'duration_years']:.0f}yr, HI={df.loc[idx, 'health_index']})",
linewidth=2)
ax.set_title("Individual Predicted Survival Curves")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="Transformer", fontsize=9)
plt.tight_layout()
plt.show()
# predict_median is indexed like cox_df (which may have dropped rows with
# missing covariates), so align on the index rather than assigning by position.
# It can also return inf when a predicted curve never drops below 50%.
median_survival = cph.predict_median(cox_df)
df["predicted_median_life"] = median_survival.reindex(df.index).replace(np.inf, np.nan)
df["remaining_useful_life"] = df["predicted_median_life"] - df["duration_years"]
df["remaining_useful_life"] = df["remaining_useful_life"].clip(lower=0)
print("Remaining Useful Life Estimates (top 15 most urgent):")
rul_table = df.sort_values("remaining_useful_life")[[
"transformer_id", "duration_years", "health_index",
"predicted_median_life", "remaining_useful_life"
]].head(15)
print(rul_table.to_string(index=False, float_format="%.1f"))
What is remaining useful life? RUL estimates how many years a transformer has left before it reaches a 50% failure probability. A transformer with RUL = 2.3 years is expected to reach its median lifetime in about 2.3 years. Transformers with RUL near zero have already exceeded their predicted median and are operating on borrowed time.
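One subtlety: the subtraction above compares current age against an unconditional median, which ignores that each transformer has already survived to its current age. lifelines can condition predictions on time already survived via the conditional_after argument; per the lifelines documentation, the returned timeline is then measured from "now", so the conditional median is an RUL estimate directly. A sketch (the rul_conditional column name is ours, not part of the SP&L schema):

# Median remaining life, conditioned on each asset's age so far.
# conditional_after resets the clock: t=0 in the output means "now".
rul_conditional = cph.predict_median(
    cox_df, conditional_after=cox_df["duration_years"]
)
df.loc[cox_df.index, "rul_conditional"] = rul_conditional.replace(np.inf, np.nan)

Conditioning typically lengthens the estimates for old survivors: a 40-year-old transformer that is still running has already proven it is not one of the early failures.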
Survival analysis tells you when a transformer is likely to fail. But not all failures have equal consequences. A transformer serving 200 customers matters more than one serving 10. Combining failure risk with consequence produces an actionable replacement schedule.
df["hazard_score"] = cph.predict_partial_hazard(cox_df).values
df["hazard_norm"] = (
(df["hazard_score"] - df["hazard_score"].min()) /
(df["hazard_score"].max() - df["hazard_score"].min())
)
df["consequence_norm"] = (
(df["kva_rating"] - df["kva_rating"].min()) /
(df["kva_rating"].max() - df["kva_rating"].min())
)
df["priority_score"] = df["hazard_norm"] * 0.6 + df["consequence_norm"] * 0.4
priority_list = df.sort_values("priority_score", ascending=False)
print("Top 10 Replacement Priorities:")
print(priority_list[[
"transformer_id", "duration_years", "health_index",
"kva_rating", "remaining_useful_life", "priority_score"
]].head(10).to_string(index=False, float_format="%.2f"))
fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
df["hazard_norm"], df["consequence_norm"],
c=df["priority_score"], cmap="YlOrRd",
s=80, edgecolors="white", linewidth=0.5, alpha=0.9
)
plt.colorbar(scatter, label="Priority Score")
top5 = priority_list.head(5)
for _, row in top5.iterrows():
ax.annotate(row["transformer_id"],
(row["hazard_norm"], row["consequence_norm"]),
fontsize=8, fontweight="bold",
xytext=(5, 5), textcoords="offset points")
ax.set_xlabel("Failure Likelihood (normalized hazard)")
ax.set_ylabel("Consequence (normalized kVA)")
ax.set_title("Replacement Priority Matrix: Risk × Consequence")
ax.axvline(x=0.5, color="gray", linestyle="--", alpha=0.3)
ax.axhline(y=0.5, color="gray", linestyle="--", alpha=0.3)
ax.text(0.75, 0.75, "REPLACE NOW", ha="center", fontsize=11,
fontweight="bold", color="#E53E3E", alpha=0.7)
ax.text(0.25, 0.75, "MONITOR", ha="center", fontsize=11,
fontweight="bold", color="#D69E2E", alpha=0.7)
ax.text(0.75, 0.25, "SCHEDULE", ha="center", fontsize=11,
fontweight="bold", color="#D69E2E", alpha=0.7)
ax.text(0.25, 0.25, "LOW PRIORITY", ha="center", fontsize=11,
fontweight="bold", color="#718096", alpha=0.7)
plt.tight_layout()
plt.show()
Tuning the weights: The 60/40 split between failure likelihood and consequence is a common approach in utility asset management, but the specific weights are illustrative. In practice, these weights should be calibrated by the utility’s engineering and planning teams based on their risk tolerance, regulatory framework, and capital budget constraints. Some utilities weight consequence more heavily to protect large commercial customers or critical facilities. Others use a pure risk-based approach (probability × consequence). The right weights depend on your utility’s specific risk framework—there is no universal standard.
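For reference, the pure risk-based alternative mentioned above (probability × consequence) could look like the sketch below, which uses each asset's one-year conditional failure probability. The one-year horizon and the risk_score column name are illustrative choices, not SP&L standards:

# P(failure within 1 year | survived to current age) × normalized consequence
horizon = 1.0
surv_1yr = cph.predict_survival_function(
    cox_df, times=[horizon], conditional_after=cox_df["duration_years"]
)
pof_1yr = 1.0 - surv_1yr.iloc[0].values
df.loc[cox_df.index, "risk_score"] = (
    pof_1yr * df.loc[cox_df.index, "consequence_norm"]
)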
Transformer failure risk is not constant throughout the year. Extreme heat accelerates oil degradation, and storm seasons increase mechanical stress. Let's create a seasonal risk adjustment using the weather data.
weather["month"] = weather["timestamp"].dt.month
monthly_extreme = weather.groupby("month")["extreme_day"].mean()
seasonal_multiplier = monthly_extreme / monthly_extreme.mean()
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
bar_colors = ["#E53E3E" if v > 1.2 else "#D69E2E" if v > 1.0
else "#5FCCDB" for v in seasonal_multiplier.values]
axes[0].bar(months, seasonal_multiplier.values, color=bar_colors)
axes[0].axhline(y=1.0, color="gray", linestyle="--", alpha=0.5)
axes[0].set_title("Seasonal Risk Multiplier")
axes[0].set_ylabel("Risk Multiplier (1.0 = average)")
axes[0].tick_params(axis="x", rotation=45)
peak_month = seasonal_multiplier.idxmax()
peak_mult = seasonal_multiplier.max()
top10 = priority_list.head(10).copy()
top10["adjusted_score"] = top10["priority_score"] * peak_mult
y_pos = range(len(top10))
axes[1].barh(y_pos, top10["priority_score"], height=0.4,
color="#5FCCDB", label="Base Priority")
axes[1].barh([y + 0.4 for y in y_pos], top10["adjusted_score"],
height=0.4, color="#E53E3E", label=f"Peak Month (×{peak_mult:.2f})")
axes[1].set_yticks([y + 0.2 for y in y_pos])
axes[1].set_yticklabels(top10["transformer_id"].values)
axes[1].set_xlabel("Priority Score")
axes[1].set_title("Top 10: Base vs Peak-Season Priority")
axes[1].legend()
plt.tight_layout()
plt.show()
print(f"\nPeak risk month: {months[peak_month - 1]} (multiplier: {peak_mult:.2f}×)")
print("Use seasonal multipliers to time capital replacement projects")
print("before high-risk seasons, not after.")
Practical application: If summer is the peak risk season (multiplier of 1.4x), schedule replacements of high-priority transformers in spring. This avoids the double risk of operating aging assets during the most stressful period while also managing the risk of a construction outage during peak demand.
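If you want more than a single peak-month comparison, the same multipliers can be crossed with the priority list to sketch a month-by-month replacement calendar (an illustrative construction, not an SP&L planning standard):

# Seasonally adjusted priority for each top asset in each month
calendar = pd.DataFrame(
    np.outer(top10["priority_score"], seasonal_multiplier.values),
    index=top10["transformer_id"], columns=months
)
print(calendar.round(2).to_string())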
Model Persistence and Feature Notes
import pickle

# Persist the fitted Cox model so downstream scoring jobs can reuse it
# without refitting
with open("cox_model.pkl", "wb") as f:
    pickle.dump(cph, f)

# Reload later, e.g., in a scheduled scoring pipeline
with open("cox_model.pkl", "rb") as f:
    cph = pickle.load(f)
Feature engineering rationale: We used 6 covariates with clear asset management interpretations. The health_index (1–5) is a composite condition score from inspections. kva_rating proxies for both equipment size and consequence of failure. days_since_inspection captures maintenance recency—transformers that have not been inspected recently may have undetected degradation.
What You Accomplished
- Prepared survival data with time-to-event durations and censoring indicators
- Fitted Kaplan–Meier curves to visualize fleet-wide and group-level survival
- Used log-rank tests to determine statistically significant survival differences
- Built a Cox Proportional Hazards model with asset and environmental covariates
- Interpreted hazard ratios to understand which factors accelerate failure
- Predicted individual survival curves and remaining useful life estimates
- Created a risk-weighted replacement priority schedule with consequence scoring
- Adjusted priorities for seasonal weather risk to time capital projects
Ideas to Try Next
- Time-varying covariates: Use lifelines.CoxTimeVaryingFitter to model how changing conditions (degrading health index over time) affect hazard
- Accelerated Failure Time models: Try lifelines.WeibullAFTFitter or LogNormalAFTFitter for parametric alternatives to Cox PH
- Extend to other assets: Apply the same survival framework to poles (assets/poles.csv) and conductors (assets/conductors.csv)
- Budget optimization: Use the priority ranking with replacement cost estimates to maximize risk reduction per dollar of capital spending
- Concordance index tuning: Evaluate model discrimination with cph.concordance_index_ and tune the penalizer and feature set
Key Terms Glossary
- Survival analysis — a statistical framework for modeling time-to-event data, accounting for censored observations
- Censoring — when the event of interest (failure) has not yet occurred for a subject; the true failure time is unknown but at least as long as the observed time
- Kaplan–Meier estimator — a non-parametric method for estimating the survival function from time-to-event data
- Log-rank test — a statistical test comparing survival distributions between two or more groups
- Cox Proportional Hazards — a semi-parametric regression model that estimates how covariates affect the hazard (failure rate)
- Hazard ratio — the multiplicative effect of a one-unit change in a covariate on the failure rate; HR > 1 means increased risk
- Remaining useful life (RUL) — the estimated time remaining before an asset reaches its predicted median failure point
- Consequence scoring — weighting failure risk by the impact of that failure (customers affected, kVA lost)