Guide 12 — Advanced

Survival Analysis for Transformer Failure Prediction

Prerequisite: Complete Guide 04: Predictive Asset Maintenance first. This guide extends the binary "will it fail?" classifier into a survival model that answers "when will it fail?" and estimates remaining useful life.

What You Will Learn

In Guide 04 you built an XGBoost classifier that labels each transformer as "likely to fail" or "not." That is useful for ranking, but it throws away a crucial dimension: time. Survival analysis models the full time-to-failure distribution, which lets you estimate when an asset will fail—not just whether it will. In this guide you will:

  • Prepare survival data with time-to-event and censoring indicators
  • Fit Kaplan–Meier survival curves and compare groups of transformers
  • Run log-rank tests to determine whether survival differences are statistically significant
  • Build a Cox Proportional Hazards regression model with asset and environmental covariates
  • Interpret hazard ratios to understand which factors accelerate failure
  • Predict individual survival curves and remaining useful life for every transformer
  • Create a risk-weighted replacement priority schedule that accounts for consequence

Why survival analysis? Traditional classification discards information about transformers that are still running. Survival analysis treats these as "censored" observations—we know they survived at least this long, even if we don't know when they will eventually fail. This makes much better use of your data, especially when failures are rare.

SP&L Data You Will Use

  • assets/transformers.csv — 86 transformers with kVA rating, installation year, manufacturer, type, and health index (1–5)
  • assets/maintenance_log.csv — inspection dates, work orders, and replacement records
  • outages/outage_events.csv — historical outage events linked to equipment failures
  • weather/hourly_observations.csv — weather exposure history for environmental stress analysis

Additional Libraries

pip install lifelines

Load Transformer and Event Data

We start by loading the same datasets from Guide 04, plus weather data that we will use later to build environmental covariates for the Cox model.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from lifelines.utils import median_survival_times

DATA_DIR = "sisyphean-power-and-light/"

# Load asset data
transformers = pd.read_csv(DATA_DIR + "assets/transformers.csv")
maintenance = pd.read_csv(DATA_DIR + "assets/maintenance_log.csv",
                          parse_dates=["inspection_date"])
outages = pd.read_csv(DATA_DIR + "outages/outage_events.csv",
                      parse_dates=["fault_detected"])
weather = pd.read_csv(DATA_DIR + "weather/hourly_observations.csv",
                      parse_dates=["timestamp"])

print(f"Transformers: {len(transformers)}")
print(f"Maintenance logs: {len(maintenance):,}")
print(f"Outage events: {len(outages):,}")
print(f"Weather records: {len(weather):,}")
Transformers: 86
Maintenance logs: 1,247
Outage events: 3,247
Weather records: 43,824

Survival Analysis vs Classification

Before we write more code, let's understand why survival analysis is different from the binary classifier you built in Guide 04.

In Guide 04 you asked: "Will this transformer fail?" The answer was 0 or 1. But consider two transformers that have never failed:

  • Transformer A: Installed in 2020 (5 years old, no failure)
  • Transformer B: Installed in 1985 (40 years old, no failure)

A binary classifier treats both the same: has_failed = 0. But Transformer B surviving 40 years is much more informative than Transformer A surviving 5 years. Survival analysis captures this through a concept called censoring.

What is censoring? A transformer is "right-censored" if it has not failed by the end of our observation period. We know it survived at least this long, but we don't know its true failure time. Instead of throwing away this data (as binary classification effectively does), survival analysis uses it to estimate the survival function more accurately.

Survival analysis gives us three outputs that classification cannot:

  1. Survival curve: The probability of surviving beyond any given age
  2. Hazard function: The instantaneous failure rate at a given age
  3. Remaining useful life: A per-asset estimate of expected time until failure
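
To make this concrete, here is a minimal sketch of how Transformers A and B from the example above would be encoded for a survival model (illustrative values, not SP&L data):

import pandas as pd

# Both units are censored (event_observed = 0), but they carry very
# different information: B has demonstrably survived 8x longer than A
toy = pd.DataFrame({
    "transformer_id": ["A", "B"],
    "duration_years": [5.0, 40.0],   # observed time in service
    "event_observed": [0, 0],        # 0 = censored (still running)
})
print(toy)

A binary classifier sees two identical labels here; a survival model sees two very different durations at risk.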

Prepare Survival Data

Survival analysis requires two columns for each subject: the duration (time from installation to failure or current date) and an event indicator (1 = failure observed, 0 = still running / censored).

# Identify transformers that experienced equipment failures
equip_failures = outages[outages["cause_code"] == "equipment_failure"]

# Get the first failure date per transformer
first_failure = equip_failures.groupby("transformer_id").agg(
    first_failure_date=("fault_detected", "min")
).reset_index()

# Merge with transformer inventory
df = transformers.merge(first_failure, on="transformer_id", how="left")

# Event indicator: 1 = failed, 0 = censored (still running)
df["event_observed"] = df["first_failure_date"].notna().astype(int)

# Duration: years from install to failure (or to current date if censored)
observation_end = pd.Timestamp("2025-01-01")
df["install_date"] = pd.to_datetime(df["install_year"], format="%Y")
df["duration_years"] = np.where(
    df["event_observed"] == 1,
    (df["first_failure_date"] - df["install_date"]).dt.days / 365.25,
    (observation_end - df["install_date"]).dt.days / 365.25
)

# Ensure positive durations
df["duration_years"] = df["duration_years"].clip(lower=0.5)

print(f"Total transformers: {len(df)}")
print(f"Failed (event=1): {df['event_observed'].sum()}")
print(f"Censored (event=0): {(df['event_observed'] == 0).sum()}")
print(f"\nDuration range: {df['duration_years'].min():.1f} – "
      f"{df['duration_years'].max():.1f} years")
Total transformers: 86
Failed (event=1): 23
Censored (event=0): 63

Duration range: 0.5 – 42.0 years

Important: Using only the first failure per transformer is a simplification. In reality, a transformer can be repaired and fail again. More advanced models (recurrent event models) handle this, but the single-event approach is the right starting point and is standard practice in asset management.

Sample size consideration: We have 86 transformers with only 23 observed failures. This is a small dataset for survival analysis. The standard rule of thumb for Cox PH is 10–15 events per covariate; with 6 covariates, you would ideally want 60–90 failure events. With 23 events, the hazard ratio estimates will have wide confidence intervals, and some effects may not reach statistical significance even if they are real. The penalizer=0.01 helps stabilize the estimates, but treat the specific numerical results with appropriate caution. To strengthen this analysis, you could expand the dataset by including transformers from neighboring utilities, extending the observation window, or incorporating failure records from additional asset types (e.g., reclosers, regulators) with similar failure mechanisms.


Kaplan–Meier Survival Curves

The Kaplan–Meier estimator is the foundation of survival analysis. It produces a non-parametric estimate of the survival function: the probability of surviving beyond a given time. No assumptions about the shape of the curve are needed.
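
For reference, the Kaplan–Meier estimate is a running product over the observed failure times t_i up to t:

\hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

where d_i is the number of failures at time t_i and n_i is the number of transformers still at risk just before t_i. Censored transformers simply leave the risk set (shrinking n_i) without ever counting as failures, which is exactly how the estimator makes use of still-running units.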

# Fit overall Kaplan-Meier curve
kmf = KaplanMeierFitter()
kmf.fit(
    durations=df["duration_years"],
    event_observed=df["event_observed"],
    label="All Transformers"
)

fig, ax = plt.subplots(figsize=(10, 6))
kmf.plot_survival_function(ax=ax, color="#2D6A7A", linewidth=2)
ax.set_title("Kaplan–Meier Survival Curve: SP&L Transformer Fleet")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.axhline(y=0.5, color="gray", linestyle="--", alpha=0.5, label="50% survival")
ax.legend()
plt.tight_layout()
plt.show()

# Median survival time (50% of transformers have failed by this age)
median = kmf.median_survival_time_
print(f"Median survival time: {median:.1f} years")

The curve starts at 1.0 (all transformers alive at year 0) and drops as failures occur. The steps down represent observed failures; the flat segments between steps include censored observations. The shaded band shows the 95% confidence interval.
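
The median itself deserves an uncertainty estimate. The median_survival_times helper imported at the top of the guide turns the Kaplan–Meier confidence band into a confidence interval for the median; a quick sketch:

# Convert the KM confidence band into a 95% CI for the median survival time
median_ci = median_survival_times(kmf.confidence_interval_)
print("95% CI for median survival time (years):")
print(median_ci)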

Compare Survival by Manufacturer

Do transformers from different manufacturers fail at different rates? Plot separate Kaplan–Meier curves to find out.

# Split by manufacturer and plot separate KM curves
fig, ax = plt.subplots(figsize=(10, 6))
colors = ["#2D6A7A", "#5FCCDB", "#D69E2E", "#E53E3E"]

for i, (mfr, group) in enumerate(df.groupby("manufacturer")):
    kmf_mfr = KaplanMeierFitter()
    kmf_mfr.fit(
        durations=group["duration_years"],
        event_observed=group["event_observed"],
        label=mfr
    )
    kmf_mfr.plot_survival_function(
        ax=ax, color=colors[i % len(colors)], linewidth=2
    )

ax.set_title("Survival Curves by Manufacturer")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="Manufacturer")
plt.tight_layout()
plt.show()

Compare Survival by kVA Rating Group

# Create kVA rating groups
df["kva_group"] = pd.cut(
    df["kva_rating"],
    bins=[0, 50, 100, 500, 5000],
    labels=["Small (≤50)", "Medium (51-100)",
            "Large (101-500)", "Very Large (500+)"]
)

fig, ax = plt.subplots(figsize=(10, 6))
for i, (group_name, group) in enumerate(df.groupby("kva_group")):
    if len(group) < 3:
        continue
    kmf_kva = KaplanMeierFitter()
    kmf_kva.fit(
        durations=group["duration_years"],
        event_observed=group["event_observed"],
        label=group_name
    )
    kmf_kva.plot_survival_function(
        ax=ax, color=colors[i % len(colors)], linewidth=2
    )

ax.set_title("Survival Curves by kVA Rating Group")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="kVA Group")
plt.tight_layout()
plt.show()

Log-Rank Test: Are Differences Significant?

Eyeballing curves is useful, but the log-rank test gives a formal statistical answer to the question: "Do these two groups have significantly different survival experiences?"

# Compare two manufacturers statistically
manufacturers = df["manufacturer"].unique()
mfr_a = df[df["manufacturer"] == manufacturers[0]]
mfr_b = df[df["manufacturer"] == manufacturers[1]]

result = logrank_test(
    mfr_a["duration_years"], mfr_b["duration_years"],
    event_observed_A=mfr_a["event_observed"],
    event_observed_B=mfr_b["event_observed"]
)

print(f"Log-rank test: {manufacturers[0]} vs {manufacturers[1]}")
print(f"  Test statistic: {result.test_statistic:.3f}")
print(f"  p-value: {result.p_value:.4f}")
if result.p_value < 0.05:
    print("  → Statistically significant difference (p < 0.05)")
else:
    print("  → No statistically significant difference (p ≥ 0.05)")
# Run pairwise log-rank tests across all manufacturers
from itertools import combinations

print("Pairwise log-rank tests (all manufacturers):")
print(f"{'Group A':<20} {'Group B':<20} {'p-value':>10} {'Significant?':>14}")
print("-" * 66)

for a, b in combinations(manufacturers, 2):
    grp_a = df[df["manufacturer"] == a]
    grp_b = df[df["manufacturer"] == b]
    res = logrank_test(
        grp_a["duration_years"], grp_b["duration_years"],
        event_observed_A=grp_a["event_observed"],
        event_observed_B=grp_b["event_observed"]
    )
    sig = "Yes" if res.p_value < 0.05 else "No"
    print(f"{a:<20} {b:<20} {res.p_value:>10.4f} {sig:>14}")

Interpreting the log-rank test: A p-value below 0.05 means that, if the two groups truly had identical survival, a difference at least this large would arise by chance less than 5% of the time. This helps you decide whether manufacturer selection (or feeder assignment, or kVA rating) genuinely affects transformer longevity, or whether apparent differences are just noise in a small dataset.
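
One caveat before acting on the pairwise table: running many pairwise tests inflates the chance of a false positive. A sketch of two standard remedies, using the multivariate log-rank test from lifelines for a single overall comparison, plus a simple Bonferroni adjustment for the pairwise results:

from lifelines.statistics import multivariate_logrank_test

# One overall test: do ANY manufacturer groups differ in survival?
overall = multivariate_logrank_test(
    df["duration_years"], df["manufacturer"], df["event_observed"]
)
print(f"Overall log-rank p-value: {overall.p_value:.4f}")

# If you keep the pairwise tests, a Bonferroni correction divides the
# 0.05 significance threshold by the number of comparisons made
n_pairs = len(manufacturers) * (len(manufacturers) - 1) // 2
print(f"Bonferroni-adjusted threshold: {0.05 / n_pairs:.4f}")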


Cox Proportional Hazards Model

Kaplan–Meier curves compare groups, but they don't handle multiple continuous covariates simultaneously. The Cox Proportional Hazards (PH) model is the regression equivalent for survival data. It estimates how each covariate affects the hazard (failure rate) while controlling for the others.
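
Formally, the model assumes each covariate scales a shared baseline hazard multiplicatively:

h(t \mid x) = h_0(t) \, \exp(\beta_1 x_1 + \cdots + \beta_p x_p)

The baseline hazard h_0(t) is left unspecified, which is why Cox PH is called semi-parametric, and each \exp(\beta_j) is the hazard ratio reported for covariate j later in this guide.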

# Engineer covariates for the Cox model

# Maintenance features: work order count and days since last inspection
maint_feats = maintenance.groupby("transformer_id").agg(
    work_orders=("work_order_id", "count"),
    last_inspection=("inspection_date", "max")
).reset_index()
maint_feats["days_since_inspection"] = (
    pd.Timestamp("2025-01-01") - maint_feats["last_inspection"]
).dt.days

df = df.merge(
    maint_feats[["transformer_id", "work_orders", "days_since_inspection"]],
    on="transformer_id", how="left"
)
df["work_orders"] = df["work_orders"].fillna(0)
df["days_since_inspection"] = df["days_since_inspection"].fillna(9999)

# Weather exposure: extreme conditions defined as wind > 35 mph,
# temperature > 100°F, or temperature < 10°F. The flag is set per hourly
# observation, so aggregated totals are an exposure index rather than a
# literal count of calendar days.
weather["extreme_day"] = (
    (weather["wind_speed"] > 35)
    | (weather["temperature"] > 100)
    | (weather["temperature"] < 10)
).astype(int)

# Compute per-transformer lifetime extreme weather exposure: total
# extreme observations during each transformer's specific service period
weather["year"] = weather["timestamp"].dt.year
annual_extreme = weather.groupby("year")["extreme_day"].sum()

def compute_lifetime_exposure(row):
    """Sum extreme weather exposure from install year through 2024."""
    years_active = annual_extreme[
        (annual_extreme.index >= row["install_year"])
        & (annual_extreme.index <= 2024)
    ]
    return years_active.sum()

df["lifetime_extreme_days"] = df.apply(compute_lifetime_exposure, axis=1)

# Annualize: average exposure per year in service
df["avg_extreme_weather_days"] = (
    df["lifetime_extreme_days"] / df["duration_years"].clip(lower=1)
)

# Encode transformer type as numeric
df["type_code"] = df["type"].map({"oil": 0, "dry": 1}).fillna(0)

print("Covariates prepared.")
print(df[["transformer_id", "health_index", "kva_rating", "work_orders",
          "days_since_inspection", "duration_years", "event_observed"]].head())
# Fit the Cox Proportional Hazards model
cox_cols = [
    "duration_years", "event_observed",
    "health_index", "kva_rating", "type_code",
    "work_orders", "days_since_inspection", "avg_extreme_weather_days"
]
cox_df = df[cox_cols].dropna()

# Standardize large-range columns for numerical stability
cox_df["kva_rating_scaled"] = cox_df["kva_rating"] / 100
cox_df["days_since_insp_scaled"] = cox_df["days_since_inspection"] / 365
cox_df = cox_df.drop(columns=["kva_rating", "days_since_inspection"])

cph = CoxPHFitter(penalizer=0.01)
cph.fit(
    cox_df,
    duration_col="duration_years",
    event_col="event_observed"
)
cph.print_summary()

# Concordance index: the model's ability to correctly rank transformers
# by failure time. 0.5 = random guessing, 1.0 = perfect discrimination
print(f"\nConcordance index: {cph.concordance_index_:.3f}")

Concordance index: This is the survival-analysis analogue of ROC AUC. It measures the model's ability to correctly rank pairs of transformers by their failure time. A concordance of 0.5 means the model is no better than random guessing; 1.0 means perfect discrimination. Values above roughly 0.65 are generally considered useful in applied clinical and engineering work.
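
To make the ranking interpretation concrete, the concordance index can also be recomputed directly from the predicted partial hazards with the concordance_index utility (a sketch; the negation reflects that a higher hazard should mean a shorter predicted life):

from lifelines.utils import concordance_index

# Higher partial hazard implies earlier failure, so negate the hazard to
# get a score where larger values mean longer predicted survival
c_index = concordance_index(
    cox_df["duration_years"],
    -cph.predict_partial_hazard(cox_df),
    cox_df["event_observed"]
)
print(f"Concordance (recomputed): {c_index:.3f}")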

Why a penalizer? With only 86 transformers and several covariates, we add a small L2 penalty (penalizer=0.01) to prevent overfitting. This is equivalent to ridge regularization and stabilizes coefficient estimates when the sample size is modest relative to the number of features.
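
The model's name carries its key assumption: each covariate multiplies the hazard by a constant factor at all ages. lifelines can test this directly; a minimal sketch:

# Test the proportional hazards assumption for each covariate.
# For any covariate whose effect appears to change over time, lifelines
# prints a warning along with suggested remedies (stratification,
# time-varying terms, or binning).
cph.check_assumptions(cox_df, p_value_threshold=0.05)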


Interpret Hazard Ratios

The Cox model output includes hazard ratios for each covariate. A hazard ratio greater than 1 means the factor increases failure risk; less than 1 means it is protective.

# Extract and visualize hazard ratios
summary = cph.summary

fig, ax = plt.subplots(figsize=(9, 5))

# Plot hazard ratios with confidence intervals
hr = summary["exp(coef)"]
hr_lower = summary["exp(coef) lower 95%"]
hr_upper = summary["exp(coef) upper 95%"]

y_pos = range(len(hr))
ax.barh(y_pos, hr.values, color="#5FCCDB", height=0.6, alpha=0.8)
ax.errorbar(hr.values, y_pos,
            xerr=[hr.values - hr_lower.values, hr_upper.values - hr.values],
            fmt="none", color="#2D6A7A", capsize=4)
ax.axvline(x=1.0, color="red", linestyle="--", alpha=0.7,
           label="HR = 1 (no effect)")
ax.set_yticks(y_pos)
ax.set_yticklabels(hr.index)
ax.set_xlabel("Hazard Ratio (95% CI)")
ax.set_title("Cox PH Hazard Ratios: What Accelerates Transformer Failure?")
ax.legend()
plt.tight_layout()
plt.show()

# Print interpretation
print("\nHazard ratio interpretation:")
for covariate in hr.index:
    ratio = hr[covariate]
    p_val = summary.loc[covariate, "p"]
    if ratio > 1:
        direction = "increases"
        pct = (ratio - 1) * 100
    else:
        direction = "decreases"
        pct = (1 - ratio) * 100
    sig = "*" if p_val < 0.05 else ""
    print(f"  {covariate}: HR={ratio:.3f} → {direction} hazard by {pct:.1f}% {sig}")

Reading hazard ratios: If health_index has an HR of 0.65, it means each one-point increase in health index reduces the failure hazard by 35%. Conversely, if days_since_insp_scaled has an HR of 1.42, each additional year without inspection increases hazard by 42%. These are powerful, interpretable insights for asset managers.
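
Note that hazard ratios compound multiplicatively across units of a covariate. A quick check using the illustrative HR of 0.65 from the paragraph above (an example value, not actual model output):

# Illustrative hazard ratio from the example above, not a fitted value
hr_health = 0.65
print(f"1-point health index increase: hazard × {hr_health:.2f} (a 35% reduction)")
print(f"2-point health index increase: hazard × {hr_health**2:.2f} (a 58% reduction)")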


Predict Individual Survival Curves

The Cox model can produce a personalized survival curve for each transformer, conditioned on its specific covariates. This gives you remaining useful life (RUL) estimates for every asset in the fleet.

# Predict survival curves for all transformers
surv_funcs = cph.predict_survival_function(cox_df)

# Plot survival curves for 5 specific transformers
sample_ids = df["transformer_id"].iloc[:5].values

fig, ax = plt.subplots(figsize=(10, 6))
for i, tid in enumerate(sample_ids):
    idx = df[df["transformer_id"] == tid].index[0]
    if idx in surv_funcs.columns:
        ax.plot(surv_funcs.index, surv_funcs[idx],
                label=f"{tid} (age={df.loc[idx, 'duration_years']:.0f}yr, "
                      f"HI={df.loc[idx, 'health_index']})",
                linewidth=2)

ax.set_title("Individual Predicted Survival Curves")
ax.set_xlabel("Years Since Installation")
ax.set_ylabel("Survival Probability")
ax.set_ylim(0, 1.05)
ax.legend(title="Transformer", fontsize=9)
plt.tight_layout()
plt.show()
# Estimate remaining useful life (RUL) for each transformer
# RUL = predicted median survival time - current age
# predict_median returns a Series indexed like cox_df, so assigning it
# directly keeps rows aligned even if dropna() removed any transformers
df["predicted_median_life"] = cph.predict_median(cox_df)
df["remaining_useful_life"] = df["predicted_median_life"] - df["duration_years"]
df["remaining_useful_life"] = df["remaining_useful_life"].clip(lower=0)

print("Remaining Useful Life Estimates (top 15 most urgent):")
rul_table = df.sort_values("remaining_useful_life")[[
    "transformer_id", "duration_years", "health_index",
    "predicted_median_life", "remaining_useful_life"
]].head(15)
print(rul_table.to_string(index=False, float_format="%.1f"))

What is remaining useful life? RUL estimates how many years a transformer has left before it reaches a 50% failure probability. A transformer with RUL = 2.3 years is expected to reach its median lifetime in about 2.3 years. Transformers with RUL near zero have already exceeded their predicted median and are operating on borrowed time.
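
One practical wrinkle worth handling: if a transformer's predicted survival curve never drops below 0.5 within the model's time range, predict_median returns infinity. A sketch of one way to flag and cap these cases (capping at the longest observed duration is an assumption, chosen to keep the table finite, not a lifelines convention):

# Flag transformers whose predicted survival never crosses 50%
never_crosses = np.isinf(df["predicted_median_life"])
print(f"Medians beyond the data horizon: {never_crosses.sum()}")

# Cap at the longest observed duration so RUL stays finite; for these
# low-risk assets the result is a conservative lower bound, not an estimate
horizon = df["duration_years"].max()
df.loc[never_crosses, "predicted_median_life"] = horizon
df["remaining_useful_life"] = (
    df["predicted_median_life"] - df["duration_years"]
).clip(lower=0)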


Build a Replacement Priority Ranking

Survival analysis tells you when a transformer is likely to fail. But not all failures have equal consequences. A transformer serving 200 customers matters more than one serving 10. Combining failure risk with consequence produces an actionable replacement schedule.

# Current failure hazard from the Cox model; predict_partial_hazard
# returns a Series indexed like cox_df, so assignment stays row-aligned
df["hazard_score"] = cph.predict_partial_hazard(cox_df)

# Consequence score: kVA rating as a proxy for customers affected
# Normalize both to a 0-1 scale
df["hazard_norm"] = (
    (df["hazard_score"] - df["hazard_score"].min())
    / (df["hazard_score"].max() - df["hazard_score"].min())
)
df["consequence_norm"] = (
    (df["kva_rating"] - df["kva_rating"].min())
    / (df["kva_rating"].max() - df["kva_rating"].min())
)

# Priority score: weighted blend of failure likelihood and consequence
df["priority_score"] = df["hazard_norm"] * 0.6 + df["consequence_norm"] * 0.4

# Rank by priority (highest score = replace first)
priority_list = df.sort_values("priority_score", ascending=False)

print("Top 10 Replacement Priorities:")
print(priority_list[[
    "transformer_id", "duration_years", "health_index", "kva_rating",
    "remaining_useful_life", "priority_score"
]].head(10).to_string(index=False, float_format="%.2f"))
# Visualize: risk vs consequence scatter plot
fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
    df["hazard_norm"], df["consequence_norm"],
    c=df["priority_score"], cmap="YlOrRd",
    s=80, edgecolors="white", linewidth=0.5, alpha=0.9
)
plt.colorbar(scatter, label="Priority Score")

# Label the top 5 highest-priority transformers
top5 = priority_list.head(5)
for _, row in top5.iterrows():
    ax.annotate(row["transformer_id"],
                (row["hazard_norm"], row["consequence_norm"]),
                fontsize=8, fontweight="bold",
                xytext=(5, 5), textcoords="offset points")

ax.set_xlabel("Failure Likelihood (normalized hazard)")
ax.set_ylabel("Consequence (normalized kVA)")
ax.set_title("Replacement Priority Matrix: Risk × Consequence")
ax.axvline(x=0.5, color="gray", linestyle="--", alpha=0.3)
ax.axhline(y=0.5, color="gray", linestyle="--", alpha=0.3)

# Quadrant labels
ax.text(0.75, 0.75, "REPLACE NOW", ha="center", fontsize=11,
        fontweight="bold", color="#E53E3E", alpha=0.7)
ax.text(0.25, 0.75, "MONITOR", ha="center", fontsize=11,
        fontweight="bold", color="#D69E2E", alpha=0.7)
ax.text(0.75, 0.25, "SCHEDULE", ha="center", fontsize=11,
        fontweight="bold", color="#D69E2E", alpha=0.7)
ax.text(0.25, 0.25, "LOW PRIORITY", ha="center", fontsize=11,
        fontweight="bold", color="#718096", alpha=0.7)

plt.tight_layout()
plt.show()

Tuning the weights: The 60/40 split between failure likelihood and consequence is a common approach in utility asset management, but the specific weights are illustrative. In practice, these weights should be calibrated by the utility’s engineering and planning teams based on their risk tolerance, regulatory framework, and capital budget constraints. Some utilities weight consequence more heavily to protect large commercial customers or critical facilities. Others use a pure risk-based approach (probability × consequence). The right weights depend on your utility’s specific risk framework—there is no universal standard.
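
A simple way to see how much the weights matter is to recompute the ranking across a range of weights and check which transformers make the top 10 regardless. This sensitivity sweep is a suggested addition, not part of the workflow above:

# Sensitivity check: how stable is the top-10 list as the hazard weight varies?
for w in [0.4, 0.5, 0.6, 0.7, 0.8]:
    score = df["hazard_norm"] * w + df["consequence_norm"] * (1 - w)
    top_ids = df.assign(score=score).nlargest(10, "score")["transformer_id"]
    print(f"hazard weight {w:.1f}: {sorted(top_ids)}")

# Transformers that appear at every weight are robust priorities; those
# that move in and out of the list are weight-sensitive judgment calls.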


Weather-Adjusted Seasonal Risk

Transformer failure risk is not constant throughout the year. Extreme heat accelerates oil degradation, and storm seasons increase mechanical stress. Let's create a seasonal risk adjustment using the weather data.

# Monthly extreme weather frequency
weather["month"] = weather["timestamp"].dt.month
monthly_extreme = weather.groupby("month")["extreme_day"].mean()

# Create seasonal risk multiplier (normalized around 1.0)
seasonal_multiplier = monthly_extreme / monthly_extreme.mean()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Seasonal risk multiplier
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
bar_colors = ["#E53E3E" if v > 1.2 else "#D69E2E" if v > 1.0 else "#5FCCDB"
              for v in seasonal_multiplier.values]
axes[0].bar(months, seasonal_multiplier.values, color=bar_colors)
axes[0].axhline(y=1.0, color="gray", linestyle="--", alpha=0.5)
axes[0].set_title("Seasonal Risk Multiplier")
axes[0].set_ylabel("Risk Multiplier (1.0 = average)")
axes[0].tick_params(axis="x", rotation=45)

# Plot 2: Adjusted priority scores for top 10 in peak month
peak_month = seasonal_multiplier.idxmax()
peak_mult = seasonal_multiplier.max()

top10 = priority_list.head(10).copy()
top10["adjusted_score"] = top10["priority_score"] * peak_mult

y_pos = range(len(top10))
axes[1].barh(y_pos, top10["priority_score"], height=0.4,
             color="#5FCCDB", label="Base Priority")
axes[1].barh([y + 0.4 for y in y_pos], top10["adjusted_score"], height=0.4,
             color="#E53E3E", label=f"Peak Month (×{peak_mult:.2f})")
axes[1].set_yticks([y + 0.2 for y in y_pos])
axes[1].set_yticklabels(top10["transformer_id"].values)
axes[1].set_xlabel("Priority Score")
axes[1].set_title("Top 10: Base vs Peak-Season Priority")
axes[1].legend()

plt.tight_layout()
plt.show()

print(f"\nPeak risk month: {months[peak_month - 1]} (multiplier: {peak_mult:.2f}×)")
print("Use seasonal multipliers to time capital replacement projects")
print("before high-risk seasons, not after.")

Practical application: If summer is the peak risk season (say, a multiplier of 1.4×), schedule replacements of high-priority transformers in spring. That way you avoid operating aging assets through the most stressful months, and you avoid taking a planned construction outage during peak demand.

Model Persistence and Feature Notes

# Save the fitted Cox model (lifelines models are picklable)
import pickle

with open("cox_model.pkl", "wb") as f:
    pickle.dump(cph, f)

# Load it back
with open("cox_model.pkl", "rb") as f:
    cph = pickle.load(f)

Feature engineering rationale: We used 6 covariates with clear asset management interpretations. The health_index (1–5) is a composite condition score from inspections. kva_rating proxies for both equipment size and consequence of failure. days_since_inspection captures maintenance recency—transformers that have not been inspected recently may have undetected degradation. Rounding out the set, type_code distinguishes oil-filled from dry-type units, work_orders counts past maintenance activity as a wear indicator, and avg_extreme_weather_days captures cumulative environmental stress over each unit's service life.

What You Built and Next Steps

  1. Prepared survival data with time-to-event durations and censoring indicators
  2. Fitted Kaplan–Meier curves to visualize fleet-wide and group-level survival
  3. Used log-rank tests to determine statistically significant survival differences
  4. Built a Cox Proportional Hazards model with asset and environmental covariates
  5. Interpreted hazard ratios to understand which factors accelerate failure
  6. Predicted individual survival curves and remaining useful life estimates
  7. Created a risk-weighted replacement priority schedule with consequence scoring
  8. Adjusted priorities for seasonal weather risk to time capital projects

Ideas to Try Next

  • Time-varying covariates: Use lifelines.CoxTimeVaryingFitter to model how changing conditions (degrading health index over time) affect hazard
  • Accelerated Failure Time models: Try lifelines.WeibullAFTFitter or LogNormalAFTFitter for parametric alternatives to Cox PH (see the sketch after this list)
  • Extend to other assets: Apply the same survival framework to poles (assets/poles.csv) and conductors (assets/conductors.csv)
  • Budget optimization: Use the priority ranking with replacement cost estimates to maximize risk reduction per dollar of capital spending
  • Concordance index tuning: Evaluate model discrimination with cph.concordance_index_ and tune the penalizer and feature set
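
As a starting point for the AFT idea in the list above, here is a minimal sketch fitting a Weibull AFT model on the same cox_df; the fitting API mirrors CoxPHFitter:

from lifelines import WeibullAFTFitter

# In an AFT model, covariates stretch or compress the time-to-failure
# scale itself, rather than multiplying a baseline hazard as in Cox PH
aft = WeibullAFTFitter(penalizer=0.01)
aft.fit(cox_df, duration_col="duration_years", event_col="event_observed")
aft.print_summary()

# Compare discrimination against the Cox model
print(f"Weibull AFT concordance: {aft.concordance_index_:.3f}")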

Key Terms Glossary

  • Survival analysis — a statistical framework for modeling time-to-event data, accounting for censored observations
  • Censoring — when the event of interest (failure) has not yet occurred for a subject; the true failure time is unknown but at least as long as the observed time
  • Kaplan–Meier estimator — a non-parametric method for estimating the survival function from time-to-event data
  • Log-rank test — a statistical test comparing survival distributions between two or more groups
  • Cox Proportional Hazards — a semi-parametric regression model that estimates how covariates affect the hazard (failure rate)
  • Hazard ratio — the multiplicative effect of a one-unit change in a covariate on the failure rate; HR > 1 means increased risk
  • Remaining useful life (RUL) — the estimated time remaining before an asset reaches its predicted median failure point
  • Consequence scoring — weighting failure risk by the impact of that failure (customers affected, kVA lost)