Guide 09 — Advanced

Multi-Class Outage Cause Prediction with XGBoost & SHAP

Prerequisite: Complete Guide 01: Outage Prediction first. This guide extends the binary classifier into a multi-class model that predicts the cause of an outage and explains why.

What You Will Learn

In Guide 01, you built a binary classifier: outage or no outage. But utility reliability teams need more—they need to know why an outage is likely. Is it vegetation encroachment during the growing season? Weather-driven during storm season? Equipment failure on aging infrastructure? And what about the significant share of outages with undetermined causes? In this guide you will:

  • Build a multi-class classifier that predicts outage cause codes (vegetation, weather, equipment, animal, overload, and unknown)
  • Use XGBoost instead of Random Forest for improved performance on imbalanced classes
  • Implement a time-aware train/test split that simulates real-world deployment
  • Add asset features (transformer age, condition scores) alongside weather
  • Use SHAP values to explain individual predictions—not just global feature importance
  • Benchmark your model against SP&L's annual SAIFI reliability metrics

SP&L Data You Will Use

  • outages/outage_events.csv — 3,200+ outage records with cause codes
  • weather/hourly_observations.csv — hourly weather data
  • assets/transformers.csv — transformer age, condition scores, kVA ratings
  • assets/conductors.csv — conductor age and vegetation clearance zones
  • outages/reliability_metrics.csv — annual SAIFI/SAIDI for benchmarking

Additional Libraries

pip install xgboost shap

Step 1: Load and Merge All Data Sources

Unlike Guide 01 where we used only weather data, here we merge weather, asset, and conductor data to give the model a richer picture of outage drivers.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import shap
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

DATA_DIR = "sisyphean-power-and-light/"

# Load all datasets
outages = pd.read_csv(DATA_DIR + "outages/outage_events.csv", parse_dates=["fault_detected"])
weather = pd.read_csv(DATA_DIR + "weather/hourly_observations.csv", parse_dates=["timestamp"])
transformers = pd.read_csv(DATA_DIR + "assets/transformers.csv")
conductors = pd.read_csv(DATA_DIR + "assets/conductors.csv")

print(f"Outages: {len(outages):,} events")
print(f"Cause codes: {outages['cause_code'].unique()}")
print("\nCause code distribution:")
print(outages["cause_code"].value_counts())
Outages: 3,247 events
Cause codes: ['vegetation' 'weather' 'equipment_failure' 'animal_contact' 'overload' 'unknown']

Cause code distribution:
vegetation 987
weather 842
equipment_failure 614
animal_contact 389
overload 247
unknown 168

Why multi-class? A binary model tells you "an outage might happen." A multi-class model tells you "an outage might happen due to vegetation." This lets crews prepare the right response: tree trimming crews for vegetation, line patrol for equipment, or storm staging for weather-driven events.

Step 2: Build Enriched Feature Set

We combine daily weather summaries with asset condition data for each outage's feeder. This gives the model both environmental and infrastructure context. Note that we keep all cause codes including “unknown”—in real utility data, a significant portion of outages have undetermined causes, and the model should learn to recognize these patterns rather than ignoring them.

# Create daily weather features (same as Guide 01)
weather["date"] = weather["timestamp"].dt.date
daily_weather = weather.groupby("date").agg({
    "temperature": ["max", "min", "mean"],
    "wind_speed": ["max", "mean"],
    "precipitation": "sum",
    "humidity": "mean",
}).reset_index()
daily_weather.columns = ["date", "temp_max", "temp_min", "temp_mean",
                         "wind_max", "wind_mean", "precip_total", "humidity_mean"]

# Add asset features per feeder
feeder_assets = transformers.groupby("feeder_id").agg({
    "install_year": "mean",
    "condition_score": "mean",
    "kva_rating": "sum",
}).reset_index()
feeder_assets.columns = ["feeder_id", "avg_install_year", "avg_condition_score", "total_kva"]
feeder_assets["avg_asset_age"] = 2025 - feeder_assets["avg_install_year"]

# Build the training table: one row per outage event
outages["date"] = outages["fault_detected"].dt.date
df = outages.merge(daily_weather, on="date", how="left")
df = df.merge(feeder_assets, on="feeder_id", how="left")

# Add time features
df["fault_detected"] = pd.to_datetime(df["fault_detected"])
df["month"] = df["fault_detected"].dt.month
df["hour"] = df["fault_detected"].dt.hour
df["day_of_week"] = df["fault_detected"].dt.dayofweek
df["is_summer"] = df["month"].isin([6, 7, 8]).astype(int)
df["is_storm_season"] = df["month"].isin([3, 4, 5, 6]).astype(int)

# Drop rows with missing weather data (keep all cause codes including unknown)
df = df.dropna(subset=["temp_max"])

print(f"Training samples: {len(df):,}")
print("\nFeatures built per event:")
print("  Weather: 7 | Asset: 3 | Calendar: 5")
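
The block above builds feeder-level features from the transformer table only; the conductor file loaded in Step 1 is not yet merged in. Below is a minimal, hedged sketch of folding conductor age and vegetation clearance into the same feeder-level table. The column names (install_year, clearance_zone_ft) and the feeder_id key are assumptions based on the dataset description, so check conductors.columns and adjust before running.

# Optional: fold conductor data into the feeder-level features.
# NOTE: the column names below ("install_year", "clearance_zone_ft", "feeder_id")
# are assumptions based on the dataset description -- inspect conductors.columns first.
conductor_feats = conductors.groupby("feeder_id").agg({
    "install_year": "mean",        # assumed column: conductor install year
    "clearance_zone_ft": "mean",   # assumed column: vegetation clearance zone width
}).reset_index()
conductor_feats.columns = ["feeder_id", "conductor_install_year", "avg_clearance_ft"]
conductor_feats["avg_conductor_age"] = 2025 - conductor_feats["conductor_install_year"]

df = df.merge(conductor_feats.drop(columns="conductor_install_year"),
              on="feeder_id", how="left")

# If you add these, remember to append them to feature_cols in the next step:
# feature_cols += ["avg_conductor_age", "avg_clearance_ft"]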

Step 3: Time-Aware Train/Test Split

In Guide 01, we used a random split. But in production, your model always predicts the future based on the past. A time-aware split is more honest: train on 2020–2023 data, test on 2024–2025.

# Define features and target
feature_cols = [
    "temp_max", "temp_min", "temp_mean", "wind_max", "wind_mean",
    "precip_total", "humidity_mean",
    "avg_asset_age", "avg_condition_score", "total_kva",
    "month", "hour", "day_of_week", "is_summer", "is_storm_season",
]

# Encode cause codes as integers
le = LabelEncoder()
df["cause_label"] = le.fit_transform(df["cause_code"])
cause_names = le.classes_

# Time-aware split: train on 2020-2023, test on 2024-2025
train_mask = df["fault_detected"].dt.year <= 2023
test_mask = df["fault_detected"].dt.year >= 2024

X_train = df.loc[train_mask, feature_cols]
y_train = df.loc[train_mask, "cause_label"]
X_test = df.loc[test_mask, feature_cols]
y_test = df.loc[test_mask, "cause_label"]

print(f"Training set: {len(X_train):,} events (2020-2023)")
print(f"Test set: {len(X_test):,} events (2024-2025)")
print(f"Classes: {list(cause_names)}")

Why not random split? Random splitting lets future data "leak" into the training set. If your model trains on a July 2024 storm and tests on a June 2024 event, it has an unfair advantage. Time-aware splitting gives you honest performance estimates that reflect how the model will actually perform when deployed.
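
If you want to see the effect rather than take it on faith, the short sketch below trains the same kind of model on a random split and on the time-aware split and compares accuracy. It is purely illustrative and not part of the main pipeline; a noticeably higher score on the random split is a sign that leakage is flattering the estimate.

# Illustrative only: compare a random split against the time-aware split.
# A higher score on the random split suggests temporal leakage.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_all = df[feature_cols]
y_all = df["cause_label"]

# Random split (what Guide 01 did)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all
)
rand_model = xgb.XGBClassifier(n_estimators=150, max_depth=6, random_state=42)
rand_model.fit(Xr_tr, yr_tr)

# Time-aware split (defined above)
time_model = xgb.XGBClassifier(n_estimators=150, max_depth=6, random_state=42)
time_model.fit(X_train, y_train)

print(f"Random-split accuracy:     {accuracy_score(yr_te, rand_model.predict(Xr_te)):.3f}")
print(f"Time-aware split accuracy: {accuracy_score(y_test, time_model.predict(X_test)):.3f}")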

Step 4: Train XGBoost Multi-Class Classifier

XGBoost (Extreme Gradient Boosting) builds trees sequentially, where each new tree corrects the mistakes of the previous ones. It typically outperforms Random Forest on structured data, especially with class imbalance.

# Calculate class weights for imbalanced data
class_counts = y_train.value_counts().sort_index()
total = len(y_train)
n_classes = len(class_counts)
class_weights = {i: total / (n_classes * count) for i, count in class_counts.items()}

# Assign sample weights
sample_weights = y_train.map(class_weights)

# Train XGBoost
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    objective="multi:softprob",
    num_class=n_classes,
    random_state=42,
    eval_metric="mlogloss",
)
model.fit(
    X_train, y_train,
    sample_weight=sample_weights,
    eval_set=[(X_test, y_test)],
    verbose=50,
)
print("Training complete.")

A note on eval_set: We pass (X_test, y_test) as the evaluation set so you can watch the model’s loss decrease during training. Be aware that this means the training procedure can “see” test set performance—if you add early_stopping_rounds, the model will use the test set to decide when to stop, which is a mild form of information leakage. In production pipelines, best practice is to create a separate validation set (e.g., a 2023-only holdout) for monitoring and early stopping, and reserve the true test set (2024–2025) for final evaluation only. For this tutorial, the impact is minimal since we are not tuning hyperparameters against the eval set, but it is an important distinction to understand.
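
A minimal sketch of that best practice follows: fit on 2020-2022, monitor a 2023-only validation slice for early stopping, and leave 2024-2025 untouched until final evaluation. Note that the early-stopping API has moved between XGBoost releases (older versions take early_stopping_rounds in fit(), newer ones in the constructor), so adjust for your installed version.

# Sketch: fit on 2020-2022, use 2023 as a validation holdout for early stopping,
# and keep 2024-2025 as the untouched test set.
val_mask = df["fault_detected"].dt.year == 2023
fit_mask = df["fault_detected"].dt.year <= 2022

X_fit, y_fit = df.loc[fit_mask, feature_cols], df.loc[fit_mask, "cause_label"]
X_val, y_val = df.loc[val_mask, feature_cols], df.loc[val_mask, "cause_label"]

model_es = xgb.XGBClassifier(
    n_estimators=1000,            # upper bound; early stopping picks the real number
    max_depth=6,
    learning_rate=0.1,
    eval_metric="mlogloss",
    early_stopping_rounds=30,     # newer XGBoost versions: set in the constructor
    random_state=42,
)
model_es.fit(X_fit, y_fit, eval_set=[(X_val, y_val)], verbose=False)
print(f"Best iteration on the 2023 validation set: {model_es.best_iteration}")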

XGBoost vs Random Forest: Random Forest builds trees independently (in parallel). XGBoost builds trees sequentially, with each tree focusing on the mistakes of the ensemble so far. The learning_rate controls how aggressively each tree corrects errors. Lower rates (0.01–0.1) with more trees generally give better results but take longer to train.
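
To make that trade-off concrete, the illustrative sweep below fits the same model at a few learning-rate / tree-count combinations and compares test log-loss. Exact numbers depend on your data; the point is simply that lower rates need more trees to reach comparable loss.

# Illustrative sweep: lower learning rates usually need more trees.
from sklearn.metrics import log_loss

for lr, n_trees in [(0.3, 100), (0.1, 300), (0.03, 1000)]:
    m = xgb.XGBClassifier(
        n_estimators=n_trees, max_depth=6, learning_rate=lr,
        subsample=0.8, colsample_bytree=0.8,
        eval_metric="mlogloss", random_state=42,
    )
    m.fit(X_train, y_train, sample_weight=sample_weights)
    proba = m.predict_proba(X_test)
    print(f"learning_rate={lr:<5} n_estimators={n_trees:<5} "
          f"test log-loss={log_loss(y_test, proba):.3f}")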

Step 5: Evaluate Multi-Class Performance

# Predict on test set
y_pred = model.predict(X_test)

# Classification report with cause code names
print(classification_report(y_test, y_pred, target_names=cause_names))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8, 7))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=cause_names, yticklabels=cause_names, ax=ax)
ax.set_xlabel("Predicted Cause")
ax.set_ylabel("Actual Cause")
ax.set_title("Multi-Class Confusion Matrix: Outage Cause Prediction")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

Look at the confusion matrix to understand where the model gets confused. Weather and vegetation outages are often the hardest to distinguish because storms can cause both tree-contact and direct wind/lightning damage.
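
If you prefer a ranked list to eyeballing the heatmap, a small sketch like this one pulls out the largest off-diagonal cells, i.e. the cause pairs the model mixes up most often.

# Rank the most frequent misclassification pairs from the confusion matrix.
confused_pairs = []
for i, actual_name in enumerate(cause_names):
    for j, pred_name in enumerate(cause_names):
        if i != j and cm[i, j] > 0:
            confused_pairs.append((actual_name, pred_name, cm[i, j]))

confused_pairs.sort(key=lambda p: p[2], reverse=True)
print("Most common confusions (actual -> predicted):")
for actual_name, pred_name, count in confused_pairs[:5]:
    print(f"  {actual_name:>18} -> {pred_name:<18} {count} events")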

Step 6: Explain Predictions with SHAP

Feature importance tells you which features matter globally. SHAP (SHapley Additive exPlanations) goes further: it tells you how much each feature contributed to a specific prediction and in which direction.

# Create SHAP explainer
explainer = shap.TreeExplainer(model)

# NOTE: older SHAP versions return a list with one array per class for multi-class
# models; newer versions may return a single (samples, features, classes) array,
# in which case use shap_values[:, :, i] below instead of shap_values[i].
shap_values = explainer.shap_values(X_test)

# Summary plot: how each feature affects each class
fig, axes = plt.subplots(1, len(cause_names), figsize=(20, 6))
for i, cause in enumerate(cause_names):
    plt.sca(axes[i])
    shap.summary_plot(shap_values[i], X_test, feature_names=feature_cols,
                      show=False, max_display=8)
    axes[i].set_title(f"{cause}")
plt.tight_layout()
plt.show()

Reading SHAP plots: Each dot is one prediction. The x-axis shows the SHAP value (positive = pushes toward this class, negative = pushes away). The color shows the feature value (red = high, blue = low). For example, if you see high wind_max values (red dots) pushed to the right for the "weather" class, it means high wind makes the model more confident the outage is weather-driven.
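
To drill into a single feature-class pair, such as wind_max for the weather class, a SHAP dependence plot shows each test event's feature value against its SHAP value. The sketch below assumes shap_values is the per-class list returned by older SHAP releases; if your version returns one (samples, features, classes) array, index it as shap_values[:, :, class_idx] instead.

# Dependence plot: wind_max's contribution to the "weather" class.
# Assumes shap_values is a list of per-class arrays (older SHAP API).
class_idx = list(cause_names).index("weather")
shap.dependence_plot(
    "wind_max",
    shap_values[class_idx],
    X_test,
    interaction_index="precip_total",   # color points by a second feature
)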

Step 7: Explain Individual Predictions

SHAP's real power is explaining individual events. Pick a specific outage and see exactly why the model predicted its cause.

# Pick a specific outage event from the test set
event_idx = 0
event = X_test.iloc[event_idx]
actual = cause_names[y_test.iloc[event_idx]]
predicted = cause_names[y_pred[event_idx]]

print("Event details:")
print(f"  Actual cause: {actual}")
print(f"  Predicted cause: {predicted}")
print(f"  Wind max: {event['wind_max']:.1f} mph")
print(f"  Precip total: {event['precip_total']:.2f} in")
print(f"  Avg asset age: {event['avg_asset_age']:.0f} years")

# Waterfall plot for this prediction
# (assumes the per-class list form of shap_values noted in Step 6)
predicted_class = y_pred[event_idx]
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[predicted_class][event_idx],
        base_values=explainer.expected_value[predicted_class],
        data=event.values,
        feature_names=feature_cols,
    )
)

The waterfall plot shows how the model built its prediction step by step. Each bar shows one feature's contribution. Red bars push toward the predicted class; blue bars push against it. The final value is the model's log-odds for that class.
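
Because those contributions live in raw-margin space, you can sanity-check the plot by summing each class's SHAP values, adding its base value, and pushing the per-class margins through a softmax; the result should closely match model.predict_proba for the same event. A rough sketch, again assuming the per-class list form of shap_values:

# Sanity check: base value + sum of SHAP values per class gives that class's raw
# margin; softmax over the classes should roughly reproduce predict_proba.
margins = np.array([
    explainer.expected_value[k] + shap_values[k][event_idx].sum()
    for k in range(len(cause_names))
])
probs_from_shap = np.exp(margins) / np.exp(margins).sum()

print("Probabilities reconstructed from SHAP values:")
for name, p in zip(cause_names, probs_from_shap):
    print(f"  {name:>18}: {p:.3f}")
print("\nmodel.predict_proba for the same event:")
print(model.predict_proba(X_test.iloc[[event_idx]])[0].round(3))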

Step 8: Benchmark Against SAIFI Metrics

How does your model's predicted outage distribution compare to SP&L's actual reliability metrics? This bridges the gap between ML model accuracy and real utility KPIs.

# Load reliability metrics
metrics = pd.read_csv(DATA_DIR + "outages/reliability_metrics.csv")
print(metrics)

# Compare predicted vs actual cause distribution for test years
test_outages = df[test_mask]
actual_dist = test_outages["cause_code"].value_counts(normalize=True)
pred_causes = pd.Series(cause_names[y_pred])
pred_dist = pred_causes.value_counts(normalize=True)

comparison = pd.DataFrame({
    "Actual %": (actual_dist * 100).round(1),
    "Predicted %": (pred_dist * 100).round(1),
})
print("\nCause distribution comparison (test set):")
print(comparison)

# Plot side-by-side
fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(comparison))
width = 0.35
ax.bar(x - width / 2, comparison["Actual %"], width, label="Actual", color="#2D6A7A")
ax.bar(x + width / 2, comparison["Predicted %"], width, label="Predicted", color="#5FCCDB")
ax.set_xticks(x)
ax.set_xticklabels(comparison.index, rotation=45, ha="right")
ax.set_ylabel("Percentage of Outages")
ax.set_title("Predicted vs Actual Outage Cause Distribution (2024-2025)")
ax.legend()
plt.tight_layout()
plt.show()

Step 9: Seasonal Cause Analysis

Outage causes vary by season. Vegetation peaks in spring/summer during the growing season. Weather outages spike during storm season. Equipment failures may increase in extreme heat. Let's validate the model captures these patterns.

# Predicted causes by month
test_outages_with_pred = test_outages.copy()
test_outages_with_pred["predicted_cause"] = cause_names[y_pred]

monthly_causes = test_outages_with_pred.groupby(
    ["month", "predicted_cause"]
).size().unstack(fill_value=0)

fig, ax = plt.subplots(figsize=(12, 6))
monthly_causes.plot(kind="bar", stacked=True, ax=ax, colormap="Set2")
ax.set_xlabel("Month")
ax.set_ylabel("Predicted Outage Count")
ax.set_title("Predicted Outage Causes by Month")
ax.legend(title="Cause", bbox_to_anchor=(1.05, 1))
plt.tight_layout()
plt.show()

Reproducibility and Model Persistence

# For reproducible results, set random seeds at the top of your notebook
np.random.seed(42)

# Save the trained model for later use
import joblib
joblib.dump(model, "outage_cause_model.pkl")

# Load it back
model = joblib.load("outage_cause_model.pkl")

Why these hyperparameters? n_estimators=300, max_depth=6, and learning_rate=0.1 are standard starting points for tree-based models. More trees (300 vs the default 100) compensate for the low learning rate, while max_depth=6 limits tree complexity to prevent overfitting on our modest dataset. Setting np.random.seed(42) at the top of your notebook keeps any numpy-based randomness you add (sampling, shuffling) reproducible; the XGBoost model itself is seeded separately through its random_state parameter, and the year-based split here is already deterministic.

What You Built and Next Steps

  1. Merged weather, asset, and conductor data into a rich feature set
  2. Built a multi-class XGBoost classifier that predicts outage cause codes
  3. Used time-aware splitting to honestly evaluate forward-looking performance
  4. Applied SHAP to explain both global patterns and individual predictions
  5. Benchmarked model predictions against SP&L's reliability metrics
  6. Analyzed seasonal variation in predicted outage causes

Ideas to Try Next

  • Temporal Convolutional Networks: Replace XGBoost with a TCN to capture sequence patterns in time-ordered outage data
  • Hyperparameter tuning: Use optuna or sklearn.model_selection.RandomizedSearchCV to optimize XGBoost parameters (a starter sketch follows this list)
  • Lightning data: Add lightning strike proximity from weather/lightning_strikes.csv as a feature
  • Spatial features: Add feeder topology features (length, number of taps, rural vs urban)
  • Probability calibration: Use CalibratedClassifierCV to ensure predicted probabilities reflect true likelihoods
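
As a starting point for the hyperparameter-tuning idea above, here is a minimal RandomizedSearchCV sketch. Ordinary shuffled K-fold inside the search would reintroduce the leakage problem from Step 3, so the sketch sorts the training years chronologically and uses TimeSeriesSplit; it assumes every fold still contains all six cause classes.

# Starter sketch for the hyperparameter-tuning idea above.
# TimeSeriesSplit keeps folds chronological, avoiding the leakage issue from Step 3.
# Assumes each training fold still contains every cause class.
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Sort the training rows by time so the folds are chronological.
train_sorted = df.loc[train_mask].sort_values("fault_detected")
X_tr_sorted = train_sorted[feature_cols]
y_tr_sorted = train_sorted["cause_label"]

param_dist = {
    "n_estimators": [200, 300, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.03, 0.1, 0.2],
    "subsample": [0.7, 0.8, 1.0],
    "colsample_bytree": [0.7, 0.8, 1.0],
}

search = RandomizedSearchCV(
    xgb.XGBClassifier(objective="multi:softprob", eval_metric="mlogloss", random_state=42),
    param_distributions=param_dist,
    n_iter=20,
    scoring="neg_log_loss",
    cv=TimeSeriesSplit(n_splits=4),
    random_state=42,
    n_jobs=-1,
)
search.fit(X_tr_sorted, y_tr_sorted)
print("Best parameters:", search.best_params_)
print(f"Best CV log-loss: {-search.best_score_:.3f}")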

Key Terms Glossary

  • Multi-class classification — predicting one of several categories (not just binary yes/no)
  • XGBoost — Extreme Gradient Boosting; builds trees sequentially to correct errors
  • SHAP values — SHapley Additive exPlanations; a game-theory approach to explain individual predictions
  • Time-aware split — using past data for training and future data for testing to simulate deployment
  • Class imbalance — when some categories have far fewer examples than others
  • SAIFI — System Average Interruption Frequency Index; average number of interruptions per customer per year
  • Feature engineering — creating informative input variables from raw data