Guide 09 — Advanced

Multi-Class Outage Cause Prediction with XGBoost & SHAP

Prerequisite: Complete Guide 01: Outage Prediction first. This guide extends the binary classifier into a multi-class model that predicts the cause of an outage and explains why.

What You Will Learn

In Guide 01, you built a binary classifier: outage or no outage. But utility reliability teams need more—they need to know why an outage is likely. Is it vegetation encroachment during the growing season? Weather-driven during storm season? Equipment failure on aging infrastructure? And what about the significant share of outages with undetermined causes? In this guide you will:

  • Build a multi-class classifier that predicts outage cause codes (vegetation, weather, equipment, animal, overload, and unknown)
  • Use XGBoost instead of Random Forest for improved performance on imbalanced classes
  • Implement a time-aware train/test split that simulates real-world deployment
  • Add asset features (transformer age, condition scores) alongside weather
  • Use SHAP values to explain individual predictions—not just global feature importance
  • Benchmark your model against SP&L's annual SAIFI reliability metrics

SP&L Data You Will Use

  • outages/outage_events.csv — 3,200+ outage records with cause codes
  • weather/hourly_observations.csv — hourly weather data
  • assets/transformers.csv — transformer age, condition scores, kVA ratings
  • assets/conductors.csv — conductor age and vegetation clearance zones
  • outages/reliability_metrics.csv — annual SAIFI/SAIDI for benchmarking

Additional Libraries

pip install xgboost shap

Step 1: Load and Merge All Data Sources

Unlike Guide 01 where we used only weather data, here we merge weather, asset, and conductor data to give the model a richer picture of outage drivers.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import shap
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

DATA_DIR = "sisyphean-power-and-light/"

# Load all datasets
outages = pd.read_csv(DATA_DIR + "outages/outage_events.csv", parse_dates=["fault_detected"])
weather = pd.read_csv(DATA_DIR + "weather/hourly_observations.csv", parse_dates=["timestamp"])
transformers = pd.read_csv(DATA_DIR + "assets/transformers.csv")
conductors = pd.read_csv(DATA_DIR + "assets/conductors.csv")

print(f"Outages: {len(outages):,} events")
print(f"Cause codes: {outages['cause_code'].unique()}")
print("\nCause code distribution:")
print(outages["cause_code"].value_counts())
Outages: 3,247 events
Cause codes: ['vegetation' 'weather' 'equipment_failure' 'animal_contact' 'overload' 'unknown']

Cause code distribution:
vegetation 987
weather 842
equipment_failure 614
animal_contact 389
overload 247
unknown 168

Why multi-class? A binary model tells you "an outage might happen." A multi-class model tells you "an outage might happen due to vegetation." This lets crews prepare the right response: tree trimming crews for vegetation, line patrol for equipment, or storm staging for weather-driven events.

Step 2: Build Enriched Feature Set

We combine daily weather summaries with asset condition data for each outage's feeder. This gives the model both environmental and infrastructure context. Note that we keep all cause codes including “unknown”—in real utility data, a significant portion of outages have undetermined causes, and the model should learn to recognize these patterns rather than ignoring them.

# Create daily weather features (same as Guide 01)
weather["date"] = weather["timestamp"].dt.date
daily_weather = weather.groupby("date").agg({
    "temperature": ["max", "min", "mean"],
    "wind_speed": ["max", "mean"],
    "precipitation": "sum",
    "humidity": "mean",
}).reset_index()
daily_weather.columns = ["date", "temp_max", "temp_min", "temp_mean",
                         "wind_max", "wind_mean", "precip_total", "humidity_mean"]

# Add asset features per feeder
feeder_assets = transformers.groupby("feeder_id").agg({
    "install_year": "mean",
    "condition_score": "mean",
    "kva_rating": "sum",
}).reset_index()
feeder_assets.columns = ["feeder_id", "avg_install_year", "avg_condition_score", "total_kva"]
feeder_assets["avg_asset_age"] = 2025 - feeder_assets["avg_install_year"]

# Build the training table: one row per outage event
outages["date"] = outages["fault_detected"].dt.date
df = outages.merge(daily_weather, on="date", how="left")
df = df.merge(feeder_assets, on="feeder_id", how="left")

# Add time features
df["fault_detected"] = pd.to_datetime(df["fault_detected"])
df["month"] = df["fault_detected"].dt.month
df["hour"] = df["fault_detected"].dt.hour
df["day_of_week"] = df["fault_detected"].dt.dayofweek
df["is_summer"] = df["month"].isin([6, 7, 8]).astype(int)
df["is_storm_season"] = df["month"].isin([3, 4, 5, 6]).astype(int)

# Drop rows with missing weather data (keep all cause codes including unknown)
df = df.dropna(subset=["temp_max"])

print(f"Training samples: {len(df):,}")
print("\nFeatures built per event:")
print("  Weather: 7 | Asset: 3 | Calendar: 5")
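
The block above builds feeder-level features from the transformer table only; the conductor file loaded in Step 1 is not yet merged in. Below is a minimal, hedged sketch of folding conductor age and vegetation clearance into the same feeder-level table. The column names (install_year, clearance_zone_ft) and the feeder_id key are assumptions based on the dataset description, so check conductors.columns and adjust before running.

# Optional: fold conductor data into the feeder-level features.
# NOTE: the column names below ("install_year", "clearance_zone_ft", "feeder_id")
# are assumptions based on the dataset description -- inspect conductors.columns first.
conductor_feats = conductors.groupby("feeder_id").agg({
    "install_year": "mean",        # assumed column: conductor install year
    "clearance_zone_ft": "mean",   # assumed column: vegetation clearance zone width
}).reset_index()
conductor_feats.columns = ["feeder_id", "conductor_install_year", "avg_clearance_ft"]
conductor_feats["avg_conductor_age"] = 2025 - conductor_feats["conductor_install_year"]

df = df.merge(conductor_feats.drop(columns="conductor_install_year"),
              on="feeder_id", how="left")

# If you add these, remember to append them to feature_cols in the next step:
# feature_cols += ["avg_conductor_age", "avg_clearance_ft"]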

Step 3: Time-Aware Train/Test Split

In Guide 01, we used a random split. But in production, your model always predicts the future based on the past. A time-aware split is more honest: train on 2020–2023 data, test on 2024–2025.

# Define features and target
feature_cols = [
    "temp_max", "temp_min", "temp_mean", "wind_max", "wind_mean",
    "precip_total", "humidity_mean",
    "avg_asset_age", "avg_condition_score", "total_kva",
    "month", "hour", "day_of_week", "is_summer", "is_storm_season",
]

# Encode cause codes as integers
le = LabelEncoder()
df["cause_label"] = le.fit_transform(df["cause_code"])
cause_names = le.classes_

# Time-aware split: train on 2020-2023, test on 2024-2025
train_mask = df["fault_detected"].dt.year <= 2023
test_mask = df["fault_detected"].dt.year >= 2024

X_train = df.loc[train_mask, feature_cols]
y_train = df.loc[train_mask, "cause_label"]
X_test = df.loc[test_mask, feature_cols]
y_test = df.loc[test_mask, "cause_label"]

print(f"Training set: {len(X_train):,} events (2020-2023)")
print(f"Test set: {len(X_test):,} events (2024-2025)")
print(f"Classes: {list(cause_names)}")

Why not random split? Random splitting lets future data "leak" into the training set. If your model trains on a July 2024 storm and tests on a June 2024 event, it has an unfair advantage. Time-aware splitting gives you honest performance estimates that reflect how the model will actually perform when deployed.
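
If you want to see the effect rather than take it on faith, the short sketch below trains the same kind of model on a random split and on the time-aware split and compares accuracy. It is purely illustrative and not part of the main pipeline; a noticeably higher score on the random split is a sign that leakage is flattering the estimate.

# Illustrative only: compare a random split against the time-aware split.
# A higher score on the random split suggests temporal leakage.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_all = df[feature_cols]
y_all = df["cause_label"]

# Random split (what Guide 01 did)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all
)
rand_model = xgb.XGBClassifier(n_estimators=150, max_depth=6, random_state=42)
rand_model.fit(Xr_tr, yr_tr)

# Time-aware split (defined above)
time_model = xgb.XGBClassifier(n_estimators=150, max_depth=6, random_state=42)
time_model.fit(X_train, y_train)

print(f"Random-split accuracy:     {accuracy_score(yr_te, rand_model.predict(Xr_te)):.3f}")
print(f"Time-aware split accuracy: {accuracy_score(y_test, time_model.predict(X_test)):.3f}")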

Step 4: Train XGBoost Multi-Class Classifier

XGBoost (Extreme Gradient Boosting) builds trees sequentially, where each new tree corrects the mistakes of the previous ones. It typically outperforms Random Forest on structured data, especially with class imbalance.

# Calculate class weights for imbalanced data
class_counts = y_train.value_counts().sort_index()
total = len(y_train)
n_classes = len(class_counts)
class_weights = {i: total / (n_classes * count) for i, count in class_counts.items()}

# Assign sample weights
sample_weights = y_train.map(class_weights)

# Train XGBoost
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    objective="multi:softprob",
    num_class=n_classes,
    random_state=42,
    eval_metric="mlogloss",
)
model.fit(
    X_train, y_train,
    sample_weight=sample_weights,
    eval_set=[(X_test, y_test)],
    verbose=50,
)
print("Training complete.")

A note on eval_set: We pass (X_test, y_test) as the evaluation set so you can watch the model’s loss decrease during training. Be aware that this means the training procedure can “see” test set performance—if you add early_stopping_rounds, the model will use the test set to decide when to stop, which is a mild form of information leakage. In production pipelines, best practice is to create a separate validation set (e.g., a 2023-only holdout) for monitoring and early stopping, and reserve the true test set (2024–2025) for final evaluation only. For this tutorial, the impact is minimal since we are not tuning hyperparameters against the eval set, but it is an important distinction to understand.
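
A minimal sketch of that best practice follows: fit on 2020-2022, monitor a 2023-only validation slice for early stopping, and leave 2024-2025 untouched until final evaluation. Note that the early-stopping API has moved between XGBoost releases (older versions take early_stopping_rounds in fit(), newer ones in the constructor), so adjust for your installed version.

# Sketch: fit on 2020-2022, use 2023 as a validation holdout for early stopping,
# and keep 2024-2025 as the untouched test set.
val_mask = df["fault_detected"].dt.year == 2023
fit_mask = df["fault_detected"].dt.year <= 2022

X_fit, y_fit = df.loc[fit_mask, feature_cols], df.loc[fit_mask, "cause_label"]
X_val, y_val = df.loc[val_mask, feature_cols], df.loc[val_mask, "cause_label"]

model_es = xgb.XGBClassifier(
    n_estimators=1000,            # upper bound; early stopping picks the real number
    max_depth=6,
    learning_rate=0.1,
    eval_metric="mlogloss",
    early_stopping_rounds=30,     # newer XGBoost versions: set in the constructor
    random_state=42,
)
model_es.fit(X_fit, y_fit, eval_set=[(X_val, y_val)], verbose=False)
print(f"Best iteration on the 2023 validation set: {model_es.best_iteration}")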

XGBoost vs Random Forest: Random Forest builds trees independently (in parallel). XGBoost builds trees sequentially, with each tree focusing on the mistakes of the ensemble so far. The learning_rate controls how aggressively each tree corrects errors. Lower rates (0.01–0.1) with more trees generally give better results but take longer to train.
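
To make that trade-off concrete, the illustrative sweep below fits the same model at a few learning-rate / tree-count combinations and compares test log-loss. Exact numbers depend on your data; the point is simply that lower rates need more trees to reach comparable loss.

# Illustrative sweep: lower learning rates usually need more trees.
from sklearn.metrics import log_loss

for lr, n_trees in [(0.3, 100), (0.1, 300), (0.03, 1000)]:
    m = xgb.XGBClassifier(
        n_estimators=n_trees, max_depth=6, learning_rate=lr,
        subsample=0.8, colsample_bytree=0.8,
        eval_metric="mlogloss", random_state=42,
    )
    m.fit(X_train, y_train, sample_weight=sample_weights)
    proba = m.predict_proba(X_test)
    print(f"learning_rate={lr:<5} n_estimators={n_trees:<5} "
          f"test log-loss={log_loss(y_test, proba):.3f}")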

Step 5: Evaluate Multi-Class Performance

# Predict on test set
y_pred = model.predict(X_test)

# Classification report with cause code names
print(classification_report(y_test, y_pred, target_names=cause_names))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8, 7))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=cause_names, yticklabels=cause_names, ax=ax)
ax.set_xlabel("Predicted Cause")
ax.set_ylabel("Actual Cause")
ax.set_title("Multi-Class Confusion Matrix: Outage Cause Prediction")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

Look at the confusion matrix to understand where the model gets confused. Weather and vegetation outages are often the hardest to distinguish because storms can cause both tree-contact and direct wind/lightning damage.
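
If you prefer a ranked list to eyeballing the heatmap, a small sketch like this one pulls out the largest off-diagonal cells, i.e. the cause pairs the model mixes up most often.

# Rank the most frequent misclassification pairs from the confusion matrix.
confused_pairs = []
for i, actual_name in enumerate(cause_names):
    for j, pred_name in enumerate(cause_names):
        if i != j and cm[i, j] > 0:
            confused_pairs.append((actual_name, pred_name, cm[i, j]))

confused_pairs.sort(key=lambda p: p[2], reverse=True)
print("Most common confusions (actual -> predicted):")
for actual_name, pred_name, count in confused_pairs[:5]:
    print(f"  {actual_name:>18} -> {pred_name:<18} {count} events")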

Step 6: Explain Predictions with SHAP

Feature importance tells you which features matter globally. SHAP (SHapley Additive exPlanations) goes further: it tells you how much each feature contributed to a specific prediction and in which direction.

# Create SHAP explainer
explainer = shap.TreeExplainer(model)

# NOTE: older SHAP versions return a list with one array per class for multi-class
# models; newer versions may return a single (samples, features, classes) array,
# in which case use shap_values[:, :, i] below instead of shap_values[i].
shap_values = explainer.shap_values(X_test)

# Summary plot: how each feature affects each class
fig, axes = plt.subplots(1, len(cause_names), figsize=(20, 6))
for i, cause in enumerate(cause_names):
    plt.sca(axes[i])
    shap.summary_plot(shap_values[i], X_test, feature_names=feature_cols,
                      show=False, max_display=8)
    axes[i].set_title(f"{cause}")
plt.tight_layout()
plt.show()

Reading SHAP plots: Each dot is one prediction. The x-axis shows the SHAP value (positive = pushes toward this class, negative = pushes away). The color shows the feature value (red = high, blue = low). For example, if you see high wind_max values (red dots) pushed to the right for the "weather" class, it means high wind makes the model more confident the outage is weather-driven.
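
To drill into a single feature-class pair, such as wind_max for the weather class, a SHAP dependence plot shows each test event's feature value against its SHAP value. The sketch below assumes shap_values is the per-class list returned by older SHAP releases; if your version returns one (samples, features, classes) array, index it as shap_values[:, :, class_idx] instead.

# Dependence plot: wind_max's contribution to the "weather" class.
# Assumes shap_values is a list of per-class arrays (older SHAP API).
class_idx = list(cause_names).index("weather")
shap.dependence_plot(
    "wind_max",
    shap_values[class_idx],
    X_test,
    interaction_index="precip_total",   # color points by a second feature
)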

Step 7: Explain Individual Predictions

SHAP's real power is explaining individual events. Pick a specific outage and see exactly why the model predicted its cause.

# Pick a specific outage event from the test set
event_idx = 0
event = X_test.iloc[event_idx]
actual = cause_names[y_test.iloc[event_idx]]
predicted = cause_names[y_pred[event_idx]]

print("Event details:")
print(f"  Actual cause: {actual}")
print(f"  Predicted cause: {predicted}")
print(f"  Wind max: {event['wind_max']:.1f} mph")
print(f"  Precip total: {event['precip_total']:.2f} in")
print(f"  Avg asset age: {event['avg_asset_age']:.0f} years")

# Waterfall plot for this prediction
# (assumes the per-class list form of shap_values noted in Step 6)
predicted_class = y_pred[event_idx]
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[predicted_class][event_idx],
        base_values=explainer.expected_value[predicted_class],
        data=event.values,
        feature_names=feature_cols,
    )
)

The waterfall plot shows how the model built its prediction step by step. Each bar shows one feature's contribution. Red bars push toward the predicted class; blue bars push against it. The final value is the model's log-odds for that class.
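
Because those contributions live in raw-margin space, you can sanity-check the plot by summing each class's SHAP values, adding its base value, and pushing the per-class margins through a softmax; the result should closely match model.predict_proba for the same event. A rough sketch, again assuming the per-class list form of shap_values:

# Sanity check: base value + sum of SHAP values per class gives that class's raw
# margin; softmax over the classes should roughly reproduce predict_proba.
margins = np.array([
    explainer.expected_value[k] + shap_values[k][event_idx].sum()
    for k in range(len(cause_names))
])
probs_from_shap = np.exp(margins) / np.exp(margins).sum()

print("Probabilities reconstructed from SHAP values:")
for name, p in zip(cause_names, probs_from_shap):
    print(f"  {name:>18}: {p:.3f}")
print("\nmodel.predict_proba for the same event:")
print(model.predict_proba(X_test.iloc[[event_idx]])[0].round(3))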

Step 8: Benchmark Against SAIFI Metrics

How does your model's predicted outage distribution compare to SP&L's actual reliability metrics? This bridges the gap between ML model accuracy and real utility KPIs.

# Load reliability metrics
metrics = pd.read_csv(DATA_DIR + "outages/reliability_metrics.csv")
print(metrics)

# Compare predicted vs actual cause distribution for test years
test_outages = df[test_mask]
actual_dist = test_outages["cause_code"].value_counts(normalize=True)
pred_causes = pd.Series(cause_names[y_pred])
pred_dist = pred_causes.value_counts(normalize=True)

comparison = pd.DataFrame({
    "Actual %": (actual_dist * 100).round(1),
    "Predicted %": (pred_dist * 100).round(1),
})
print("\nCause distribution comparison (test set):")
print(comparison)

# Plot side-by-side
fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(comparison))
width = 0.35
ax.bar(x - width / 2, comparison["Actual %"], width, label="Actual", color="#2D6A7A")
ax.bar(x + width / 2, comparison["Predicted %"], width, label="Predicted", color="#5FCCDB")
ax.set_xticks(x)
ax.set_xticklabels(comparison.index, rotation=45, ha="right")
ax.set_ylabel("Percentage of Outages")
ax.set_title("Predicted vs Actual Outage Cause Distribution (2024-2025)")
ax.legend()
plt.tight_layout()
plt.show()

Step 9: Seasonal Cause Analysis

Outage causes vary by season. Vegetation peaks in spring/summer during the growing season. Weather outages spike during storm season. Equipment failures may increase in extreme heat. Let's validate the model captures these patterns.

# Predicted causes by month
test_outages_with_pred = test_outages.copy()
test_outages_with_pred["predicted_cause"] = cause_names[y_pred]

monthly_causes = test_outages_with_pred.groupby(
    ["month", "predicted_cause"]
).size().unstack(fill_value=0)

fig, ax = plt.subplots(figsize=(12, 6))
monthly_causes.plot(kind="bar", stacked=True, ax=ax, colormap="Set2")
ax.set_xlabel("Month")
ax.set_ylabel("Predicted Outage Count")
ax.set_title("Predicted Outage Causes by Month")
ax.legend(title="Cause", bbox_to_anchor=(1.05, 1))
plt.tight_layout()
plt.show()

Reproducibility and Model Persistence

# For reproducible results, set random seeds at the top of your notebook
np.random.seed(42)

# Save the trained model for later use
import joblib
joblib.dump(model, "outage_cause_model.pkl")

# Load it back
model = joblib.load("outage_cause_model.pkl")

Why these hyperparameters? n_estimators=300, max_depth=6, and learning_rate=0.1 are standard starting points for tree-based models. More trees (300 vs the default 100) compensate for the low learning rate, while max_depth=6 limits tree complexity to prevent overfitting on our modest dataset. Setting np.random.seed(42) at the top of your notebook keeps any numpy-based randomness you add (sampling, shuffling) reproducible; the XGBoost model itself is seeded separately through its random_state parameter, and the year-based split here is already deterministic.

What You Built and Next Steps

  1. Merged weather, asset, and conductor data into a rich feature set
  2. Built a multi-class XGBoost classifier that predicts outage cause codes
  3. Used time-aware splitting to honestly evaluate forward-looking performance
  4. Applied SHAP to explain both global patterns and individual predictions
  5. Benchmarked model predictions against SP&L's reliability metrics
  6. Analyzed seasonal variation in predicted outage causes

Ideas to Try Next

  • Temporal Convolutional Networks: Replace XGBoost with a TCN to capture sequence patterns in time-ordered outage data
  • Hyperparameter tuning: Use optuna or sklearn.model_selection.RandomizedSearchCV to optimize XGBoost parameters (a starter sketch follows this list)
  • Lightning data: Add lightning strike proximity from weather/lightning_strikes.csv as a feature
  • Spatial features: Add feeder topology features (length, number of taps, rural vs urban)
  • Probability calibration: Use CalibratedClassifierCV to ensure predicted probabilities reflect true likelihoods
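
As a starting point for the hyperparameter-tuning idea above, here is a minimal RandomizedSearchCV sketch. Ordinary shuffled K-fold inside the search would reintroduce the leakage problem from Step 3, so the sketch sorts the training years chronologically and uses TimeSeriesSplit; it assumes every fold still contains all six cause classes.

# Starter sketch for the hyperparameter-tuning idea above.
# TimeSeriesSplit keeps folds chronological, avoiding the leakage issue from Step 3.
# Assumes each training fold still contains every cause class.
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Sort the training rows by time so the folds are chronological.
train_sorted = df.loc[train_mask].sort_values("fault_detected")
X_tr_sorted = train_sorted[feature_cols]
y_tr_sorted = train_sorted["cause_label"]

param_dist = {
    "n_estimators": [200, 300, 500],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.03, 0.1, 0.2],
    "subsample": [0.7, 0.8, 1.0],
    "colsample_bytree": [0.7, 0.8, 1.0],
}

search = RandomizedSearchCV(
    xgb.XGBClassifier(objective="multi:softprob", eval_metric="mlogloss", random_state=42),
    param_distributions=param_dist,
    n_iter=20,
    scoring="neg_log_loss",
    cv=TimeSeriesSplit(n_splits=4),
    random_state=42,
    n_jobs=-1,
)
search.fit(X_tr_sorted, y_tr_sorted)
print("Best parameters:", search.best_params_)
print(f"Best CV log-loss: {-search.best_score_:.3f}")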

Key Terms Glossary

  • Multi-class classification — predicting one of several categories (not just binary yes/no)
  • XGBoost — Extreme Gradient Boosting; builds trees sequentially to correct errors
  • SHAP values — SHapley Additive exPlanations; a game-theory approach to explain individual predictions
  • Time-aware split — using past data for training and future data for testing to simulate deployment
  • Class imbalance — when some categories have far fewer examples than others
  • SAIFI — System Average Interruption Frequency Index; average number of interruptions per customer per year
  • Feature engineering — creating informative input variables from raw data