Prerequisite: Complete Guide 01: Outage Prediction first. This guide extends the binary classifier into a multi-class model that predicts the cause of an outage and explains why.
What You Will Learn
In Guide 01, you built a binary classifier: outage or no outage. But utility reliability teams need more—they need to know why an outage is likely. Is it vegetation encroachment during the growing season? Weather-driven during storm season? Equipment failure on aging infrastructure? And what about the significant share of outages with undetermined causes? In this guide you will:
- Build a multi-class classifier that predicts outage cause codes (vegetation, weather, equipment, animal, overload, and unknown)
- Use XGBoost instead of Random Forest for improved performance on imbalanced classes
- Implement a time-aware train/test split that simulates real-world deployment
- Add asset features (transformer age, condition scores) alongside weather
- Use SHAP values to explain individual predictions—not just global feature importance
- Benchmark your model against SP&L's annual SAIFI reliability metrics
SP&L Data You Will Use
- outages/outage_events.csv — 3,200+ outage records with cause codes
- weather/hourly_observations.csv — hourly weather data
- assets/transformers.csv — transformer age, condition scores, kVA ratings
- assets/conductors.csv — conductor age and vegetation clearance zones
- outages/reliability_metrics.csv — annual SAIFI/SAIDI for benchmarking
Additional Libraries
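This guide adds two libraries on top of Guide 01's stack: xgboost for the classifier and shap for explanations. A minimal check that both are importable (install with `pip install xgboost shap` if not):

```python
# New dependencies for this guide (beyond Guide 01's pandas/scikit-learn stack).
# If either import fails, install with: pip install xgboost shap
import xgboost as xgb
import shap

print("xgboost", xgb.__version__, "| shap", shap.__version__)
```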
Load and Merge All Data Sources
Unlike Guide 01 where we used only weather data, here we merge weather, asset, and conductor data to give the model a richer picture of outage drivers.
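Here is a minimal sketch of the load-and-merge step. Column names such as `timestamp`, `cause_code`, and the `feeder_id` join key are assumptions about the SP&L schema; adjust them to match the actual files.

```python
import pandas as pd

# Load all four sources; "timestamp", "cause_code", and "feeder_id" are
# assumed column names -- rename to match the real SP&L files.
outages = pd.read_csv("outages/outage_events.csv", parse_dates=["timestamp"])
weather = pd.read_csv("weather/hourly_observations.csv", parse_dates=["timestamp"])
transformers = pd.read_csv("assets/transformers.csv")
conductors = pd.read_csv("assets/conductors.csv")

# Attach asset context to each outage through its feeder.
df = (outages
      .merge(transformers, on="feeder_id", how="left")
      .merge(conductors, on="feeder_id", how="left"))

print("Cause codes:", outages["cause_code"].unique())
print("Cause code distribution:")
print(outages["cause_code"].value_counts())
```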
```
Cause codes: ['vegetation' 'weather' 'equipment_failure' 'animal_contact' 'overload' 'unknown']

Cause code distribution:
vegetation           987
weather              842
equipment_failure    614
animal_contact       389
overload             247
unknown              168
```
Why multi-class? A binary model tells you "an outage might happen." A multi-class model tells you "an outage might happen due to vegetation." This lets crews prepare the right response: tree trimming crews for vegetation, line patrol for equipment, or storm staging for weather-driven events.
Build Enriched Feature Set
We combine daily weather summaries with asset condition data for each outage's feeder. This gives the model both environmental and infrastructure context. Note that we keep all cause codes including “unknown”—in real utility data, a significant portion of outages have undetermined causes, and the model should learn to recognize these patterns rather than ignoring them.
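One way to assemble that feature set, sketched under the same assumed schema: hourly weather rolled up to daily aggregates, joined with per-feeder asset columns. The asset column names (`transformer_age`, `condition_score`, `kva_rating`, `conductor_age`, `veg_clearance_ft`) are illustrative placeholders.

```python
# Aggregate hourly weather to daily summaries; "wind_speed", "temperature",
# and "precipitation" are assumed hourly column names.
weather["date"] = weather["timestamp"].dt.date
daily_wx = weather.groupby("date").agg(
    wind_max=("wind_speed", "max"),
    temp_max=("temperature", "max"),
    precip_total=("precipitation", "sum"),
).reset_index()

# Join daily weather onto the outage + asset frame from the previous step.
df["date"] = df["timestamp"].dt.date
features = df.merge(daily_wx, on="date", how="left")

# Assumed asset feature names -- swap in the real SP&L columns.
feature_cols = ["wind_max", "temp_max", "precip_total",
                "transformer_age", "condition_score", "kva_rating",
                "conductor_age", "veg_clearance_ft"]
X = features[feature_cols]
y = features["cause_code"]  # keep all six classes, including "unknown"
```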
Time-Aware Train/Test Split
In Guide 01, we used a random split. But in production, your model always predicts the future based on the past. A time-aware split is more honest: train on 2020–2023 data, test on 2024–2025.
Why not random split? Random splitting lets future data "leak" into the training set. If your model trains on a July 2024 storm and tests on a June 2024 event, it has an unfair advantage. Time-aware splitting gives you honest performance estimates that reflect how the model will actually perform when deployed.
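The split itself is a single boolean mask on the event year:

```python
# Train on 2020-2023, test on 2024-2025 -- no shuffling, no leakage.
year = features["timestamp"].dt.year
train_mask = year <= 2023

X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[~train_mask], y[~train_mask]
print(f"Train: {len(X_train)} outages | Test: {len(X_test)} outages")
```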
Train XGBoost Multi-Class Classifier
XGBoost (Extreme Gradient Boosting) builds trees sequentially, where each new tree corrects the mistakes of the previous ones. It typically outperforms Random Forest on structured data, especially with class imbalance.
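A training sketch using the hyperparameters discussed later in this guide. Note that XGBClassifier expects integer class labels, so we encode the cause-code strings first:

```python
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# XGBoost needs integer labels 0..5, so encode the cause-code strings.
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_test_enc = le.transform(y_test)

model = XGBClassifier(
    objective="multi:softprob",
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    eval_metric="mlogloss",
    random_state=42,
)
model.fit(
    X_train, y_train_enc,
    eval_set=[(X_test, y_test_enc)],  # watch validation loss per round
    verbose=50,                       # print the loss every 50 rounds
)
```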
A note on eval_set: We pass (X_test, y_test) as the evaluation set so you can watch the model’s loss decrease during training. Be aware that this means the training procedure can “see” test set performance—if you add early_stopping_rounds, the model will use the test set to decide when to stop, which is a mild form of information leakage. In production pipelines, best practice is to create a separate validation set (e.g., a 2023-only holdout) for monitoring and early stopping, and reserve the true test set (2024–2025) for final evaluation only. For this tutorial, the impact is minimal since we are not tuning hyperparameters against the eval set, but it is an important distinction to understand.
XGBoost vs Random Forest: Random Forest builds trees independently (in parallel). XGBoost builds trees sequentially, with each tree focusing on the mistakes of the ensemble so far. The learning_rate controls how aggressively each tree corrects errors. Lower rates (0.01–0.1) with more trees generally give better results but take longer to train.
Evaluate Multi-Class Performance
Look at the confusion matrix to understand where the model gets confused. Weather and vegetation outages are often the hardest to distinguish because storms can cause both tree-contact and direct wind/lightning damage.
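A per-class report plus a labeled confusion matrix makes those confusions easy to spot:

```python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test)
print(classification_report(y_test_enc, y_pred, target_names=le.classes_))

# Rows = actual cause, columns = predicted cause.
cm = confusion_matrix(y_test_enc, y_pred)
print(pd.DataFrame(cm, index=le.classes_, columns=le.classes_))
```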
Explain Predictions with SHAP
Feature importance tells you which features matter globally. SHAP (SHapley Additive exPlanations) goes further: it tells you how much each feature contributed to a specific prediction and in which direction.
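A sketch of the global SHAP summary for one class. The return shape of `shap_values` varies by shap version (a list of per-class arrays in older releases, a 3D array in newer ones), so this handles both:

```python
import shap

# TreeExplainer is exact and fast for XGBoost models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Pull the SHAP values for the "weather" class, whatever the return shape.
weather_idx = list(le.classes_).index("weather")
sv = (shap_values[weather_idx] if isinstance(shap_values, list)
      else shap_values[:, :, weather_idx])

shap.summary_plot(sv, X_test)  # beeswarm: one dot per prediction
```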
Reading SHAP plots: Each dot is one prediction. The x-axis shows the SHAP value (positive = pushes toward this class, negative = pushes away). The color shows the feature value (red = high, blue = low). For example, if you see high wind_max values (red dots) pushed to the right for the "weather" class, it means high wind makes the model more confident the outage is weather-driven.
Explain Individual Predictions
SHAP's real power is explaining individual events. Pick a specific outage and see exactly why the model predicted its cause.
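With recent shap versions, calling the explainer directly returns an Explanation object you can slice by sample and class, then hand to the waterfall plot:

```python
# Explain one test-set outage (row 0 here -- pick any index).
i = 0
pred_class = int(model.predict(X_test.iloc[[i]])[0])
print("Predicted cause:", le.classes_[pred_class])

# Explanation values have shape (samples, features, classes) for multi-class.
exp = explainer(X_test.iloc[[i]])
shap.plots.waterfall(exp[0, :, pred_class])
```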
The waterfall plot shows how the model built its prediction step by step. Each bar shows one feature's contribution. Red bars push toward the predicted class; blue bars push against it. The final value is the model's log-odds for that class.
Benchmark Against SAIFI Metrics
How does your model's predicted outage distribution compare to SP&L's actual reliability metrics? This bridges the gap between ML model accuracy and real utility KPIs.
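A rough comparison sketch, assuming reliability_metrics.csv carries `year` and `saifi` columns (rename to match the real file):

```python
# Predicted cause mix over the test years vs recorded reliability metrics.
metrics = pd.read_csv("outages/reliability_metrics.csv")

pred_causes = pd.Series(le.inverse_transform(y_pred), index=X_test.index)
print("Predicted cause mix (2024-2025):")
print(pred_causes.value_counts(normalize=True).round(3))

# "year" and "saifi" are assumed column names in reliability_metrics.csv.
print(metrics[["year", "saifi"]].tail())
```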
Seasonal Cause Analysis
Outage causes vary by season. Vegetation peaks in spring/summer during the growing season. Weather outages spike during storm season. Equipment failures may increase in extreme heat. Let's validate the model captures these patterns.
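One way to check: cross-tabulate predicted cause against event month and look for the expected peaks:

```python
# Share of each predicted cause by calendar month of the outage.
test = features.loc[X_test.index].copy()
test["pred_cause"] = le.inverse_transform(y_pred)
test["month"] = test["timestamp"].dt.month

seasonal = pd.crosstab(test["month"], test["pred_cause"], normalize="index")
print(seasonal.round(2))  # expect vegetation highest in growing-season months
```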
Reproducibility and Model Persistence
Why these hyperparameters? n_estimators=300, max_depth=6, learning_rate=0.1 are standard starting points for gradient-boosted trees. More trees (300 vs the default 100) compensate for the reduced learning rate (0.1 vs XGBoost's default of 0.3), while max_depth=6 limits tree complexity to prevent overfitting on our modest dataset. For reproducibility, set np.random.seed(42) at the top of your notebook to fix any NumPy-based shuffling, and pass random_state=42 to the model so XGBoost's internal sampling is deterministic too; together they make every run produce identical results.
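A persistence sketch: XGBoost's native JSON format for the model, plus the label encoder so future predictions can be decoded back to cause-code strings. The file names here are illustrative.

```python
import joblib
import numpy as np
from xgboost import XGBClassifier

np.random.seed(42)  # set once at the top of the notebook

# Save the model in XGBoost's native format, with the encoder alongside it.
model.save_model("outage_cause_xgb.json")
joblib.dump(le, "cause_label_encoder.joblib")

# Reload later for inference:
reloaded = XGBClassifier()
reloaded.load_model("outage_cause_xgb.json")
```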
What You Built and Next Steps
- Merged weather, asset, and conductor data into a rich feature set
- Built a multi-class XGBoost classifier that predicts outage cause codes
- Used time-aware splitting to honestly evaluate forward-looking performance
- Applied SHAP to explain both global patterns and individual predictions
- Benchmarked model predictions against SP&L's reliability metrics
- Analyzed seasonal variation in predicted outage causes
Ideas to Try Next
- Temporal Convolutional Networks: Replace XGBoost with a TCN to capture sequence patterns in time-ordered outage data
- Hyperparameter tuning: Use `optuna` or `sklearn.model_selection.RandomizedSearchCV` to optimize XGBoost parameters
- Lightning data: Add lightning strike proximity from `weather/lightning_strikes.csv` as a feature
- Spatial features: Add feeder topology features (length, number of taps, rural vs urban)
- Probability calibration: Use `CalibratedClassifierCV` to ensure predicted probabilities reflect true likelihoods
Key Terms Glossary
- Multi-class classification — predicting one of several categories (not just binary yes/no)
- XGBoost — Extreme Gradient Boosting; builds trees sequentially to correct errors
- SHAP values — SHapley Additive exPlanations; a game-theory approach to explain individual predictions
- Time-aware split — using past data for training and future data for testing to simulate deployment
- Class imbalance — when some categories have far fewer examples than others
- SAIFI — System Average Interruption Frequency Index; average number of interruptions per customer per year
- Feature engineering — creating informative input variables from raw data