What You Will Learn
Utilities spend billions of dollars maintaining and replacing aging infrastructure. Instead of replacing equipment on a fixed schedule (time-based maintenance), predictive maintenance uses data to identify which assets are most likely to fail soon—so crews can prioritize the right work. In this guide you will:
- Load transformer, maintenance, and outage data from the SP&L dataset
- Engineer features from asset age, condition scores, loading history, and failure records
- Train an XGBoost classifier to predict transformer failure risk
- Evaluate your model and generate a risk-ranked asset list
- Visualize which factors contribute most to failure risk
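What is XGBoost? XGBoost (eXtreme Gradient Boosting) is an optimized implementation of gradient boosting, a technique that builds an ensemble of decision trees in which each new tree corrects the errors of the trees before it. It trains quickly, often produces more accurate results than a basic gradient boosting implementation, and is one of the most widely used ML algorithms for tabular data, both in industry and in Kaggle competitions.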
SP&L Data You Will Use
- assets/transformers.csv — 86 transformers with kVA rating, installation year, manufacturer, type, and health index (1–5)
- assets/maintenance_log.csv — inspection dates, work orders, and replacement records
- outages/outage_events.csv — historical outage events linked to equipment failures
- weather/hourly_observations.csv — weather exposure data for environmental stress analysis
Additional Libraries
Which terminal should I use? On Windows, open Anaconda Prompt from the Start Menu (or PowerShell / Command Prompt if Python is already in your PATH). On macOS, open Terminal from Applications → Utilities. On Linux, open your default terminal. All pip install commands work the same across platforms.
Load the Data
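A minimal loading sketch, assuming you have a local copy of the SP&L dataset laid out with the folder names listed above. The function name `load_spl_data` and the `base_dir` argument are illustrative, not part of the dataset; the weather file is left out because it is only needed for the optional extensions.

```python
from pathlib import Path

import pandas as pd


def load_spl_data(base_dir="."):
    """Read the three core SP&L tables into DataFrames.

    base_dir is assumed to be the root folder of a local copy of the dataset.
    """
    base = Path(base_dir)
    transformers = pd.read_csv(base / "assets" / "transformers.csv")
    maintenance = pd.read_csv(base / "assets" / "maintenance_log.csv")
    outages = pd.read_csv(base / "outages" / "outage_events.csv")
    return transformers, maintenance, outages
```

Calling `transformers, maintenance, outages = load_spl_data("path/to/dataset")` then gives you one DataFrame per file; check `transformers.columns` before relying on any specific column name.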
Explore the Transformer Data
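One way to take a first look is to derive each unit's age and count units per health index. The sketch below assumes columns named `install_year` and `health_index` (your file may use different names) and runs on a tiny made-up frame rather than the real data.

```python
import pandas as pd


def summarize_transformers(transformers, reference_year):
    """Return a copy with age_years added, plus unit counts per health index."""
    df = transformers.copy()
    df["age_years"] = reference_year - df["install_year"]
    health_counts = df["health_index"].value_counts().sort_index()
    return df, health_counts


# Tiny illustrative frame; the values are invented, not taken from SP&L.
sample = pd.DataFrame({
    "transformer_id": ["T1", "T2", "T3", "T4"],
    "install_year": [1985, 1999, 2010, 1985],
    "health_index": [2, 3, 5, 2],
})
df, health_counts = summarize_transformers(sample, reference_year=2025)
print(df[["transformer_id", "age_years"]])
print(health_counts)
```

On the real table, `df.describe()` and `df["manufacturer"].value_counts()` are quick follow-ups for spotting outliers and sparse categories.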
Create the Failure Target
We need to label each transformer: has it experienced an equipment-failure outage? We'll use the outage event log to identify transformers linked to "equipment_failure" cause codes.
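The labeling step described above can be sketched as follows. The cause code `equipment_failure` comes from the text; the column names `transformer_id`, `equipment_id`, and `cause_code`, and the target name `failed`, are assumptions to adjust to the actual files.

```python
import pandas as pd


def add_failure_target(transformers, outages,
                       id_col="transformer_id",
                       outage_asset_col="equipment_id",
                       cause_col="cause_code"):
    """Add a binary `failed` column: 1 if any equipment_failure outage names the unit."""
    is_failure = outages[cause_col] == "equipment_failure"
    failed_ids = set(outages.loc[is_failure, outage_asset_col].dropna())
    out = transformers.copy()
    out["failed"] = out[id_col].isin(failed_ids).astype(int)
    return out


# Made-up example rows to show the behavior.
transformers = pd.DataFrame({"transformer_id": ["T1", "T2", "T3"]})
outages = pd.DataFrame({
    "equipment_id": ["T1", "T3", "T1", None],
    "cause_code": ["equipment_failure", "vegetation",
                   "equipment_failure", "equipment_failure"],
})
labeled = add_failure_target(transformers, outages)
print(labeled)
```

Here T1 is labeled 1 even though it appears twice, T3 stays 0 because its outage had a different cause, and the outage with no equipment ID is ignored.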
Engineer Maintenance Features
Maintenance history tells us a lot about asset health. Transformers with many work orders, or long gaps since the last inspection, may be at higher risk.
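Why fill missing values with 9999? A transformer with no inspection record has not been inspected within the period the log covers, and possibly never. Filling days_since_inspection with a large sentinel value places these units at the "longest gap" end of the feature, so the model can treat "never inspected" as a high-risk state. This works for tree-based models such as XGBoost, which split on thresholds, but it would distort a linear model.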
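A sketch of the two maintenance features discussed here, under the assumption that the log has `transformer_id`, `event_type` (with values such as `inspection` and `work_order`), and `event_date` columns. Those names, and the feature name `work_order_count`, are hypothetical; `days_since_inspection` and the 9999 fill follow the text.

```python
import pandas as pd


def add_maintenance_features(transformers, maintenance, as_of):
    """Add work_order_count and days_since_inspection to a copy of transformers."""
    as_of = pd.Timestamp(as_of)
    log = maintenance.copy()
    log["event_date"] = pd.to_datetime(log["event_date"])

    # Number of work orders logged against each transformer.
    work_orders = (log[log["event_type"] == "work_order"]
                   .groupby("transformer_id").size())

    # Days between the most recent inspection and the reference date.
    last_inspection = (log[log["event_type"] == "inspection"]
                       .groupby("transformer_id")["event_date"].max())
    days_since = (as_of - last_inspection).dt.days

    out = transformers.copy()
    ids = out["transformer_id"]
    out["work_order_count"] = ids.map(work_orders).fillna(0).astype(int)
    # Units with no inspection on record get the 9999 sentinel.
    out["days_since_inspection"] = ids.map(days_since).fillna(9999).astype(int)
    return out


# Made-up log entries for illustration.
transformers = pd.DataFrame({"transformer_id": ["T1", "T2", "T3"]})
maintenance = pd.DataFrame({
    "transformer_id": ["T1", "T1", "T1", "T2"],
    "event_type": ["inspection", "work_order", "work_order", "inspection"],
    "event_date": ["2024-01-01", "2024-03-01", "2024-06-01", "2023-12-31"],
})
features = add_maintenance_features(transformers, maintenance, as_of="2024-12-31")
print(features)
```

Passing an explicit `as_of` date, rather than using today's date, keeps the feature values reproducible between runs.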
Prepare Features and Split
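One plausible way to build the feature matrix is to one-hot encode the categorical columns and make a stratified split so the rare failure class appears in both sets. The column names, the 25% test size, and the synthetic demo frame are all assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

NUMERIC = ["age_years", "health_index", "kva_rating",
           "work_order_count", "days_since_inspection"]
CATEGORICAL = ["manufacturer", "type"]


def prepare_and_split(df, target="failed", test_size=0.25, seed=42):
    """One-hot encode categoricals, then split with stratification on the target."""
    X = pd.get_dummies(df[NUMERIC + CATEGORICAL], columns=CATEGORICAL, dtype=int)
    y = df[target]
    return train_test_split(X, y, test_size=test_size,
                            stratify=y, random_state=seed)


# Synthetic frame: 20 units, 4 of them labeled as failed.
n = 20
demo = pd.DataFrame({
    "age_years": [10 + i for i in range(n)],
    "health_index": [1 + i % 5 for i in range(n)],
    "kva_rating": [50] * n,
    "work_order_count": [i % 3 for i in range(n)],
    "days_since_inspection": [100 * (i % 4) for i in range(n)],
    "manufacturer": ["A", "B"] * (n // 2),
    "type": ["pole", "pad"] * (n // 2),
    "failed": [1 if i % 5 == 0 else 0 for i in range(n)],
})
X_train, X_test, y_train, y_test = prepare_and_split(demo)
print(X_train.shape, X_test.shape, int(y_test.sum()))
```

With only 86 transformers in the real table, stratifying matters: an unstratified split could easily leave the test set with no failures at all.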
Train the XGBoost Model
Test and Evaluate
What is AUC-ROC? AUC (Area Under the ROC Curve) measures how well the model distinguishes between positive and negative classes across all probability thresholds. Equivalently, it is the probability that a randomly chosen failed asset receives a higher score than a randomly chosen healthy one. A score of 1.0 is perfect and 0.5 is random guessing. For maintenance prioritization, a score above roughly 0.7 is generally considered useful, because you don't need perfect accuracy; you only need to rank assets by risk.
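To make the ranking interpretation of AUC concrete, the sketch below scores four made-up predictions with scikit-learn and with a hand-written pairwise count. The helper `pairwise_auc` is illustrative only, and the 0.5 threshold used for the classification report is an assumption.

```python
from sklearn.metrics import classification_report, roc_auc_score


def pairwise_auc(y_true, y_score):
    """Fraction of (failed, healthy) pairs where the failed unit scores higher."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)
manual = pairwise_auc(y_true, y_score)
print(auc, manual)  # both 0.75: three of the four pairs are ordered correctly

# The classification report needs hard labels, so apply a threshold first.
y_pred = [int(s >= 0.5) for s in y_score]
report = classification_report(y_true, y_pred, zero_division=0)
print(report)
```

Note that the report depends on the chosen threshold while AUC does not, which is why AUC is the more natural metric when the output is a ranked list.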
Plot the ROC Curve
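A minimal plotting sketch with matplotlib and scikit-learn's `roc_curve`. The function name `plot_roc`, the labels, and the four example scores are illustrative; in practice you would pass your test labels and the model's predicted failure probabilities.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve


def plot_roc(y_true, y_score):
    """Draw the ROC curve with a diagonal reference line; return the figure."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    auc = roc_auc_score(y_true, y_score)
    fig, ax = plt.subplots(figsize=(6, 5))
    ax.plot(fpr, tpr, label=f"Model (AUC = {auc:.2f})")
    ax.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title("ROC curve: transformer failure model")
    ax.legend(loc="lower right")
    return fig, fpr, tpr


fig, fpr, tpr = plot_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Call `plt.show()` to display the figure interactively, or `fig.savefig("roc.png")` to write it to disk. The further the curve bows toward the top-left corner, the better the model separates the classes.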
Generate a Risk-Ranked Asset List
The real value of this model is not just accuracy—it's the ability to produce a prioritized list of assets that maintenance crews can act on.
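Turning predicted probabilities into a prioritized list can be as simple as a sort. The sketch below assumes you have a sequence of asset IDs and a matching sequence of risk scores; the function name `rank_assets` and the example values are made up.

```python
import pandas as pd


def rank_assets(asset_ids, risk_scores, top_n=10):
    """Return the top_n assets ordered from highest to lowest risk score."""
    ranked = (pd.DataFrame({"transformer_id": list(asset_ids),
                            "risk_score": list(risk_scores)})
              .sort_values("risk_score", ascending=False, kind="stable")
              .reset_index(drop=True))
    ranked["rank"] = ranked.index + 1
    return ranked.head(top_n)


# Illustrative scores; in practice use model.predict_proba(X)[:, 1].
top = rank_assets(["T1", "T2", "T3", "T4"], [0.12, 0.81, 0.45, 0.67], top_n=3)
print(top)
```

Scoring every transformer, not just the test set, is what produces a usable work list, though scores for units the model was trained on will look more confident than they deserve.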
Feature Importance
You will typically see age_years and health_index at the top. Older transformers with poor health scores are the highest-risk assets, which aligns with engineering intuition and serves as a useful sanity check on the model.
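A small helper for tabulating importances, written against any fitted estimator that exposes a `feature_importances_` attribute, which the scikit-learn style `XGBClassifier` does. The function name `importance_table` is hypothetical.

```python
import pandas as pd


def importance_table(model, feature_names):
    """Return feature importances as a Series sorted from most to least important."""
    return (pd.Series(model.feature_importances_,
                      index=list(feature_names), name="importance")
            .sort_values(ascending=False))
```

With a trained model and its training frame, `importance_table(model, X_train.columns)` gives the ranking, and `.sort_values().plot.barh()` on the result draws it as a horizontal bar chart. Importance scores show what the model relied on, not what causes failures, so treat them as a diagnostic rather than proof.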
What You Built and Next Steps
- Loaded transformer, maintenance, and outage data from the SP&L repository
- Created a binary failure target from equipment-failure outage records
- Engineered features from asset age, condition, and maintenance history
- Trained an XGBoost classifier with class-imbalance handling
- Evaluated performance with classification report and ROC curve
- Generated a risk-ranked asset list for maintenance prioritization
Ideas to Try Next
- Add weather exposure: Calculate cumulative storm exposure per transformer from the weather data
- Survival analysis: Use the lifelines library to model time-to-failure instead of binary failure
- Include loading history: Use peak loading percentages from feeder load data to measure stress over time
- Extend to poles and conductors: Apply the same approach to assets/poles.csv and assets/conductors.csv
- Cost-benefit analysis: Combine failure probability with replacement cost and outage impact to optimize capital spending
Key Terms Glossary
- XGBoost — an optimized gradient boosting library for high-performance ML
- Predictive maintenance — using data to predict failures before they occur, replacing time-based schedules
- Health index — a composite score (typically 1–5) representing overall asset condition
- AUC-ROC — measures how well the model distinguishes between classes; 1.0 = perfect, 0.5 = random
- Class imbalance — when one category (e.g., "no failure") is much more common than the other
- scale_pos_weight — XGBoost parameter that compensates for class imbalance
- Risk score — the model's predicted probability of failure, used to rank assets
Ready to Level Up?
In the advanced guide, you'll use survival analysis to predict when transformers will fail and build risk-prioritized replacement schedules.