You already understand distribution system problems worth solving. AI-assisted development tools let you go from that understanding to a working model. These guides cover every use case on the SP&L dataset—from beginner fundamentals to advanced techniques like deep learning, reinforcement learning, and production deployment.
You already know how distribution systems work, and that domain expertise is the hard part. Every guide below follows the same pattern: load the data, explore it, build features, train a model, and test your results. No prior ML experience required. If you get stuck on any step, an AI coding assistant can explain what the code does and help you adapt it to your own problems.
Each guide assumes you have the following tools ready. If you are brand new to Python, install Anaconda—it bundles Python, Jupyter, and most of the libraries listed below. Anaconda is available for Windows, macOS, and Linux.
pip install commands in that prompt
jupyter lab
Tip: You can also use PowerShell or Command Prompt if you add Python to your PATH during installation.
pip install commands (use pip3 if not using Anaconda)
jupyter lab
Tip: On macOS you may need pip3 instead of pip if you installed Python via Homebrew or python.org.
Run this single command to install all core libraries at once (works on Windows, macOS, and Linux):
Windows note on file paths: All code in these guides uses forward slashes in file paths (e.g., "sisyphean-power-and-light/outages/outage_events.csv"). Python on Windows handles forward slashes correctly, so you do not need to change them to backslashes. Just update the DATA_DIR variable to match where you cloned the repository.
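As a small illustration of the path note above, here is a sketch using Python's standard pathlib module. The value of DATA_DIR is an assumption (it presumes the repository sits in your current working directory); the file path is the one quoted in the note.

```python
from pathlib import Path

# Assumption: the repository was cloned into the current working directory.
# Change DATA_DIR to match wherever your copy actually lives.
DATA_DIR = "sisyphean-power-and-light"

# pathlib joins path pieces the same way on Windows, macOS, and Linux.
outage_csv = Path(DATA_DIR) / "outages" / "outage_events.csv"

# as_posix() always renders the path with forward slashes.
print(outage_csv.as_posix())

# A quick existence check catches a wrong DATA_DIR before any loading code runs.
if not outage_csv.exists():
    print("File not found; check that DATA_DIR points at your clone.")
```

Building paths this way means you only ever edit DATA_DIR, never the individual file names.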
Tip: Each guide also lists any additional libraries it needs at the top. Install them as you go.
These guides are designed to work alongside AI coding assistants. You don't need to memorize pandas syntax or know how scikit-learn's API works before you start—your domain expertise is the irreplaceable ingredient. AI tools bridge the ML knowledge gap in real time: ask them to explain a code block, debug an error, or adapt a model to a different dataset. The combination of your engineering judgment and AI-assisted coding is what makes this approach work.
Every guide includes an "Open in Colab" badge at the top of the page. Google Colab is a free, cloud-based Jupyter environment—no local Python install needed. The SP&L dataset loads automatically from GitHub.
Tip: Colab sessions disconnect after ~90 minutes of inactivity. If your session expires, just re-open the notebook and run the cells again—it only takes a few seconds to restart.
Pick the distribution engineering problem that matters to you. Each guide is self-contained—start with Guide 01 or jump straight to your area of expertise.
Train a Random Forest classifier to predict whether weather and asset conditions will cause an outage. Uses the 3,200+ outage events and weather data from SP&L.
Build a day-ahead load forecast using 5 years of hourly substation data. Start with a simple baseline, then train a gradient-boosted model that accounts for weather and seasonality.
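To give a feel for what a "simple baseline" can look like, here is a minimal sketch of a seasonal-naive forecast: tomorrow's load at each hour is assumed to equal today's load at the same hour. This is one common baseline, not necessarily the one the guide uses, and the load series is synthetic (the function make_synthetic_load and its numbers are made up for illustration).

```python
import math

def make_synthetic_load(hours):
    """Hypothetical hourly load in MW: a daily sine shape plus a slow upward ramp."""
    return [50 + 10 * math.sin(2 * math.pi * (h % 24) / 24) + 0.01 * h
            for h in range(hours)]

def seasonal_naive_forecast(history, horizon=24, season=24):
    """Forecast each future hour as the value observed one season (24 h) earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

load = make_synthetic_load(24 * 8)          # eight days of hourly data
train, test = load[:-24], load[-24:]        # hold out the final day
baseline = seasonal_naive_forecast(train)   # day-ahead forecast for the held-out day
baseline_mape = mape(test, baseline)
print(f"Seasonal-naive MAPE: {baseline_mape:.2f}%")
```

Any trained model should beat this number on the same held-out day; if it does not, the extra complexity is not paying for itself.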
Run power flow simulations on the SP&L network to determine how much solar each feeder can handle. Learn to identify thermal and voltage violations using OpenDSS.
Predict which transformers are most likely to fail using asset age, condition scores, loading history, and weather exposure. Build a risk-scoring model with XGBoost.
Model the SP&L distribution network as a graph and simulate automated fault isolation and service restoration. Calculate how much customer downtime FLISR could have avoided.
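The graph idea can be sketched in a few lines of standard-library Python. The feeder below is hypothetical (node names, switch names, and customer counts are invented, not taken from the SP&L dataset), and the model is deliberately simplified: a fault is isolated by opening the faulted section, and service is restored by closing a normally open tie switch.

```python
from collections import deque

def energized_nodes(edges, closed, sources):
    """Breadth-first search from every source, walking only closed switches."""
    adjacency = {}
    for edge_id, (a, b) in edges.items():
        if edge_id in closed:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def customers_out(edges, closed, sources, customers):
    """Total customers on nodes that no source can reach."""
    live = energized_nodes(edges, closed, sources)
    return sum(count for node, count in customers.items() if node not in live)

# Hypothetical feeder: SUB_A - N1 - N2 - N3 - N4, with a tie to SUB_B at N4.
edges = {"S1": ("SUB_A", "N1"), "S2": ("N1", "N2"), "S3": ("N2", "N3"),
         "S4": ("N3", "N4"), "TIE": ("N4", "SUB_B")}
customers = {"N1": 120, "N2": 80, "N3": 200, "N4": 150}
sources = ["SUB_A", "SUB_B"]
normally_closed = {"S1", "S2", "S3", "S4"}   # TIE is normally open

isolated = normally_closed - {"S3"}          # fault on S3: open it to isolate
restored = isolated | {"TIE"}                # close the tie to back-feed N3 and N4

customers_out_isolated = customers_out(edges, isolated, sources, customers)
customers_out_restored = customers_out(edges, restored, sources, customers)
print(customers_out_isolated, customers_out_restored)
```

Multiplying the difference between the two counts by an assumed repair time gives a first estimate of the customer-minutes that automated restoration could avoid.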
Analyze voltage profiles across the SP&L network and build a rule-based Volt-VAR controller. Then introduce a simple reinforcement learning agent to learn optimal control policies.
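A rule-based controller can be as simple as a deadband around nominal voltage. The sketch below switches a single hypothetical capacitor bank; the 0.97 and 1.03 per-unit thresholds are illustrative assumptions, not values from the guide or the dataset.

```python
def capacitor_command(voltage_pu, cap_on, low=0.97, high=1.03):
    """Return the new capacitor state given the measured per-unit voltage.

    Switch the bank in when voltage sags below `low`, switch it out when
    voltage rises above `high`, and otherwise hold the current state.
    """
    if voltage_pu < low and not cap_on:
        return True
    if voltage_pu > high and cap_on:
        return False
    return cap_on

# Hypothetical sequence of per-unit voltage measurements.
measurements = [1.00, 0.96, 0.98, 1.04, 1.00]

state = False          # bank starts switched out
states = []
for v in measurements:
    state = capacitor_command(v, state)
    states.append(state)

print(states)
```

The gap between the two thresholds is what keeps the bank from chattering on and off when voltage hovers near a single setpoint.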
Stress-test the grid against high-solar and high-EV futures. Use Monte Carlo simulation to model uncertain adoption rates and identify which feeders hit capacity limits first.
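Here is a minimal Monte Carlo sketch of the idea, using only the standard library. Every number is a made-up assumption: the two feeders, the uniform 5% to 40% EV adoption range, and the 1.5 kW of coincident peak demand per EV. The point is the structure: sample an uncertain input many times and count how often a limit is exceeded.

```python
import random

# Hypothetical feeders: capacity and current peak in MW, plus customer count.
FEEDERS = {
    "F1": {"capacity_mw": 10.0, "peak_mw": 9.5, "customers": 2000},
    "F2": {"capacity_mw": 10.0, "peak_mw": 5.0, "customers": 2000},
}
EV_PEAK_KW = 1.5  # assumed coincident peak demand added per EV

def overload_probability(feeder, trials=5000, seed=42):
    """Fraction of sampled adoption scenarios in which peak load exceeds capacity."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    overloads = 0
    for _ in range(trials):
        adoption = rng.uniform(0.05, 0.40)   # assumed share of customers with an EV
        added_mw = feeder["customers"] * adoption * EV_PEAK_KW / 1000
        if feeder["peak_mw"] + added_mw > feeder["capacity_mw"]:
            overloads += 1
    return overloads / trials

probabilities = {name: overload_probability(f) for name, f in FEEDERS.items()}
ranked = sorted(probabilities, key=probabilities.get, reverse=True)
print(probabilities, ranked)
```

Ranking feeders by overload probability is one simple way to decide which ones deserve a closer engineering study first.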
Build an anomaly detector for AMI voltage data using Isolation Forest. Then construct a simple autoencoder in PyTorch to flag unusual grid behavior in real time.
These advanced guides cover techniques that used to require a dedicated data science team: deep learning, reinforcement learning, survival analysis, and production deployment patterns. With AI-assisted development, an experienced power engineer can build and understand these models. Each guide builds on its beginner counterpart.
Upgrade from binary classification to multi-class outage cause prediction using XGBoost. Add asset features, implement time-aware validation, and use SHAP to explain individual predictions.
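"Time-aware validation" means the model is always tested on events that happened after everything it was trained on. A minimal sketch of an expanding-window split follows; the function name and the synthetic timestamps are illustrative assumptions, not the guide's code.

```python
from datetime import datetime, timedelta

def expanding_window_splits(timestamps, n_splits=3):
    """Yield (train_idx, test_idx) pairs where every training row precedes every test row."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    fold = len(order) // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough rows for the requested number of splits")
    for k in range(1, n_splits + 1):
        yield order[: k * fold], order[k * fold:(k + 1) * fold]

# Hypothetical event timestamps, deliberately stored out of chronological order.
start = datetime(2020, 1, 1)
event_times = [start + timedelta(hours=37 * ((i * 7) % 40)) for i in range(40)]

splits = list(expanding_window_splits(event_times))
for train_idx, test_idx in splits:
    latest_train = max(event_times[i] for i in train_idx)
    earliest_test = min(event_times[i] for i in test_idx)
    print(len(train_idx), len(test_idx), latest_train < earliest_test)
```

A random shuffle split would let the model peek at future storms while predicting past ones; scikit-learn's TimeSeriesSplit applies the same expanding-window idea to data that is already sorted by time.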
Build an LSTM neural network in PyTorch for multi-step ahead load forecasting. Learn sequence modeling, sliding window preparation, and compare deep learning to gradient boosting baselines.
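Sliding window preparation turns one long series into many (input, target) pairs. The sketch below shows the slicing on a toy series; make_windows is a hypothetical helper name, and a real pipeline would convert the resulting lists to tensors before training.

```python
def make_windows(series, lookback, horizon):
    """Split a series into overlapping (input, target) pairs.

    Each input holds `lookback` consecutive values; its target holds the
    `horizon` values that immediately follow.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        split = start + lookback
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return inputs, targets

# Toy series standing in for hourly load values.
series = [float(v) for v in range(10)]
inputs, targets = make_windows(series, lookback=4, horizon=2)

print(len(inputs))
print(inputs[0], targets[0])
print(inputs[-1], targets[-1])
```

For multi-step load forecasting, lookback might be a week of hours and horizon the next 24; the slicing logic stays the same.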
Train a surrogate ML model to predict hosting capacity without running full power flow simulations. Achieve a 100x+ speedup with LightGBM, add quantile regression for uncertainty estimates, and map the results spatially.
Move beyond "will it fail?" to "when will it fail?" using survival analysis. Build Kaplan-Meier curves, Cox Proportional Hazards models, and risk-prioritized replacement schedules.
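The Kaplan-Meier estimator is short enough to write by hand, which helps show what a survival library computes for you. In the sketch below the transformer ages and failure flags are invented; a flag of False means the unit was still in service when observation ended (a censored record).

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with at least one failure.

    durations: time in service for each unit (here, years).
    observed:  True if the unit failed at that time, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical fleet of six transformers.
ages = [5, 8, 8, 12, 15, 20]
failed = [True, True, False, True, False, True]

curve = kaplan_meier(ages, failed)
for years, prob in curve:
    print(f"{years:>2} years: S(t) = {prob:.3f}")
```

Censored units still count toward the at-risk total until they leave observation, which is why simply averaging failure ages would understate how long the fleet lasts.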
Apply reinforcement learning to optimize post-fault switching sequences. Simulate microgrid islanding during emergencies and model cold load pickup for realistic restoration planning.
Scale from tabular Q-learning to Deep Q-Networks (DQN). Control multiple devices simultaneously using experience replay, target networks, and multi-objective rewards.
Optimize grid upgrade investments under uncertain DER adoption. Build cost-benefit models, evaluate non-wires alternatives, and create a 5-year investment roadmap with stochastic programming.
Replace basic autoencoders with Variational Autoencoders for probabilistic anomaly scoring. Implement a real-time sliding window detection pipeline with adaptive thresholds.
Move from the distribution grid to the power plant. These guides use time-series data from a motor-driven Boiler Feed Pump train at SP&L's 300 MW combined-cycle generating station—527,000 minutes of vibration, temperature, pressure, and flow data with embedded fault signatures.
Monitor bearing temperatures and shaft vibration trends on a motor-driven boiler feed pump. Calculate rolling statistics, set threshold alerts, and detect anomalies with Isolation Forest.
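Rolling statistics with a threshold alert can be sketched with the standard library alone. The readings below are invented bearing temperatures in °C, and the window length and 3-sigma threshold are illustrative assumptions rather than recommended settings.

```python
from collections import deque
from statistics import mean, stdev

def rolling_alerts(readings, window=5, n_sigma=3.0):
    """Return indices of readings that sit more than n_sigma rolling standard
    deviations away from the mean of the preceding `window` readings."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window:            # wait until the window is full
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > n_sigma * sigma:
                alerts.append(i)
        recent.append(value)                 # the deque drops the oldest value itself
    return alerts

# Hypothetical bearing temperatures (°C) with one spike at index 7.
bearing_temp_c = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 70.2, 78.5, 70.3, 70.1]

alerts = rolling_alerts(bearing_temp_c)
print(alerts)
```

Note that the spike inflates the rolling standard deviation for the next few readings, so a short window is briefly less sensitive right after an alert.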
Predict BFP feedwater flow, motor power, and discharge pressure from unit megawatt output. Use residual analysis to detect efficiency drift that signals developing equipment faults.
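Residual analysis compares what the pump should be doing at a given unit output with what it is actually doing. The sketch below assumes, purely for illustration, that feedwater flow is linear in megawatt output; the data, units, and the drift tolerance of 10 are all invented.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical healthy baseline: unit output (MW) and feedwater flow (arbitrary units).
healthy_mw = [150, 180, 210, 240, 270, 300]
healthy_flow = [2.0 * mw + 20 for mw in healthy_mw]
slope, intercept = fit_line(healthy_mw, healthy_flow)

# Hypothetical recent operation: flow runs below what the baseline model expects.
recent_mw = [200, 250, 300]
recent_flow = [405.0, 500.0, 596.0]

residuals = [flow - (slope * mw + intercept) for mw, flow in zip(recent_mw, recent_flow)]
mean_residual = sum(residuals) / len(residuals)

DRIFT_TOLERANCE = 10.0   # assumed alert threshold, same units as flow
drift_flag = abs(mean_residual) > DRIFT_TOLERANCE
print(f"mean residual = {mean_residual:.1f}, drift detected = {drift_flag}")
```

A persistent negative residual means the pump delivers less flow than the healthy model predicts at the same load, which is the kind of slow drift a fixed alarm limit tends to miss.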
Fuse vibration, temperature, and seal data to classify fault types: bearing wear, seal degradation, and coupling misalignment. Use SHAP to explain which sensors drive each diagnosis.
Build a physics-informed model from OEM pump curves and healthy baseline data. Track actual-vs-expected performance to detect efficiency decay and create a composite pump health index.
The SP&L dataset, these guides, and an AI coding assistant—that's the fastest path from understanding a distribution system problem to having a working model. Request early access to clone the repository and start building.