For Power Systems Engineers

ML/AI Playground Guides

You already understand which distribution system problems are worth solving. AI-assisted development tools let you turn that understanding into a working model. The 16 guides below cover every use case on the SP&L dataset, from beginner fundamentals to advanced techniques such as deep learning, reinforcement learning, and production deployment.


Before You Start

You already know how distribution systems work—that domain expertise is the hard part. Every guide below follows the same pattern: load the data, explore it, build features, train a model, and test your results. No prior ML experience required. If you get stuck on any step, an AI coding assistant can explain what the code does and help you adapt it to your own problems.
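
As a rough illustration of that load, explore, build features, train, test pattern, here is a minimal sketch that uses only the Python standard library. Everything in it is an assumption made for illustration: the wind-speed readings are synthetic, the field names `wind_mph` and `outage` are hypothetical rather than SP&L column names, and the "model" is a single learned threshold, not the Random Forest or gradient-boosted models the guides build.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# 1. Load: synthetic stand-in records (the guides read SP&L files instead).
records = []
for _ in range(1000):
    wind = rng.uniform(0, 60)                   # hypothetical wind speed, mph
    outage = int(wind + rng.gauss(0, 8) > 35)   # noisy wind/outage relationship
    records.append({"wind_mph": wind, "outage": outage})

# 2. Explore: how often do outages occur at all?
outage_rate = sum(r["outage"] for r in records) / len(records)
print(f"Outage rate: {outage_rate:.1%}")

# 3. Build features: here the single feature is the raw wind speed.
X = [r["wind_mph"] for r in records]
y = [r["outage"] for r in records]

# Hold out the last 20% of records for testing.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

def accuracy(threshold, xs, ys):
    """Share of records where 'wind above threshold' matches the label."""
    hits = sum(int(x > threshold) == label for x, label in zip(xs, ys))
    return hits / len(xs)

# 4. Train: pick the wind threshold that best separates the training labels.
best_threshold = max(range(0, 61), key=lambda t: accuracy(t, X_train, y_train))

# 5. Test: score the learned threshold on records it has not seen.
test_accuracy = accuracy(best_threshold, X_test, y_test)
print(f"Learned threshold: {best_threshold} mph, test accuracy: {test_accuracy:.1%}")
```

The five numbered comments are the part that carries over to every guide; the data source and the model change, but the order of the steps does not.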

What You Need Installed

Each guide assumes you have the following tools ready. If you are brand new to Python, install Anaconda—it bundles Python, Jupyter, and most of the libraries listed below. Anaconda is available for Windows, macOS, and Linux.

Setup by Platform

Windows
  1. Download & install Anaconda for Windows
  2. Open Anaconda Prompt from the Start Menu
  3. Run the pip install command from Required Libraries in that prompt
  4. Launch Jupyter with jupyter lab

Tip: You can also use PowerShell or Command Prompt if you add Python to your PATH during installation.

macOS / Linux
  1. Download & install Anaconda for macOS (or Linux)
  2. Open Terminal (Applications → Utilities on macOS)
  3. Run the pip install command from Required Libraries (use pip3 if not using Anaconda)
  4. Launch Jupyter with jupyter lab

Tip: On macOS you may need pip3 instead of pip if you installed Python via Homebrew or python.org.

Required Libraries

Run this single command to install all core libraries at once (works on Windows, macOS, and Linux):

  pip install jupyterlab pandas numpy matplotlib seaborn scikit-learn pyarrow

Together with Python itself and the SP&L dataset, that gives you everything the guides assume:
  • Python 3.9+ — the programming language used in every guide
  • Jupyter Notebook or JupyterLab — an interactive coding environment
  • pandas — for loading and manipulating data tables
  • numpy — for numerical operations
  • matplotlib & seaborn — for creating charts
  • scikit-learn — the core ML library used in most guides
  • pyarrow — for reading Parquet data files
  • SP&L Dataset — clone the Sisyphean Power & Light repository
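
One way to confirm the installation worked is a short check script such as the sketch below. It relies only on the standard library. The import names are the usual ones for these packages (note that scikit-learn is imported as `sklearn`); the function name `check_environment` is made up for this example.

```python
import importlib.util
import sys

# Import names match the pip names except scikit-learn, which imports as sklearn.
REQUIRED = ["jupyterlab", "pandas", "numpy", "matplotlib", "seaborn", "sklearn", "pyarrow"]

def check_environment():
    """Return {import_name: True/False} for each core library."""
    return {name: importlib.util.find_spec(name) is not None for name in REQUIRED}

# The guides assume Python 3.9 or newer.
python_ok = sys.version_info >= (3, 9)
version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}: {'OK' if python_ok else 'needs 3.9+'}")

report = check_environment()
for name, found in report.items():
    print(f"{name:12s} {'installed' if found else 'MISSING'}")
```

Run it in the same prompt or terminal you used for pip, so that it checks the same Python environment.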

Windows note on file paths: All code in these guides uses forward slashes in file paths (e.g., "sisyphean-power-and-light/outages/outage_events.csv"). Python on Windows handles forward slashes correctly, so you do not need to change them to backslashes. Just update the DATA_DIR variable to match where you cloned the repository.
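
A minimal sketch of that setup, assuming the repository was cloned into the current working directory. The guides may define `DATA_DIR` as a plain string; using `pathlib.Path` here is a choice made for this example, and it accepts forward slashes on every platform.

```python
from pathlib import Path

# Assumption: the repository sits in the current working directory.
# Change this one line to point at your own clone.
DATA_DIR = Path("sisyphean-power-and-light")

# Joining with "/" produces a valid path on Windows, macOS, and Linux.
outage_file = DATA_DIR / "outages" / "outage_events.csv"

print(outage_file.as_posix())  # forward-slash form, regardless of platform
print("Found" if outage_file.exists() else "Not found yet: check DATA_DIR")
```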

Tip: Each guide also lists any additional libraries it needs at the top. Install them as you go.

AI-Assisted Development (Recommended)

These guides are designed to work alongside AI coding assistants. You don't need to memorize pandas syntax or know how scikit-learn's API works before you start—your domain expertise is the irreplaceable ingredient. AI tools bridge the ML knowledge gap in real time: ask them to explain a code block, debug an error, or adapt a model to a different dataset. The combination of your engineering judgment and AI-assisted coding is what makes this approach work.

Choose a Grid Problem to Solve

Pick the distribution engineering problem that matters to you. Each guide is self-contained—start with Guide 01 or jump straight to your area of expertise.

Guide 01

Outage Prediction

Difficulty: Beginner ~45 min

Train a Random Forest classifier to predict whether weather and asset conditions will cause an outage. Uses the 3,200+ outage events and weather data from SP&L.

Tags: Random Forest, scikit-learn, pandas

Guide 02

Load Forecasting

Difficulty: Beginner ~50 min

Build a day-ahead load forecast using 5 years of hourly substation data. Start with a simple baseline, then train a gradient-boosted model that accounts for weather and seasonality.

Tags: Gradient Boosting, scikit-learn, Time Series

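
The "simple baseline" step in the load forecasting guide can be pictured with a seasonal-naive forecast: tomorrow's load at each hour is assumed to equal today's load at the same hour. Whether the guide uses this exact baseline is an assumption; the sketch below also uses a synthetic load curve rather than SP&L substation data.

```python
import math

HOURS_PER_DAY = 24

# Synthetic hourly load in MW: a daily cycle plus slow growth (illustrative only).
load = [50 + 10 * math.sin(2 * math.pi * t / HOURS_PER_DAY) + 0.01 * t
        for t in range(HOURS_PER_DAY * 7)]

def seasonal_naive(history, horizon=HOURS_PER_DAY):
    """Day-ahead forecast: repeat the most recent 24 observed hours."""
    return history[-HOURS_PER_DAY:][:horizon]

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Forecast the final day from the six days before it.
history, actual = load[:-HOURS_PER_DAY], load[-HOURS_PER_DAY:]
forecast = seasonal_naive(history)
baseline_mae = mean_absolute_error(actual, forecast)
print(f"Seasonal-naive baseline MAE: {baseline_mae:.2f} MW")
```

On this synthetic series the error is about 0.24 MW, which is simply one day of the built-in growth. Any trained model has to beat a number like this to justify its extra complexity.
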
Guide 03

Hosting Capacity Analysis

Difficulty: Beginner–Intermediate ~60 min

Run power flow simulations on the SP&L network to determine how much solar each feeder can handle. Learn to identify thermal and voltage violations using OpenDSS.

Tags: OpenDSS-py, Power Flow, pandas

Guide 04

Predictive Asset Maintenance

Difficulty: Beginner ~45 min

Predict which transformers are most likely to fail using asset age, condition scores, loading history, and weather exposure. Build a risk-scoring model with XGBoost.

Tags: XGBoost, Feature Engineering, pandas

Guide 05

FLISR & Restoration Optimization

Difficulty: Beginner–Intermediate ~55 min

Model the SP&L distribution network as a graph and simulate automated fault isolation and service restoration. Calculate how much customer downtime FLISR could have avoided.

Tags: NetworkX, Graph Algorithms, Simulation

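
The graph idea behind FLISR can be sketched in a few lines. The guide itself uses NetworkX; this version swaps in a plain dictionary and a breadth-first search so it runs without extra libraries. The two-feeder network, node names, and customer counts are hypothetical and are not the SP&L topology.

```python
from collections import deque

def energized_nodes(closed_lines, sources):
    """Breadth-first search outward from the substations over closed lines."""
    neighbors = {}
    for a, b in closed_lines:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def customers_out(closed_lines, sources, customers):
    """Total customers on nodes that no substation can reach."""
    live = energized_nodes(closed_lines, sources)
    return sum(count for node, count in customers.items() if node not in live)

# Hypothetical toy network: two feeders joined by a normally open tie switch.
sources = ["SUB_A", "SUB_B"]
customers = {"A1": 120, "A2": 80, "A3": 200, "B1": 150}
normally_closed = {("SUB_A", "A1"), ("A1", "A2"), ("A2", "A3"), ("SUB_B", "B1")}
tie_switch = ("A3", "B1")
faulted_line = ("A1", "A2")

# Isolate: open the faulted line. Restore: close the tie switch.
isolated = normally_closed - {faulted_line}
out_without_flisr = customers_out(isolated, sources, customers)
out_with_flisr = customers_out(isolated | {tie_switch}, sources, customers)
print(f"Customers out without restoration: {out_without_flisr}")
print(f"Customers out after closing tie:   {out_with_flisr}")
```

In this toy case the fault strands 280 customers on A2 and A3, and closing the tie switch picks all of them up from the second substation. A realistic study would also check that the backup feeder has the capacity to carry them.
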
Guide 06

Volt-VAR Optimization

Difficulty: Intermediate ~60 min

Analyze voltage profiles across the SP&L network and build a rule-based Volt-VAR controller. Then introduce a simple reinforcement learning agent to learn optimal control policies.

Tags: OpenDSS-py, Reinforcement Learning, Control

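
To give a feel for the reinforcement learning step, here is a minimal tabular Q-learning sketch on a toy voltage regulator. It is not the guide's environment: there is no OpenDSS power flow, and the five tap positions, their per-unit voltages, and the hyperparameters are all assumptions chosen to keep the example small.

```python
import random

rng = random.Random(0)

# Hypothetical regulator: 5 tap positions and the voltage (p.u.) each one produces.
TAP_VOLTAGE = [0.96, 0.98, 1.00, 1.02, 1.04]
ACTIONS = {"lower": -1, "hold": 0, "raise": +1}
ACTION_NAMES = list(ACTIONS)

def step(tap, action):
    """Apply an action; reward is the negative deviation from 1.00 p.u."""
    new_tap = min(max(tap + ACTIONS[action], 0), len(TAP_VOLTAGE) - 1)
    reward = -abs(TAP_VOLTAGE[new_tap] - 1.00)
    return new_tap, reward

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate
q_table = [{a: 0.0 for a in ACTION_NAMES} for _ in TAP_VOLTAGE]

def greedy(tap):
    """Action with the highest learned value at this tap position."""
    return max(ACTION_NAMES, key=lambda a: q_table[tap][a])

for episode in range(2000):
    tap = rng.randrange(len(TAP_VOLTAGE))
    for _ in range(20):
        # Epsilon-greedy: mostly exploit, occasionally try a random action.
        action = rng.choice(ACTION_NAMES) if rng.random() < EPSILON else greedy(tap)
        new_tap, reward = step(tap, action)
        # Q-learning update: move toward reward + discounted best next value.
        target = reward + GAMMA * max(q_table[new_tap].values())
        q_table[tap][action] += ALPHA * (target - q_table[tap][action])
        tap = new_tap

policy = [greedy(tap) for tap in range(len(TAP_VOLTAGE))]
print(policy)
```

The learned policy should read raise, raise, hold, lower, lower: from any tap position the agent steps toward the tap that gives 1.00 p.u. and then stays there. A rule-based controller would encode the same behavior by hand, which is what makes the two approaches easy to compare.
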
Guide 07

DER Scenario Planning

Difficulty: Beginner–Intermediate ~50 min

Stress-test the grid against high-solar and high-EV futures. Use Monte Carlo simulation to model uncertain adoption rates and identify which feeders hit capacity limits first.

Tags: Monte Carlo, Scenario Analysis, numpy

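
A Monte Carlo study of this kind boils down to sampling an uncertain adoption rate many times and counting how often a feeder ends up over its limit. The sketch below uses the standard library `random` module instead of numpy to stay dependency-free. The feeders, limits, adoption range, and PV system size are hypothetical numbers, not SP&L values.

```python
import random

rng = random.Random(7)
N_TRIALS = 10_000
KW_PER_SYSTEM = 6.0  # assumed average rooftop PV size

# Hypothetical feeders: customer count and hosting limit in kW.
feeders = {
    "F1": {"customers": 1000, "limit_kw": 2000},
    "F2": {"customers": 1500, "limit_kw": 6000},
}

def exceedance_probability(feeder):
    """Share of trials in which sampled PV adoption exceeds the feeder limit."""
    exceed = 0
    for _ in range(N_TRIALS):
        # Uncertain adoption rate: between 5% and 60%, most likely 25%.
        adoption = rng.triangular(0.05, 0.60, 0.25)
        pv_kw = feeder["customers"] * adoption * KW_PER_SYSTEM
        if pv_kw > feeder["limit_kw"]:
            exceed += 1
    return exceed / N_TRIALS

risk = {name: exceedance_probability(f) for name, f in feeders.items()}
for name, p in sorted(risk.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: exceeds limit in {p:.1%} of scenarios")
```

Sorting by exceedance probability gives a first-cut ranking of which feeders hit their limits first. Here F1 is over its limit in roughly a third of scenarios, while F2 never is because even 60% adoption stays under its limit.
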
Guide 08

Anomaly Detection & Grid State Estimation

Difficulty: Beginner–Intermediate ~50 min

Build an anomaly detector for AMI voltage data using Isolation Forest. Then construct a simple autoencoder in PyTorch to flag unusual grid behavior in real time.

Tags: Isolation Forest, Autoencoder, PyTorch


Advanced Guides

These guides cover techniques that used to require a dedicated data science team: deep learning, reinforcement learning, survival analysis, and production deployment patterns. With AI-assisted development, an experienced power engineer can build and understand these models. Each guide builds on its beginner counterpart.

Guide 09 — Advanced

Advanced Outage Prediction

Difficulty: Intermediate–Advanced ~75 min

Upgrade from binary classification to multi-class outage cause prediction using XGBoost. Add asset features, implement time-aware validation, and use SHAP to explain individual predictions.

Tags: XGBoost, SHAP, Multi-Class

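
Time-aware validation means every test fold comes strictly after the data used to train on it, so the model is never scored on events that precede its training data. A minimal sketch of an expanding-window split is shown below; the function name is made up for this example, and scikit-learn's `TimeSeriesSplit` provides a similar scheme if you prefer a library implementation.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with each test fold after its training data."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, (k + 1) * fold))
        yield train, test

# Example: 12 chronologically sorted records, 3 validation folds.
splits = list(expanding_window_splits(12, 3))
for train, test in splits:
    print(f"train 0..{train[-1]}  ->  test {test[0]}..{test[-1]}")
```

The records must already be sorted by time for the index ranges to mean anything. A random shuffle before splitting would defeat the purpose.
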
Guide 10 — Advanced

Advanced Load Forecasting

Difficulty: Intermediate–Advanced ~80 min

Build an LSTM neural network in PyTorch for multi-step-ahead load forecasting. Learn sequence modeling and sliding-window data preparation, then compare the deep learning model against gradient boosting baselines.

Tags: LSTM, PyTorch, Multi-Step

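
Sliding-window preparation turns one long series into many short training examples: a block of past values as the input and the next few values as the multi-step target. The sketch below shows the idea with plain Python lists; the function name and window sizes are illustrative, and a PyTorch workflow would convert the result to tensors afterward.

```python
def make_windows(series, n_inputs, n_outputs):
    """Split a series into (input window, multi-step target) pairs."""
    inputs, targets = [], []
    last_start = len(series) - n_inputs - n_outputs
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_inputs])
        targets.append(series[start + n_inputs:start + n_inputs + n_outputs])
    return inputs, targets

# Placeholder values 0..9 so the windows are easy to read.
hourly_load = list(range(10))
X, y = make_windows(hourly_load, n_inputs=3, n_outputs=2)
print(X[0], "->", y[0])
print(X[-1], "->", y[-1])
```

With 10 points, 3 inputs, and 2 outputs this yields 6 windows, from `[0, 1, 2] -> [3, 4]` through `[5, 6, 7] -> [8, 9]`. For day-ahead forecasting on hourly data the same function would be called with much larger window sizes.
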
Guide 11 — Advanced

ML-Accelerated Hosting Capacity

Difficulty: Intermediate–Advanced ~70 min

Train a surrogate ML model to predict hosting capacity without running full power flow simulations. Use LightGBM to get a 100x+ speedup over simulation, quantile regression to estimate uncertainty, and spatial mapping to present the results.

Tags: LightGBM, Surrogate Model, Quantile Regression

Guide 12 — Advanced

Survival Analysis for Asset Management

Difficulty: Intermediate–Advanced ~70 min

Move beyond "will it fail?" to "when will it fail?" using survival analysis. Build Kaplan-Meier curves, Cox Proportional Hazards models, and risk-prioritized replacement schedules.

Tags: Survival Analysis, lifelines, Cox PH

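
To see what a Kaplan-Meier curve actually computes, it helps to work one by hand. At each observed failure time the survival estimate is multiplied by the share of at-risk units that did not fail. The guide uses the lifelines library (its `KaplanMeierFitter`) for this; the sketch below is a hand-rolled version for illustration only, and the five transformer records are invented.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each observed failure time."""
    failure_times = sorted({t for t, failed in zip(durations, observed) if failed})
    survival = 1.0
    curve = []
    for t in failure_times:
        # Units still in service just before time t (failed or censored at t or later).
        at_risk = sum(1 for d in durations if d >= t)
        failures = sum(1 for d, failed in zip(durations, observed) if d == t and failed)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve

# Invented records: age in years at failure (1) or at last inspection (0, censored).
ages = [5, 8, 8, 12, 15]
failed = [1, 1, 0, 1, 0]

curve = kaplan_meier(ages, failed)
for t, s in curve:
    print(f"age {t:>2} years: S(t) = {s:.2f}")
```

The curve drops to 0.80 at year 5, 0.60 at year 8, and 0.30 at year 12. The censored units, the ones still running at their last inspection, never cause a drop, but they do count in the at-risk total until they leave the data.
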
Guide 13 — Advanced

RL-Based Service Restoration

Difficulty: Advanced ~80 min

Apply reinforcement learning to optimize post-fault switching sequences. Simulate microgrid islanding during emergencies and model cold load pickup for realistic restoration planning.

Tags: Q-Learning, NetworkX, Microgrid

Guide 14 — Advanced

Deep RL for Volt-VAR Control

Difficulty: Advanced ~85 min

Scale from tabular Q-learning to Deep Q-Networks (DQN). Control multiple devices simultaneously using experience replay, target networks, and multi-objective rewards.

Tags: DQN, PyTorch, Multi-Device

Guide 15 — Advanced

Stochastic Grid Upgrade Planning

Difficulty: Intermediate–Advanced ~75 min

Optimize grid upgrade investments under uncertain DER adoption. Build cost-benefit models, evaluate non-wires alternatives, and create a 5-year investment roadmap with stochastic programming.

Tags: Optimization, scipy, Cost-Benefit

Guide 16 — Advanced

VAE & Streaming Anomaly Detection

Difficulty: Advanced ~80 min

Replace basic autoencoders with Variational Autoencoders for probabilistic anomaly scoring. Implement a real-time sliding window detection pipeline with adaptive thresholds.

Tags: VAE, PyTorch, Streaming

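
The streaming half of that pipeline, a sliding window with an adaptive threshold, can be sketched without any model at all. In the version below the anomaly score is just the raw reading compared against a rolling mean and standard deviation; the guide scores readings with a VAE instead, which is not shown here. The window size, the multiplier `k`, and the voltage readings are assumptions for illustration.

```python
from collections import deque
from statistics import mean, stdev

def stream_anomalies(readings, window=5, k=3.0):
    """Flag readings more than k rolling standard deviations from the rolling mean."""
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            center, spread = mean(recent), stdev(recent)
            # The threshold adapts because center and spread track recent data.
            if abs(value - center) > k * spread:
                flagged.append(i)
        recent.append(value)
    return flagged

# Invented per-unit voltage readings with one sag at index 6.
voltages_pu = [1.00, 1.01, 0.99, 1.00, 1.02, 1.01, 0.85, 1.00, 0.99]
print(stream_anomalies(voltages_pu))
```

Only the sag at index 6 is flagged. One design choice to be aware of: the flagged reading still enters the window, so the threshold widens for the next few samples. Excluding flagged readings from the window is a common alternative when anomalies arrive in bursts.
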

Ready to Build?

The SP&L dataset, these guides, and an AI coding assistant—that's the fastest path from understanding a distribution system problem to having a working model. Request early access to clone the repository and start building.
