ML/AI Playground Beginner Guides - Sisyphean Power & Light

Before You Start

You already know how distribution systems work—that domain expertise is the hard part. Every guide below follows the same pattern: load the data, explore it, build features, train a model, and test your results. No prior ML experience required. If you get stuck on any step, an AI coding assistant can explain what the code does and help you adapt it to your own problems.
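The five-step pattern can be sketched end to end on synthetic data. This is a minimal illustration, not code taken from any guide: the column names (`wind_mph`, `asset_age_yr`), the rule that generates the `outage` label, and the model settings are all assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load the data (synthetic stand-in; the guides read real SP&L files instead)
rng = np.random.default_rng(seed=0)
n = 500
df = pd.DataFrame({
    "wind_mph": rng.uniform(0, 60, n),      # hypothetical column
    "asset_age_yr": rng.uniform(0, 50, n),  # hypothetical column
})
# Hypothetical labeling rule: outages are more likely with high wind and old assets
risk = 0.02 * df["wind_mph"] + 0.01 * df["asset_age_yr"]
df["outage"] = (risk + rng.normal(0, 0.2, n) > 1.0).astype(int)

# 2. Explore it
print(df.describe())

# 3. Build features
df["wind_x_age"] = df["wind_mph"] * df["asset_age_yr"]
X = df[["wind_mph", "asset_age_yr", "wind_x_age"]]
y = df["outage"]

# 4. Train a model on 75% of the rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 5. Test the results on the held-out 25%
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Every guide swaps in its own data, features, and model, but the shape of the notebook stays close to this.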

What You Need Installed

Each guide assumes you have the following tools ready. If you are brand new to Python, install Anaconda—it bundles Python, Jupyter, and most of the libraries listed below. Anaconda is available for Windows, macOS, and Linux.

Setup by Platform

Windows

Download & install Anaconda for Windows
Open Anaconda Prompt from the Start Menu
Run pip install commands in that prompt
Launch Jupyter with jupyter lab

Tip: You can also use PowerShell or Command Prompt if you add Python to your PATH during installation.

macOS / Linux

Download & install Anaconda for macOS (or Linux)
Open Terminal (Applications → Utilities on macOS)
Run pip install commands (use pip3 if not using Anaconda)
Launch Jupyter with jupyter lab

Tip: On macOS you may need pip3 instead of pip if you installed Python via Homebrew or python.org.

Required Libraries

Run this single command to install all core libraries at once (works on Windows, macOS, and Linux):

pip install jupyterlab pandas numpy matplotlib seaborn scikit-learn pyarrow

Python 3.9+ — the programming language used in every guide
Jupyter Notebook or JupyterLab — an interactive coding environment
pandas — for loading and manipulating data tables
numpy — for numerical operations
matplotlib & seaborn — for creating charts
scikit-learn — the core ML library used in most guides
pyarrow — for reading Parquet data files
SP&L Dataset — clone the Sisyphean Power & Light repository
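To confirm the install worked, a short check like the one below can be run in a notebook cell. It is a sketch that uses only the standard library; the helper name `find_missing` is made up for this example. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys

# Import names for the core libraries listed above
# (scikit-learn is imported as "sklearn").
CORE_MODULES = ["jupyterlab", "pandas", "numpy", "matplotlib",
                "seaborn", "sklearn", "pyarrow"]

def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

python_ok = sys.version_info >= (3, 9)
missing = find_missing(CORE_MODULES)

print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'needs 3.9+'}")
print("Missing libraries:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, re-run the pip command above in the same environment that launches Jupyter.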

Windows note on file paths: All code in these guides uses forward slashes in file paths (e.g., "sisyphean-power-and-light/outages/outage_events.csv"). Python on Windows handles forward slashes correctly, so you do not need to change them to backslashes. Just update the DATA_DIR variable to match where you cloned the repository.
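One way to keep paths portable is to build them with `pathlib`, as sketched below. The value assigned to `DATA_DIR` assumes the repository was cloned into the current working directory; adjust it to your own location.

```python
from pathlib import Path

# Assumption: the repository was cloned into the current working directory.
# Change this one line to match your own clone location.
DATA_DIR = Path("sisyphean-power-and-light")

# Forward slashes work on every platform; pathlib joins the parts for you.
outage_csv = DATA_DIR / "outages/outage_events.csv"

print(outage_csv.as_posix())
print("File found" if outage_csv.exists() else "File not found; check DATA_DIR")
```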

Tip: Each guide also lists any additional libraries it needs at the top. Install them as you go.

AI-Assisted Development (Recommended)

These guides are designed to work alongside AI coding assistants. You don't need to memorize pandas syntax or know how scikit-learn's API works before you start—your domain expertise is the irreplaceable ingredient. AI tools bridge the ML knowledge gap in real time: ask them to explain a code block, debug an error, or adapt a model to a different dataset. The combination of your engineering judgment and AI-assisted coding is what makes this approach work.

No Setup Required? Use Google Colab

Every guide includes an "Open in Colab" badge at the top of the page. Google Colab is a free, cloud-based Jupyter environment—no local Python install needed. The SP&L dataset loads automatically from GitHub.

How to Use It

Click the badge — Look for the button near the top of any guide page. It opens the guide as a runnable Jupyter notebook in your browser.
Sign in with Google — Any free Google account works. Colab will ask you to sign in if you are not already.
Run the code — Click Runtime → Run all to execute every cell at once, or click each cell individually and press Shift + Enter to step through the guide at your own pace.

Tip: Colab sessions disconnect after about 90 minutes of inactivity. If your session expires, re-open the notebook and run the cells again; reconnecting takes only a few seconds.

Choose a Grid Problem to Solve

Pick the distribution engineering problem that matters to you. Each guide is self-contained—start with Guide 01 or jump straight to your area of expertise.

Guide 01

Outage Prediction

Difficulty: Beginner ~45 min

Train a Random Forest classifier to predict whether weather and asset conditions will cause an outage. Uses the 3,200+ outage events and weather data from SP&L.

Random Forest, scikit-learn, pandas

Start Guide →

Guide 02

Load Forecasting

Difficulty: Beginner ~50 min

Build a day-ahead load forecast using 5 years of hourly substation data. Start with a simple baseline, then train a gradient-boosted model that accounts for weather and seasonality.

Gradient Boosting, scikit-learn, Time Series

Start Guide →
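As a taste of the "simple baseline" step in Guide 02, here is a seasonal-naive forecast: tomorrow's load at a given hour is assumed to equal today's load at that hour. This is one common baseline and not necessarily the one the guide uses; the load series is synthetic and every number in it is a placeholder.

```python
import numpy as np

# Synthetic hourly load for 14 days (stand-in for real substation data)
rng = np.random.default_rng(seed=1)
hours = np.arange(14 * 24)
daily_shape = 10 + 3 * np.sin(2 * np.pi * (hours % 24) / 24)  # MW, hypothetical
load_mw = daily_shape + rng.normal(0, 0.3, hours.size)

# Seasonal-naive baseline: the forecast for hour t is the actual at hour t - 24
forecast = load_mw[:-24]   # values from 24 hours earlier
actual = load_mw[24:]      # values being predicted

# Mean absolute error is the yardstick any trained model has to beat
mae = np.mean(np.abs(actual - forecast))
print(f"Baseline MAE: {mae:.2f} MW")
```

A trained model earns its keep only if it scores a lower error than this on the same hours.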

Guide 03

Hosting Capacity Analysis

Difficulty: Beginner–Intermediate ~60 min

Run power flow simulations on the SP&L network to determine how much solar each feeder can handle. Learn to identify thermal and voltage violations using OpenDSS.

OpenDSS-py, Power Flow, pandas

Start Guide →

Guide 04

Predictive Asset Maintenance

Difficulty: Beginner ~45 min

Predict which transformers are most likely to fail using asset age, condition scores, loading history, and weather exposure. Build a risk-scoring model with XGBoost.

XGBoost, Feature Engineering, pandas

Start Guide →

Guide 05

FLISR & Restoration Optimization

Difficulty: Beginner–Intermediate ~55 min

Model the SP&L distribution network as a graph and simulate automated fault isolation and service restoration. Calculate how much customer downtime FLISR could have avoided.

NetworkX, Graph Algorithms, Simulation

Start Guide →
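The graph idea behind Guide 05 can be sketched without any extra libraries. The guide lists NetworkX; this version uses plain Python sets and a breadth-first search so it runs anywhere. The five-node feeder, the tie switch, and the customer counts are hypothetical.

```python
from collections import deque

# Hypothetical feeder: SUB_A - n1 - n2 - n3, with a normally-open tie n3 - SUB_B
closed_edges = {("SUB_A", "n1"), ("n1", "n2"), ("n2", "n3")}
tie_edge = ("n3", "SUB_B")
sources = {"SUB_A", "SUB_B"}
customers = {"n1": 120, "n2": 80, "n3": 200}  # hypothetical customer counts

def energized(edges, sources):
    """Breadth-first search from every source over the closed edges."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def customers_out(live_nodes):
    """Count customers on nodes that no source can reach."""
    return sum(n for node, n in customers.items() if node not in live_nodes)

# Fault on section n1-n2: isolate it by opening that section
faulted = ("n1", "n2")
isolated = closed_edges - {faulted}

without_tie = energized(isolated, sources)            # tie stays open
with_tie = energized(isolated | {tie_edge}, sources)  # restoration closes the tie

print("Customers out, tie open:  ", customers_out(without_tie))
print("Customers out, tie closed:", customers_out(with_tie))
```

The difference between the two counts, multiplied by the repair time, is the kind of avoided downtime the guide sets out to quantify.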

Guide 06

Volt-VAR Optimization

Difficulty: Intermediate ~60 min

Analyze voltage profiles across the SP&L network and build a rule-based Volt-VAR controller. Then introduce a simple reinforcement learning agent to learn optimal control policies.

OpenDSS-py, Reinforcement Learning, Control

Start Guide →
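A rule-based controller of the kind Guide 06 starts from can be as small as a deadband rule for one capacitor bank. The sketch below is illustrative only: the function name, the 0.97 and 1.03 per-unit limits, and the voltage readings are assumptions, and the readings do not react to the capacitor state as they would in a real power flow.

```python
def capacitor_command(voltage_pu, cap_on, low=0.97, high=1.03):
    """Deadband rule: switch the capacitor in when voltage is low, out when high."""
    if voltage_pu < low:
        return True
    if voltage_pu > high:
        return False
    return cap_on  # inside the deadband: hold the current state

# Hypothetical per-unit voltage readings over six time steps
readings = [1.00, 0.96, 0.99, 1.02, 1.04, 1.00]
cap_on = False
states = []
for v in readings:
    cap_on = capacitor_command(v, cap_on)
    states.append(cap_on)

print(states)
```

Holding the state inside the deadband is what keeps the bank from chattering on and off when voltage hovers near a limit.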

Guide 07

DER Scenario Planning

Difficulty: Beginner–Intermediate ~50 min

Stress-test the grid against high-solar and high-EV futures. Use Monte Carlo simulation to model uncertain adoption rates and identify which feeders hit capacity limits first.

Monte Carlo, Scenario Analysis, numpy

Start Guide →
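The Monte Carlo idea in Guide 07 can be sketched in a few lines of numpy: draw many possible adoption futures, compute the resulting peak for each, and count how often the feeder limit is exceeded. Every number below (capacity, base peak, charger size, and both probability distributions) is a hypothetical placeholder rather than an SP&L value.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_trials = 10_000

# All values are hypothetical placeholders, not SP&L data
feeder_capacity_mw = 7.0
base_peak_mw = 6.0
n_customers = 1_000
ev_charger_kw = 7.2

# Uncertain EV adoption: Beta(2, 8) has a mean of 0.2 (20% of customers)
adoption = rng.beta(a=2, b=8, size=n_trials)
# Uncertain share of EVs charging at the same time during the peak hour
coincidence = rng.uniform(0.2, 0.6, size=n_trials)

# Peak load for each simulated future
ev_load_mw = adoption * n_customers * ev_charger_kw * coincidence / 1_000
peak_mw = base_peak_mw + ev_load_mw

# Fraction of futures in which the feeder limit is exceeded
p_overload = np.mean(peak_mw > feeder_capacity_mw)
print(f"Probability of exceeding capacity: {p_overload:.1%}")
```

Repeating the calculation per feeder and ranking by `p_overload` is one way to see which feeders hit their limits first.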

Guide 08

Anomaly Detection & Grid State Estimation

Difficulty: Beginner–Intermediate ~50 min

Build an anomaly detector for AMI voltage data using Isolation Forest. Then construct a simple autoencoder in PyTorch to flag unusual grid behavior in real time.

Isolation Forest, Autoencoder, PyTorch

Start Guide →

Advanced Guides

These guides cover techniques that used to require a dedicated data science team: deep learning, reinforcement learning, survival analysis, and production deployment patterns. With AI-assisted development, an experienced power engineer can build and understand these models. Each guide builds on its beginner counterpart.

Guide 09 — Advanced

Advanced Outage Prediction

Difficulty: Intermediate–Advanced ~75 min

Upgrade from binary classification to multi-class outage cause prediction using XGBoost. Add asset features, implement time-aware validation, and use SHAP to explain individual predictions.

XGBoost, SHAP, Multi-Class

Start Guide →

Guide 10 — Advanced

Advanced Load Forecasting

Difficulty: Intermediate–Advanced ~80 min

Build an LSTM neural network in PyTorch for multi-step ahead load forecasting. Learn sequence modeling, sliding window preparation, and compare deep learning to gradient boosting baselines.

LSTM, PyTorch, Multi-Step

Start Guide →

Guide 11 — Advanced

ML-Accelerated Hosting Capacity

Difficulty: Intermediate–Advanced ~70 min

Train a surrogate ML model to predict hosting capacity without running full power flow simulations. Achieve a 100x+ speedup with LightGBM, then add quantile regression for uncertainty estimates and spatial mapping of the results.

LightGBM, Surrogate Model, Quantile Regression

Start Guide →

Guide 12 — Advanced

Survival Analysis for Asset Management

Difficulty: Intermediate–Advanced ~70 min

Move beyond "will it fail?" to "when will it fail?" using survival analysis. Build Kaplan-Meier curves, Cox Proportional Hazards models, and risk-prioritized replacement schedules.

Survival Analysis, lifelines, Cox PH

Start Guide →
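To show what a Kaplan-Meier curve actually computes, here is a hand-rolled version in plain Python. Guide 12 lists the lifelines library for this; the sketch below does not use it, and the six asset records are invented for illustration. A record with `failed=False` is censored, meaning the asset was still in service at its last inspection.

```python
# (age in years at failure or last inspection, failed?) for hypothetical assets
records = [(5, True), (8, False), (12, True), (12, True), (15, False), (20, True)]

def kaplan_meier(records):
    """Return [(time, survival_probability)] at each observed failure time."""
    survival = 1.0
    curve = []
    for t in sorted({time for time, failed in records if failed}):
        # Assets still under observation just before time t
        at_risk = sum(1 for time, _ in records if time >= t)
        # Assets that failed exactly at time t
        failures = sum(1 for time, failed in records if time == t and failed)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve

for t, s in kaplan_meier(records):
    print(f"S({t}) = {s:.3f}")
# S(5) = 0.833
# S(12) = 0.417
# S(20) = 0.000
```

Censored assets never count as failures, but they do count toward the at-risk total until the age at which they drop out of observation.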

Guide 13 — Advanced

RL-Based Service Restoration

Difficulty: Advanced ~80 min

Apply reinforcement learning to optimize post-fault switching sequences. Simulate microgrid islanding during emergencies and model cold load pickup for realistic restoration planning.

Q-Learning, NetworkX, Microgrid

Start Guide →

Guide 14 — Advanced

Deep RL for Volt-VAR Control

Difficulty: Advanced ~85 min

Scale from tabular Q-learning to Deep Q-Networks (DQN) with neural networks. Control multiple devices simultaneously with experience replay, target networks, and multi-objective rewards.

DQN, PyTorch, Multi-Device

Start Guide →

Guide 15 — Advanced

Stochastic Grid Upgrade Planning

Difficulty: Intermediate–Advanced ~75 min

Optimize grid upgrade investments under uncertain DER adoption. Build cost-benefit models, evaluate non-wires alternatives, and create a 5-year investment roadmap with stochastic programming.

Optimization, scipy, Cost-Benefit

Start Guide →

Guide 16 — Advanced

VAE & Streaming Anomaly Detection

Difficulty: Advanced ~80 min

Replace basic autoencoders with Variational Autoencoders for probabilistic anomaly scoring. Implement a real-time sliding window detection pipeline with adaptive thresholds.

VAE, PyTorch, Streaming

Start Guide →

Generation Systems

Move from the distribution grid to the power plant. These guides use time-series data from a motor-driven Boiler Feed Pump train at SP&L's 300 MW combined-cycle generating station: 527,000 minutes (roughly one year) of vibration, temperature, pressure, and flow data with embedded fault signatures.

Guide 17

BFP Health Monitoring

Difficulty: Beginner ~45 min

Monitor bearing temperatures and shaft vibration trends on a motor-driven boiler feed pump. Calculate rolling statistics, set threshold alerts, and detect anomalies with Isolation Forest.

Isolation Forest, Rolling Statistics, pandas

Start Guide →
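The rolling-statistics and threshold-alert steps in Guide 17 can be sketched with pandas as follows. The temperature series is synthetic, with a step change injected at minute 400 to stand in for a fault; the 60-minute window and the 4-sigma threshold are arbitrary choices for the example, not values from the guide.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bearing temperature with a step change at minute 400
rng = np.random.default_rng(seed=7)
temp_c = 70 + rng.normal(0, 0.5, 600)
temp_c[400:] += 5.0  # hypothetical fault signature
series = pd.Series(temp_c, name="bearing_temp_c")

# Rolling baseline from the previous 60 minutes
# (shifted by one so a reading is never compared with itself)
window = 60
baseline_mean = series.rolling(window).mean().shift(1)
baseline_std = series.rolling(window).std().shift(1)

# Alert when a reading sits more than 4 standard deviations above the baseline
alerts = series > baseline_mean + 4 * baseline_std
first_alert = int(alerts[alerts].index[0]) if alerts.any() else None
print("First alert at minute:", first_alert)
```

Because the baseline keeps rolling, it absorbs the new level after about an hour and the alerts stop, which is one reason the guide pairs thresholds with an anomaly detector.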

Guide 18

Feedwater System Load Correlation

Difficulty: Beginner ~45 min

Predict BFP feedwater flow, motor power, and discharge pressure from unit megawatt output. Use residual analysis to detect efficiency drift that signals developing equipment faults.

Linear Regression, scikit-learn, Correlation

Start Guide →
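Residual analysis, as described for Guide 18, means fitting the expected relationship on healthy data and then watching how far new readings stray from it. The sketch below uses `numpy.polyfit` for the straight-line fit (the guide lists scikit-learn). The linear flow-versus-megawatt relationship, its coefficients, and the 3% shortfall are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical healthy baseline: feedwater flow rises linearly with unit output
mw_healthy = rng.uniform(150, 300, 500)
flow_healthy = 2.0 * mw_healthy + 50 + rng.normal(0, 5, 500)

# Fit expected flow as a straight-line function of megawatts
slope, intercept = np.polyfit(mw_healthy, flow_healthy, deg=1)

# Later data with a 3% flow shortfall injected to mimic efficiency drift
mw_recent = rng.uniform(150, 300, 200)
flow_recent = 0.97 * (2.0 * mw_recent + 50) + rng.normal(0, 5, 200)

# Residual = actual minus expected; a persistent offset signals drift
residual = flow_recent - (slope * mw_recent + intercept)
print(f"Mean residual on recent data: {residual.mean():.1f}")
```

A mean residual that stays away from zero, rather than scattering around it, is the signature of drift rather than noise.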

Guide 19 — Advanced

Rotating Equipment Fault Diagnosis

Difficulty: Advanced ~70 min

Fuse vibration, temperature, and seal data to classify fault types: bearing wear, seal degradation, and coupling misalignment. Use SHAP to explain which sensors drive each diagnosis.

Multi-class Classification, SHAP, Random Forest

Start Guide →

Guide 20 — Advanced

BFP Digital Twin & Performance Tracking

Difficulty: Advanced ~75 min

Build a physics-informed model from OEM pump curves and healthy baseline data. Track actual-vs-expected performance to detect efficiency decay and create a composite pump health index.

Physics-Informed ML, Pump Curves, Residual Analysis

Start Guide →

Ready to Build?