SP&L: Synthetic Utility Dataset for ML & Planning

A fictional utility. Real engineering problems.

SP&L is a fictional municipal utility serving approximately 238,000 customers across 23 substations in a mixed urban, suburban, and rural service territory, powered in part by a 300 MW 2×1 combined-cycle generating station. The Dynamic Network Model is a fully realized open-source representation of SP&L's distribution system and generation assets, complete with time-series load data, DER installations, historical outage records, asset metadata, weather correlations, protection device configurations, and plant-side rotating equipment instrumentation.

§ 02

What's inside the repository.

The repository is organized into logical layers, from raw topology to pre-built analysis notebooks. Each layer is documented, versioned, and usable independently.

Layer 01

Network topology & assets

OpenDSS-native circuit definitions for all 104 feeders. Substation configurations, transformer ratings, line impedances, switch and protection device locations. The raw electrical infrastructure, in a format every distribution engineer can run.

Layer 02

Time-series operational data

Five years of hourly customer load, DER generation profiles, weather correlations, and feeder-level voltage/current measurements. Realistic seasonality, weather sensitivity, growth trends, and the day-to-day noise that real grid data carries.

Layer 03

Reliability & outage records

3,200+ historical outage events with root cause classification (equipment, vegetation, weather, animal, vehicle, unknown), customer impact, restoration time, and crew dispatch records. Computed SAIDI, SAIFI, and CAIDI by feeder, by year, and by major-event-day classification.
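
For readers new to the reliability indices named above, here is a minimal sketch of how SAIFI, SAIDI, and CAIDI follow from outage records. The `Outage` record and its field names are hypothetical and do not reflect the repository's actual schema. The sketch also skips the sustained-versus-momentary and major-event-day filtering that a full calculation would apply.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    # Hypothetical record layout, for illustration only.
    feeder: str
    customers_interrupted: int
    duration_minutes: float

def reliability_indices(outages, customers_served):
    """Return (SAIFI, SAIDI, CAIDI) for a set of outage events.

    SAIFI = customer interruptions / customers served
    SAIDI = customer-minutes interrupted / customers served
    CAIDI = SAIDI / SAIFI (average minutes per interrupted customer)
    """
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    interruptions = sum(o.customers_interrupted for o in outages)
    customer_minutes = sum(
        o.customers_interrupted * o.duration_minutes for o in outages
    )
    saifi = interruptions / customers_served
    saidi = customer_minutes / customers_served
    caidi = saidi / saifi if saifi else 0.0
    return saifi, saidi, caidi

# Two made-up events on one feeder serving 2,000 customers.
events = [Outage("F001", 500, 90.0), Outage("F001", 1500, 30.0)]
saifi, saidi, caidi = reliability_indices(events, customers_served=2000)
print(saifi, saidi, caidi)
```

Grouping the events by feeder or by year before calling the function gives the per-feeder and per-year breakdowns.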

Layer 04

DER & interconnection registry

Solar PV, battery storage, and EV charging installations with location, size, vintage, and inverter characteristics. Hosting capacity baselines per circuit and the queue of pending interconnections that would push some circuits over the edge.

Layer 05

Generation plant instrumentation

Plant-side data from the 300 MW 2×1 combined-cycle station. Boiler feed pump vibration histories, condenser performance, heat-rate trends, forced-outage records, valve cycle counts. Useful for digital-twin work and rotating-equipment ML.

Layer 06

Pre-built notebooks

Twenty-three Jupyter-style guides walk through specific ML use cases against this data: outage prediction, load forecasting, hosting capacity, predictive maintenance, FLISR optimization, volt-var, DER scenarios, anomaly detection, rotating-equipment health, and OpenDSS power-flow integration.

§ 03

The ML/AI playground.

Two things have kept power engineers from building their own ML tools: data locked behind NDAs and CEII restrictions, and the gap between understanding a grid problem and implementing an ML solution. SP&L removes the first barrier. AI-assisted dev tools remove the second.

Use case 01

Outage prediction

Predict which feeders or asset classes are most likely to experience an outage in the next 24 to 72 hours. Features: weather, vegetation, asset age, recent fault rates, load patterns.

Use case 02

Load forecasting

Hourly and day-ahead load forecasts at the feeder level. Train against five years of weather-correlated load history; validate on the last twelve months.
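
One way to set up the train/validate split described above is a strict chronological holdout, scored against a simple "same hour last week" baseline that any trained model should beat. The sketch below uses only the standard library and a synthetic series. The function names and the `(timestamp, load_kw)` tuple layout are assumptions for illustration, not the repository's format.

```python
from datetime import datetime, timedelta

def split_holdout(series, holdout_days=365):
    """Chronological split: the final `holdout_days` become the validation set.

    `series` is a time-sorted list of (timestamp, load_kw) tuples.
    """
    cutoff = series[-1][0] - timedelta(days=holdout_days)
    train = [(t, y) for t, y in series if t <= cutoff]
    valid = [(t, y) for t, y in series if t > cutoff]
    return train, valid

def seasonal_naive_mape(series, valid, lag_hours=168):
    """MAPE (%) of a 'same hour last week' forecast over the validation set."""
    lookup = dict(series)
    errors = []
    for t, actual in valid:
        forecast = lookup.get(t - timedelta(hours=lag_hours))
        if forecast is not None and actual != 0:
            errors.append(abs(actual - forecast) / abs(actual))
    if not errors:
        raise ValueError("no comparable points in the validation set")
    return 100.0 * sum(errors) / len(errors)

# Synthetic two-year hourly series with a purely weekly pattern (illustrative).
start = datetime(2020, 1, 1)
series = []
for h in range(730 * 24):
    t = start + timedelta(hours=h)
    series.append((t, 1000.0 + 10.0 * t.hour + 50.0 * t.weekday()))

train, valid = split_holdout(series)
baseline_mape = seasonal_naive_mape(series, valid)
print(len(train), len(valid), baseline_mape)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for any series with trend or seasonality.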

Use case 03

Hosting capacity analysis

Compute hosting capacity for solar interconnection per feeder using OpenDSS. Identify circuits with headroom; identify circuits where the next interconnection triggers a constraint.

Use case 04

Predictive asset maintenance

Failure-probability models for transformers and switchgear using age, loading, fault history, and weather exposure. Output: prioritized inspection list.

Use case 05

FLISR & restoration optimization

Optimize switching schemes for fault isolation and service restoration. Compare automated FLISR sequences against the actual outage history.

Use case 06

Volt-VAR optimization

Conservation voltage reduction and reactive power dispatch using time-series load and DER data. Quantify the kW savings from a CVR strategy on each feeder.
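
As a rough illustration of the kW-savings estimate, CVR studies commonly summarize a feeder's response with a CVR factor, defined as the percent change in demand per percent change in voltage. The sketch below applies that definition directly. The 0.8 factor and the load values are placeholders, not numbers taken from the dataset, and a real study would estimate the factor per feeder from measurements or power-flow runs.

```python
def cvr_savings_kw(baseline_kw, voltage_reduction_pct, cvr_factor):
    """Estimated demand reduction in kW.

    Uses CVR factor = (% change in demand) / (% change in voltage), so
    savings = baseline * cvr_factor * (voltage_reduction_pct / 100).
    """
    return baseline_kw * cvr_factor * (voltage_reduction_pct / 100.0)

def cvr_energy_savings_kwh(hourly_kw, voltage_reduction_pct, cvr_factor):
    """Sum hourly kW savings into kWh over an hourly load profile."""
    return sum(
        cvr_savings_kw(kw, voltage_reduction_pct, cvr_factor)
        for kw in hourly_kw
    )

# Placeholder inputs: three hours of feeder load, 2.5% voltage reduction.
hourly_kw = [4000.0, 5000.0, 6000.0]
ASSUMED_CVR_FACTOR = 0.8  # hypothetical value, not from the dataset

peak_hour_savings = cvr_savings_kw(6000.0, 2.5, ASSUMED_CVR_FACTOR)
energy_savings = cvr_energy_savings_kwh(hourly_kw, 2.5, ASSUMED_CVR_FACTOR)
print(round(peak_hour_savings, 1), round(energy_savings, 1))
```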

Use case 07

DER scenario planning

Run scenarios across DER adoption trajectories, EV penetration rates, and storage deployments. Identify which feeders need infrastructure investment first.

Use case 08

Anomaly detection & state estimation

Detect anomalous voltage, current, or load patterns that indicate equipment degradation, theft, or measurement errors. Reconstruct grid state from sparse measurements.
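
A simple statistical baseline for the detection half of this use case is a rolling z-score: flag any reading that sits far from the mean of the preceding window. The sketch below is a starting point under assumed parameters (a 24-sample window and a threshold of 4 standard deviations). It is not a state estimator, and the per-unit voltage series is synthetic.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(values, window=24, threshold=4.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip this point
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Synthetic per-unit voltage with small periodic ripple and one injected sag.
voltages = [1.00 + 0.002 * ((i % 5) - 2) for i in range(72)]
voltages[60] = 0.90

anomalies = rolling_zscore_anomalies(voltages)
print(anomalies)
```

An outlier inflates the standard deviation of the windows that follow it, so the detector is briefly less sensitive after each flagged point. Median-based statistics are a common remedy.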

Use case 09

Rotating equipment health monitoring

Vibration-based ML for boiler feed pumps and turbines. Train on the plant instrumentation data; deploy as an alerting layer over a real plant historian.
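
Vibration models usually start from a few summary features per waveform. The sketch below computes three common ones (RMS, peak, and crest factor) from a list of samples. The function name and the synthetic sine input are illustrative assumptions. The layout of the plant vibration records is not specified here, so treat this as a feature-extraction pattern rather than a loader for the dataset.

```python
import math

def vibration_features(samples):
    """Summary features for one vibration waveform.

    rms:          root-mean-square amplitude (overall energy)
    peak:         largest absolute excursion
    crest_factor: peak / rms, which rises when a waveform becomes impulsive
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    peak = max(abs(x) for x in samples)
    crest_factor = peak / rms if rms else 0.0
    return {"rms": rms, "peak": peak, "crest_factor": crest_factor}

# Synthetic input: ten full cycles of a sine with amplitude 2.0.
n = 1000
signal = [2.0 * math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
features = vibration_features(signal)
print({k: round(v, 4) for k, v in features.items()})
```

For a pure sine the crest factor is the square root of 2, which gives a reference point for spotting more impulsive waveforms.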

Use case 10

Plant performance & digital twins

Heat-rate forecasting, condenser performance models, combined-cycle dispatch optimization. The plant data layer is built for digital-twin work.
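
Heat rate is the quantity most of these models target, so a short worked calculation may help: heat rate is fuel heat input divided by net electrical output, and thermal efficiency is the heat equivalent of one kWh (about 3,412 Btu) divided by the heat rate. The function names and the operating point below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
BTU_PER_KWH = 3412.14  # heat equivalent of one kilowatt-hour

def heat_rate_btu_per_kwh(fuel_mmbtu_per_h, net_mw):
    """Net heat rate: fuel heat input (Btu/h) per unit of net output (kW)."""
    if net_mw <= 0:
        raise ValueError("net_mw must be positive")
    return (fuel_mmbtu_per_h * 1e6) / (net_mw * 1e3)

def thermal_efficiency(heat_rate):
    """Fraction of fuel energy converted to electricity."""
    return BTU_PER_KWH / heat_rate

# Hypothetical operating point: 2,100 MMBtu/h of fuel at 300 MW net output.
hr = heat_rate_btu_per_kwh(2100.0, 300.0)
eff = thermal_efficiency(hr)
print(round(hr), round(eff * 100, 1))
```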

Sisyphean Power & Light.