New: Generation Systems dataset now available — 12 months of BFP train time-series data from SP&L's 300 MW combined-cycle plant, with 4 new ML guides. Explore Guides 17–20 →
Open-Source Dynamic Network Model

Sisyphean Power & Light

A realistic, synthetic utility—distribution grid and generation plant—you can clone, query, model, and break. From feeders and transformers to boiler feed pumps and vibration data. Experienced power engineers have always understood the problems worth solving—now, with realistic data and AI-assisted development tools, you can build the ML applications that used to require a dedicated data science team.


A Fictional Utility. Real Engineering Problems.

Sisyphean Power & Light (SP&L) is a fictional municipal utility serving approximately 48,000 customers across a mixed urban, suburban, and rural service territory, powered in part by a 300 MW 2x1 combined-cycle generating station. The Dynamic Network Model is a fully-realized, open-source representation of SP&L's distribution system and generation assets—complete with time-series load data, DER installations, historical outage records, asset metadata, weather correlations, protection device configurations, and plant-side rotating equipment instrumentation.

Every dataset in the repository is synthetic but structurally faithful to what you would encounter inside a real utility's OMS, GIS, SCADA, AMI, and plant historian systems. The distribution topology, load shapes, failure modes, and DER penetration patterns are modeled on documented industry benchmarks from DOE, EPRI, and IEEE sources. The generation data is physics-based—pump affinity laws, H-Q curves, and efficiency models produce time-series that behave like real plant instrumentation.

"Building ML for the grid has always had two barriers: getting realistic data, and needing to become an ML specialist to use it. SP&L removes the first. AI-assisted development removes the second. What's left is the part you're already good at—understanding the grid."

SP&L at a Glance

A distribution system large enough to be realistic, small enough to fit on a laptop.

48K
Customers Served
Residential, commercial, and industrial load classes with distinct profiles
12
Distribution Feeders
4–13.2 kV feeders spanning urban, suburban, and rural topologies
5 yr
Historical Depth
Hourly load, 15-min AMI, hourly weather, and outage records from 2020–2025
14%
DER Penetration
Rooftop solar, community storage, and EV charger installations growing year-over-year
3,200+
Outage Events
Cause-coded, weather-tagged, with crew dispatch and restoration timestamps
OpenDSS
Native Format
Ready to run power flow, with CSV/Parquet exports for ML pipelines
300 MW
Generation Plant
2x1 CCGT with BFP train data: 88 tags at 1-min resolution, 3 embedded fault scenarios

What's Inside the Repository

The Dynamic Network Model is organized into logical layers—from raw network topology to pre-built analysis notebooks. Each layer is documented, versioned, and designed to be used independently or composed together.

01

Network Topology & Electrical Model

The physical distribution system, ready for power flow simulation.

  • OpenDSS master file with 12 feeders, 147 line segments, 86 distribution transformers, and 23 switching devices
  • Bus & node coordinates for GIS visualization and spatial analysis
  • Conductor specifications per feeder section—impedance, ampacity, vintage, and material
  • Protection device settings for reclosers, fuses, and relay coordination zones
  • Capacitor bank and regulator placements with tap settings and control modes
OpenDSS GeoJSON CSV
02

Time-Series Load & Generation Profiles

Five years of synthetic but statistically faithful demand and DER generation data.

  • Hourly substation load for all 12 feeders (2020–2025), decomposed into residential, commercial, and industrial components
  • 15-minute AMI meter data for a representative sample of 2,400 service points with realistic noise, gaps, and meter errors
  • Solar PV generation profiles for 680+ behind-the-meter installations, correlated to local irradiance data
  • Battery storage dispatch records for community-scale and behind-the-meter systems
  • EV charging load shapes at residential Level 2 and commercial DCFC stations, growing by adoption year
Parquet CSV 15-min Resolution
03

Outage & Reliability Records

A complete OMS-style outage history with the detail to train real predictive models.

  • 3,200+ outage events with cause codes (vegetation, equipment failure, animal contact, weather, overload, unknown)
  • Timestamps: fault detected, crew dispatched, crew arrived, service restored—per IEEE 1366 conventions
  • Weather tags: temperature, wind speed, precipitation, and lightning strike proximity at time of event
  • Affected customers & CMI per event, linked to feeder and protective device
  • Major Event Day (MED) flags per IEEE 1366 2.5-beta method
CSV IEEE 1366 Cause-Coded
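The reliability fields above map directly onto the IEEE 1366 indices. As a minimal sketch, assuming each event reduces to a pair of customers interrupted and outage duration in minutes, the indices and the 2.5 Beta major-event-day threshold can be computed with the standard library alone. The event tuples are made-up illustration values, and using the sample standard deviation for beta is an assumption here.

```python
import math
import statistics

def reliability_indices(events, customers_served):
    """events: iterable of (customers_interrupted, duration_minutes)."""
    total_ci = sum(ci for ci, _ in events)
    total_cmi = sum(ci * minutes for ci, minutes in events)
    saifi = total_ci / customers_served           # interruptions per customer
    saidi = total_cmi / customers_served          # minutes per customer
    caidi = saidi / saifi if saifi else 0.0       # minutes per interruption
    return saifi, saidi, caidi

def med_threshold(daily_saidi):
    """2.5 Beta threshold: exp(alpha + 2.5 * beta) over ln(daily SAIDI).

    Days with zero SAIDI are skipped before taking logarithms.
    """
    logs = [math.log(s) for s in daily_saidi if s > 0]
    alpha = statistics.mean(logs)
    beta = statistics.stdev(logs)  # assumption: sample standard deviation
    return math.exp(alpha + 2.5 * beta)

# Illustrative events only, not rows from outage_events.csv
events = [(1200, 90), (300, 45), (4800, 240)]
saifi, saidi, caidi = reliability_indices(events, customers_served=48_000)
print(f"SAIFI={saifi:.4f}  SAIDI={saidi:.2f} min  CAIDI={caidi:.1f} min")
```

Any day whose SAIDI exceeds the threshold would be flagged as a Major Event Day. Comparing your own flags with the ones shipped in the dataset is a quick sanity check on how you aggregated daily SAIDI.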
04

Asset Registry & Condition Data

The infrastructure metadata that powers asset management and predictive maintenance models.

  • Transformer inventory: kVA rating, installation year, manufacturer, oil/dry type, load history summary
  • Conductor & pole records: material, vintage, span lengths, vegetation clearance zones
  • Switching device metadata: recloser model, firmware version, SCADA-controlled vs. manual
  • Condition scores: synthetic health index (1–5) based on age, loading, failure history, and inspection results
  • Maintenance logs: inspection dates, work orders, and replacement records
CSV JSON Asset Health Index
05

Weather & Environmental Data

Localized weather history aligned to the outage and load records.

  • Hourly weather observations: temperature, humidity, wind speed/direction, precipitation, barometric pressure
  • Lightning strike data: synthetic strike records with proximity to feeders
  • Vegetation growth model: seasonal growth rates tied to trim cycle schedules
  • Heat wave and storm event flags for correlation analysis
CSV NOAA-Format Hourly
06

Scenario Configurations & Analysis Notebooks

Pre-built scenarios and starter notebooks so you can run analysis on day one.

  • Baseline scenario: SP&L as-is, 2025 system state
  • High DER scenario: 35% solar penetration + community storage on constrained feeders
  • EV adoption scenario: 20% residential EV penetration with clustered charging
  • Extreme weather scenario: 10-year storm event replay with cascading failures
  • Jupyter notebooks: load forecasting, outage prediction, hosting capacity, and voltage analysis starters
Jupyter Python OpenDSS-py
07

Generation Systems: BFP Train Data

SP&L isn't just a distribution utility—it owns a 300 MW 2x1 combined-cycle generating station. The generation dataset provides plant-side time-series data from the motor-driven Boiler Feed Pump train, the kind of rotating equipment data that predictive maintenance and digital twin applications depend on.

  • 88 process tags at 1-minute, 15-minute, and hourly resolution across 12 months (527,040 rows at 1-min)
  • Two 100% BFP trains: Pump A (duty) and Pump B (standby), each with vibration, bearing temperature, seal leak rate, motor current, and discharge pressure instrumentation
  • 3 embedded fault scenarios: progressive seal degradation (Apr–Jun), bearing wear (Jul–Aug), and coupling misalignment (Oct–Dec)—realistic gradual onset, not step changes
  • Physics-based generation: pump affinity laws, H-Q curves, efficiency bell curves, and motor power calculations ensure the data behaves like real plant instrumentation
  • Event logs: 70 alarms, 1 trip event, and 7 operator actions correlated to the fault timelines
  • Reference data: OEM design parameters, pump curves, heat balance, equipment registry, and a full 88-tag dictionary
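The row counts quoted for the generation dataset can be cross-checked with a little arithmetic. The sketch below works backward from the 527,040 one-minute rows. The result suggests a 366-day window, which is an inference from the numbers rather than something the page states.

```python
MINUTES_PER_DAY = 24 * 60

rows_1min = 527_040                      # figure quoted for the 1-minute file
days = rows_1min // MINUTES_PER_DAY      # whole days covered by the series
remainder = rows_1min % MINUTES_PER_DAY  # 0 means the span is a whole number of days

rows_15min = days * 24 * 4               # four 15-minute intervals per hour
rows_hourly = days * 24

print(days, remainder, rows_15min, rows_hourly)  # 366 0 35136 8784
```

The derived 35,136 and 8,784 rows agree with the "35K" and "8,784" figures given for the rolled-up files in the repository listing.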

The generation data correlates with the distribution dataset—unit MW output follows the same diurnal and seasonal demand patterns as the SP&L grid. This enables cross-domain analysis: how does plant-side equipment health affect grid reliability?

Parquet 1-min Resolution 88 Tags Fault Injection

Repository Structure

sisyphean-power-and-light/
├── network/
│   ├── master.dss                        # OpenDSS master file
│   ├── lines.dss                         # Line segment definitions
│   ├── transformers.dss                  # Distribution transformers
│   ├── loads.dss                         # Load allocations by bus
│   ├── capacitors.dss                    # Cap bank placements
│   ├── regulators.dss                    # Voltage regulators
│   ├── switches.dss                      # Reclosers & sectionalizers
│   ├── pv_systems.dss                    # Behind-the-meter solar DER
│   ├── storage.dss                       # Battery storage systems
│   └── coordinates.csv                   # Bus XY for GIS mapping
├── timeseries/
│   ├── substation_load_hourly.parquet    # 5-year feeder loads
│   ├── ami_15min_sample.parquet          # 2,400 meter sample
│   ├── pv_generation.parquet             # Solar output profiles
│   ├── ev_charging.parquet               # EV load shapes
│   └── battery_dispatch.parquet          # Storage charge/discharge
├── outages/
│   ├── outage_events.csv                 # 3,200+ cause-coded events
│   ├── crew_dispatch.csv                 # Dispatch & restoration logs
│   └── reliability_metrics.csv           # Annual SAIFI/SAIDI/CAIDI
├── assets/
│   ├── transformers.csv                  # Inventory & condition scores
│   ├── conductors.csv                    # Line & cable registry
│   ├── poles.csv                         # Pole material, age, class
│   ├── switches.csv                      # Protection device metadata
│   └── maintenance_log.csv               # Work orders & inspections
├── weather/
│   ├── hourly_observations.csv           # Temp, wind, precip, humidity
│   ├── lightning_strikes.csv             # Proximity-to-feeder records
│   └── storm_events.csv                  # Named events & MED flags
├── scenarios/
│   ├── baseline_2025.json                # Current-state configuration
│   ├── high_der_2030.json                # 35% DER penetration
│   ├── ev_adoption_2030.json             # 20% residential EV
│   └── extreme_weather.json              # 10-year storm replay
├── generation/
│   ├── timeseries/
│   │   ├── bfp_train_1min.parquet        # 527K rows, 88 tags @ 1-min
│   │   ├── bfp_train_15min.parquet       # 35K rows, rolled up
│   │   └── bfp_train_hourly.parquet      # 8,784 rows, rolled up
│   ├── events/
│   │   ├── alarm_log.csv                 # 70 process alarms
│   │   ├── trip_log.csv                  # Equipment trips
│   │   └── operator_actions.csv          # Manual interventions
│   ├── reference/
│   │   ├── design_parameters.json        # OEM specs & ratings
│   │   ├── pump_curves.csv               # H-Q and efficiency curves
│   │   └── heat_balance.csv              # Plant design heat balance
│   ├── equipment_registry.csv            # 6 assets (2 pumps, 2 motors, etc.)
│   ├── tag_dictionary.csv                # All 88 tags with units & ranges
│   └── README.md                         # Generation system documentation
├── notebooks/
│   ├── 01_load_forecasting.ipynb         # LSTM & gradient-boost models
│   ├── 02_outage_prediction.ipynb        # Random forest classifier
│   ├── 03_hosting_capacity.ipynb         # Iterative power flow HCA
│   ├── 04_voltage_analysis.ipynb         # Volt-VAR optimization
│   ├── 05_asset_health.ipynb             # Predictive maintenance
│   ├── 06_flisr_simulation.ipynb         # Fault isolation modeling
│   ├── 17_bfp_health.ipynb               # Anomaly detection on BFP
│   ├── 18_feedwater_correlation.ipynb    # Load-correlation regression
│   ├── 19_fault_diagnosis.ipynb          # Multi-class + SHAP
│   └── 20_digital_twin.ipynb             # Physics-informed ML
└── README.md

The ML/AI Playground

Two things have kept power engineers from building their own ML tools: data locked behind NDAs and CEII restrictions, and a skills gap between understanding a grid problem and implementing an ML solution. SP&L eliminates the first barrier. AI-assisted development tools eliminate the second. Clone it, load it, train on it—today.

Step-by-Step Beginner & Advanced Guides

Hands-on tutorials for every use case below—8 distribution guides, 8 advanced guides covering deep learning, reinforcement learning, and survival analysis, plus 4 generation systems guides for rotating equipment diagnostics and digital twins. Run the code, understand the approach, then use AI coding assistants to adapt it to your real-world problems.

View All 20 Guides →
Use Case 01

Outage Prediction

Train a classifier on 3,200+ outage events cross-referenced with weather, asset age, vegetation cycles, and time-of-year. The data includes the exact features a utility reliability engineer would use—but without the 18-month procurement cycle to access them.

Start here: Random forest on cause-coded outages. Graduate to temporal convolutional networks for sequence-aware prediction. Benchmark against SP&L's historical SAIFI to validate your model.

XGBoost Random Forest TCN scikit-learn
Beginner Guide → Advanced Guide →
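An outage log only contains the hours when something failed, and a classifier also needs the hours when nothing did. The sketch below shows one way to frame the Use Case 01 data as a supervised table: one row per feeder-hour, joined to weather, with a 0/1 outage label. The field names (`feeder_id`, `fault_detected`, `wind_mph`, `temp_f`) and the sample values are hypothetical, not the repository's actual schema.

```python
from datetime import datetime

def hour_key(ts):
    """Truncate a timestamp to the top of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def build_training_rows(feeder_hours, outages, weather):
    """One row per (feeder, hour): weather features plus a 0/1 outage label."""
    outage_keys = {(o["feeder_id"], hour_key(o["fault_detected"])) for o in outages}
    rows = []
    for feeder_id, hour in feeder_hours:
        obs = weather.get(hour)
        if obs is None:          # skip hours with no weather observation
            continue
        label = int((feeder_id, hour) in outage_keys)
        rows.append({"feeder_id": feeder_id, "hour": hour, **obs, "outage": label})
    return rows

# Hypothetical sample data for illustration
weather = {
    datetime(2024, 7, 1, 14): {"wind_mph": 32.0, "temp_f": 95.0},
    datetime(2024, 7, 1, 15): {"wind_mph": 8.0, "temp_f": 93.0},
}
outages = [{"feeder_id": "F03", "fault_detected": datetime(2024, 7, 1, 14, 37)}]
feeder_hours = [("F03", h) for h in weather] + [("F07", h) for h in weather]

rows = build_training_rows(feeder_hours, outages, weather)
print(sum(r["outage"] for r in rows), "positive of", len(rows), "rows")
```

The resulting table is heavily imbalanced by design, which is worth keeping in mind when choosing metrics and when splitting train and test sets by time rather than at random.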
Use Case 02

Load Forecasting

Five years of hourly feeder loads and 15-minute AMI data, decomposed by customer class. Add weather inputs and DER generation to build short-term (day-ahead) and long-term (capacity planning) forecasts that account for behind-the-meter solar reducing net load.

Start here: LSTM or Prophet on substation-level load. Then disaggregate to the feeder level. Then tackle net load forecasting with solar as a confounding variable.

LSTM Prophet Transformer Models PyTorch
Beginner Guide → Advanced Guide →
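Before reaching for an LSTM or Prophet on the Use Case 02 data, it helps to have a baseline that any model must beat. The sketch below is a seasonal-naive forecast (repeat the same hour from one week earlier) scored with MAPE. This is a deliberately simple stand-in, not the method used in the guides, and the load series is synthetic.

```python
import math

def seasonal_naive(history, horizon, season=168):
    """Forecast each future hour with the value one season (default: one week) earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]

def mape(actual, forecast):
    """Mean absolute percentage error, skipping hours where actual load is zero."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Synthetic hourly feeder load (MW): a daily sinusoid plus a weekend offset
load = [
    5.0 + 2.0 * math.sin(2 * math.pi * (t % 24) / 24)
    + (0.5 if (t // 24) % 7 >= 5 else 0.0)
    for t in range(24 * 7 * 5)
]
train, test = load[:-24], load[-24:]
forecast = seasonal_naive(train, horizon=24)
print(f"MAPE: {mape(test, forecast):.2f}%")
```

Because the synthetic series repeats exactly every week, the baseline scores a perfect 0% here. On the real feeder data, weather and DER growth break that periodicity, and the gap between this baseline and your model is the value your model adds.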
Use Case 03

Hosting Capacity Analysis

Run iterative power flow studies on the full OpenDSS model to determine how much additional DER each feeder can accommodate before hitting thermal or voltage limits. The network model is pre-configured with existing PV, storage, and EV chargers so you start from a realistic baseline.

Start here: Systematic PV injection at each bus. Map voltage violations and thermal overloads. Compare traditional HCA to ML-accelerated screening methods.

OpenDSS-py Power Flow Voltage Analysis GIS Mapping
Beginner Guide → Advanced Guide →
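The iterative logic behind Use Case 03 can be illustrated without a power flow engine. The sketch below replaces OpenDSS with a linearized single-branch voltage-drop approximation in per unit, then raises PV injection in fixed steps until either a voltage or thermal limit binds. The impedance, load, and rating values are hypothetical, and a real study would solve the full network at every step.

```python
import math

def bus_voltage(v_src, r, x, p_load, q_load, p_pv):
    """Approximate per-unit bus voltage: V = Vsrc - (R*Pnet + X*Q), unity-PF PV."""
    return v_src - (r * (p_load - p_pv) + x * q_load)

def hosting_capacity(v_src, r, x, p_load, q_load, s_rating,
                     step=0.01, v_max=1.05, max_steps=1000):
    """Largest PV injection (pu) before a limit is violated, and which limit it was."""
    for n in range(1, max_steps + 1):
        p_pv = n * step
        v = bus_voltage(v_src, r, x, p_load, q_load, p_pv)
        s = math.hypot(p_load - p_pv, q_load)   # apparent power on the branch
        if v > v_max:
            return (n - 1) * step, "voltage"
        if s > s_rating:
            return (n - 1) * step, "thermal"
    return max_steps * step, "none"

# Hypothetical feeder-end bus, all quantities in per unit
capacity, limit = hosting_capacity(v_src=1.03, r=0.05, x=0.08,
                                   p_load=0.2, q_load=0.06, s_rating=1.0)
print(f"hosting capacity ~ {capacity:.2f} pu, limited by {limit}")
```

Swapping `bus_voltage` for a call into an actual OpenDSS solve keeps the same loop structure, which is why this pattern is a common starting point for ML-accelerated screening.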
Use Case 04

Predictive Asset Maintenance

Combine asset condition scores, maintenance logs, loading history, and weather exposure to predict which transformers, reclosers, and conductor segments are most likely to fail in the next 30–90 days. SP&L includes the exact features that utility asset management teams struggle to assemble from their own systems.

Start here: Survival analysis on transformer age + loading. Then layer in gradient-boosted models with weather features. Validate against SP&L's historical failure records.

Survival Analysis XGBoost Feature Engineering Anomaly Detection
Beginner Guide → Advanced Guide →
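The survival-analysis starting point in Use Case 04 can be sketched with a Kaplan-Meier estimator, which handles the fact that most transformers in an inventory have not failed yet (censored observations). The ages and failure flags below are invented for illustration. In practice a library such as lifelines would be the more usual choice.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: years in service at failure or at last observation
    observed:  True if the unit failed, False if it is still in service (censored)
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        while i < len(data) and data[i][0] == t:   # group ties at the same time
            failures += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed                          # failed and censored units leave the risk set
    return curve

# Invented transformer records: (years in service, failed?)
years = [5, 8, 8, 12, 15, 20]
failed = [True, True, False, True, False, True]
for t, s in kaplan_meier(years, failed):
    print(f"S({t}) = {s:.3f}")
```

Censored units still count toward the at-risk population up to their last observed age, which is the step that a naive "fraction failed by age" calculation gets wrong.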
Use Case 05

FLISR & Restoration Optimization

Replay historical storm events against the network model to simulate how automated Fault Location, Isolation, and Service Restoration would have reduced customer minutes interrupted. Build switching sequence optimization algorithms on a system with realistic topology constraints.

Start here: Counterfactual CMI analysis on the 5 worst storm events. Then build a graph-based optimization model for automated switching. Quantify the avoided CMI to build a business case.

Graph Algorithms NetworkX Optimization Simulation
Beginner Guide → Advanced Guide →
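The counterfactual CMI analysis in Use Case 05 comes down to simple accounting once you decide what share of customers automated switching could have picked up. The sketch below assumes a hypothetical restorable fraction and switching time per event. In a real study both would come out of the network model's topology rather than being set by hand.

```python
def actual_cmi(customers, minutes):
    """Customer minutes interrupted for an event as it happened."""
    return customers * minutes

def flisr_cmi(customers, minutes, restorable_fraction, switching_minutes):
    """CMI if a fraction of customers had been restored by automated switching."""
    switch_time = min(switching_minutes, minutes)   # cannot restore later than the real repair
    restored = customers * restorable_fraction
    remaining = customers - restored
    return restored * switch_time + remaining * minutes

# Hypothetical storm events: (customers affected, outage minutes)
storm_events = [(2000, 180), (850, 95), (4300, 310)]

avoided = sum(
    actual_cmi(c, m) - flisr_cmi(c, m, restorable_fraction=0.6, switching_minutes=5)
    for c, m in storm_events
)
print(f"avoided CMI: {avoided:,.0f}")
```

Dividing the avoided CMI by customers served gives the SAIDI improvement, which is usually the number a business case is built around.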
Use Case 06

Volt-VAR Optimization

Develop and test voltage optimization strategies using the network model's capacitor banks, voltage regulators, and smart inverter settings. The time-series load and PV generation data lets you evaluate VVO performance across seasons, time-of-day, and DER penetration levels.

Start here: Baseline voltage profile analysis. Then implement rule-based VVO. Then build a reinforcement learning agent that learns optimal tap and VAR dispatch policies.

Reinforcement Learning OpenDSS-py Control Systems Gym
Beginner Guide → Advanced Guide →
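The rule-based VVO step in Use Case 06 can be sketched as a deadband controller on a 32-step voltage regulator (plus or minus 10% in 0.625% steps). The setpoint and bandwidth below are assumed values, and the sketch leaves out time delays and line-drop compensation that a real regulator control would include.

```python
TAP_STEP_PU = 0.00625          # 32-step regulator: +/-10% range, 0.625% per step
TAP_MIN, TAP_MAX = -16, 16

def next_tap(tap, v_measured, v_set=1.0, bandwidth=0.0167):
    """Move one tap toward the band if voltage is outside setpoint +/- bandwidth/2."""
    half = bandwidth / 2
    if v_measured < v_set - half and tap < TAP_MAX:
        return tap + 1
    if v_measured > v_set + half and tap > TAP_MIN:
        return tap - 1
    return tap

def regulate(v_unregulated, tap=0, max_moves=32):
    """Step the tap until the regulated voltage sits in band or the tap range runs out."""
    for _ in range(max_moves):
        v = v_unregulated * (1 + tap * TAP_STEP_PU)
        new_tap = next_tap(tap, v)
        if new_tap == tap:
            break
        tap = new_tap
    return tap, v_unregulated * (1 + tap * TAP_STEP_PU)

tap, v = regulate(0.95)
print(f"tap {tap:+d}, regulated voltage {v:.4f} pu")   # tap +8, 0.9975 pu
```

A controller like this is also the natural baseline for a reinforcement learning agent: the agent has to beat it on voltage violations or tap operations to justify the added complexity.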
Use Case 07

DER Scenario Planning

Stress-test SP&L's distribution system against aggressive DER adoption futures. The pre-built scenarios include high solar, high EV, and combined penetration configurations. Answer the question every distribution planner asks: "What happens to my system when DER doubles?"

Start here: Run the high-DER scenario and identify reverse power flow feeders. Then build a Monte Carlo simulation for uncertain adoption rates. Map investment triggers to penetration thresholds.

Monte Carlo Scenario Analysis Power Flow Planning
Beginner Guide → Advanced Guide →
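The Monte Carlo step in Use Case 07 can be sketched as repeated draws of an uncertain annual growth rate. The sketch below starts from the 14% DER penetration quoted for SP&L and estimates the chance of reaching the 35% high-DER scenario within five years. The growth mean and spread are assumptions chosen for illustration, not calibrated values.

```python
import random

def simulate_penetration(start, years, mean_growth, sd_growth, rng):
    """Compound an uncertain annual growth rate; penetration is capped at 100%."""
    level = start
    for _ in range(years):
        growth = max(rng.gauss(mean_growth, sd_growth), 0.0)  # no de-adoption in this sketch
        level = min(level * (1 + growth), 1.0)
    return level

def exceedance_probability(threshold, trials=10_000, seed=42, **kwargs):
    """Fraction of simulated futures that reach or exceed the threshold."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    hits = sum(simulate_penetration(rng=rng, **kwargs) >= threshold
               for _ in range(trials))
    return hits / trials

# Assumed adoption parameters (hypothetical)
params = dict(start=0.14, years=5, mean_growth=0.20, sd_growth=0.08)
p = exceedance_probability(0.35, **params)
print(f"P(penetration >= 35% in 5 years) = {p:.2f}")
```

Sweeping the threshold over the penetration levels at which each feeder first shows violations turns these probabilities into a ranked list of investment triggers.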
Use Case 08

Anomaly Detection & Grid State Estimation

Use the AMI and SCADA-style data to build real-time anomaly detection for voltage excursions, phase imbalances, meter tampering patterns, and non-technical losses. The dataset includes realistic noise, data gaps, and meter errors that make anomaly detection genuinely challenging.

Start here: Autoencoders on AMI voltage time series. Then build an isolation forest for multi-dimensional SCADA anomaly detection. Validate against injected fault scenarios.

Autoencoders Isolation Forest State Estimation PyTorch
Beginner Guide → Advanced Guide →
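A statistical baseline is useful before training the autoencoder or isolation forest in Use Case 08, because it shows how much of the problem is already solved by simple screening. The sketch below uses a modified z-score based on the median and the median absolute deviation, a simpler technique than either model named above, and it tolerates missing reads (represented here as `None`). The voltage readings are invented.

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Indices whose modified z-score (0.6745 * |x - median| / MAD) exceeds the threshold."""
    clean = [v for v in values if v is not None]      # ignore data gaps
    med = statistics.median(clean)
    mad = statistics.median(abs(v - med) for v in clean)
    if mad == 0:
        return []                                     # no spread, nothing to score against
    return [i for i, v in enumerate(values)
            if v is not None and 0.6745 * abs(v - med) / mad > threshold]

# Invented 15-minute service voltages (V) with two gaps and two excursions
voltages = [240.1, 239.8, None, 240.3, 239.9, 252.0,
            240.0, 240.2, None, 226.5, 240.1]
print(mad_anomalies(voltages))   # [5, 9]
```

Median-based statistics are used here because the excursions you are looking for would otherwise inflate the mean and standard deviation and partially hide themselves.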

Generation Systems

The same ML techniques that work on the distribution grid apply to the power plant. SP&L's 300 MW combined-cycle station provides a second domain for the same analytical approaches: anomaly detection, regression, fault classification, and physics-informed modeling—applied to rotating equipment instead of feeders and transformers.

Use Case 09

Rotating Equipment Health Monitoring

88 process tags from a motor-driven boiler feed pump train—vibration, bearing temperatures, seal leak rates, motor current, discharge pressure—sampled every minute for a full year. Three progressive fault scenarios are embedded in the data: seal degradation, bearing wear, and coupling misalignment. Each fault develops gradually over weeks, compounding with load, exactly like real plant failures.

Start here: Rolling statistics and threshold alerts on bearing temps. Then train an Isolation Forest for multivariate anomaly detection. Graduate to multi-class fault classification with SHAP explainability.

Isolation Forest Random Forest SHAP scikit-learn
Beginner Guide → Advanced Guide →
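The first step suggested for Use Case 09, rolling statistics with threshold alerts, can be sketched with a trailing window over one tag. The window length, sigma multiplier, and temperature values below are assumptions for illustration, not settings from the guides.

```python
from collections import deque
import statistics

def rolling_alerts(samples, window=60, n_sigma=3.0, min_sigma=0.1):
    """Flag samples more than n_sigma above the mean of the preceding window."""
    history = deque(maxlen=window)
    alerts = []
    for i, x in enumerate(samples):
        if len(history) == window:                    # wait for a full window
            mean = statistics.fmean(history)
            sigma = max(statistics.pstdev(history), min_sigma)
            if x > mean + n_sigma * sigma:
                alerts.append(i)
        history.append(x)
    return alerts

# Synthetic bearing temperature (deg C) at 1-minute resolution with one spike
temps = [65.0 + 0.2 * ((i % 5) - 2) for i in range(180)]
temps[150] = 70.0
print(rolling_alerts(temps))                          # [150]

# A slow ramp never trips the alert, because the window drifts with it
ramp = [65.0 + 0.01 * i for i in range(600)]
print(rolling_alerts(ramp))                           # []
```

The second result is the important one for this dataset: the embedded faults develop gradually, so a trailing window adapts to the degradation and stays silent. Comparing against a fixed healthy-period baseline, or moving to the multivariate methods in the later steps, is what catches slow onset.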
Use Case 10

Plant Performance & Digital Twins

The generation dataset includes OEM pump curves, design parameters, and a heat balance for the BFP system. This is everything needed to build a physics-informed digital twin: compare actual pump performance to the expected curves, detect efficiency decay before it triggers an alarm, and create a composite health index that tracks degradation across all instrumented parameters simultaneously.

Start here: Regress MW output to feedwater flow and motor power to establish baseline correlations. Then build a pump curve model from OEM data and track actual-vs-expected performance. Graduate to a composite health index combining physics residuals with statistical anomaly scores.

Physics-Informed ML Pump Curves Residual Analysis scipy
Beginner Guide → Advanced Guide →
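The actual-versus-expected comparison in Use Case 10 rests on the pump affinity laws: at a different speed, flow scales with the speed ratio and head with its square (power scales with its cube). The sketch below scales a reference H-Q curve to the running speed and computes a head residual. The curve points, speeds, and measurement are hypothetical and are not taken from `pump_curves.csv`.

```python
def scale_curve(curve, n_ref, n_actual):
    """Apply affinity laws to (flow, head) points: Q ~ N, H ~ N**2."""
    ratio = n_actual / n_ref
    return [(q * ratio, h * ratio ** 2) for q, h in curve]

def expected_head(curve, flow):
    """Linear interpolation of head at a given flow on a flow-sorted curve."""
    for (q0, h0), (q1, h1) in zip(curve, curve[1:]):
        if q0 <= flow <= q1:
            return h0 + (h1 - h0) * (flow - q0) / (q1 - q0)
    raise ValueError("flow outside curve range")

# Hypothetical H-Q points at rated speed: (flow in m3/h, head in m)
REF_CURVE = [(0, 2400), (200, 2300), (400, 2100), (600, 1800)]

curve_90 = scale_curve(REF_CURVE, n_ref=3600, n_actual=3240)   # pump at 90% speed
measured_flow, measured_head = 270.0, 1750.0                   # hypothetical reading
residual = measured_head - expected_head(curve_90, measured_flow)
print(f"head residual: {residual:.1f} m")                      # negative = underperforming
```

Tracked over time, a residual that trends steadily negative at constant speed and flow is the kind of efficiency decay signal that can be combined with statistical anomaly scores into a composite health index.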

Built for Engineers Who Build Things

The gap between "I understand this problem" and "I can build an ML model for it" has been too wide for too long—blocked by data access on one side and specialized tooling on the other. SP&L provides the data. AI-assisted development provides a co-pilot for the code. Your domain expertise does the rest.

Power Systems Engineers

You know which problems matter—which feeders are trouble, which assets are aging out, which load patterns signal something wrong. Now you can build the solutions yourself. AI-assisted development tools translate your engineering intuition into working code, and SP&L gives you a realistic system to build against without touching production data.

Plant & Reliability Engineers

You live in vibration spectra, bearing clearances, and pump curves. The generation dataset gives you a realistic BFP train with embedded faults you can detect, classify, and track with the same ML techniques the distribution side uses—anomaly detection, regression, physics-informed models. Build a digital twin on SP&L before deploying against your plant historian.

Data Scientists Entering Energy

You have the ML skills but not the domain context. SP&L comes with documented data dictionaries, industry-standard metrics (SAIFI, SAIDI, CAIDI), and notebooks that bridge the gap between general data science and power systems engineering.

Utility Innovation Teams

You need to prototype analytics use cases before pitching them internally. SP&L plus AI-assisted development lets you build a working demo in days, not months—on realistic data, validated against real engineering constraints, ready to show leadership without waiting for IT to provision access to production systems.

Academic Researchers

You need a common, open reference system that reviewers and collaborators can reproduce. SP&L provides a citable, versioned, realistic distribution network model with enough complexity to publish meaningful results.

Independent Consultants

You need to demonstrate feasibility and build proof-of-concept analyses for clients, but accessing their data takes months. SP&L gives you a realistic utility-scale reference system to develop and validate your methodologies before engagement, shortening sales cycles and establishing credibility.

AI/ML Engineers

You want a domain-specific playground that isn't MNIST or Kaggle tabular data. Power systems offer time-series, graph topology, spatial data, multi-objective optimization, and real-time control problems—all in one domain. SP&L packages them in a single repo.

The Thesis: Just Build Things Now

The power grid is the largest machine ever built, and it's being reinvented in real time. Solar, storage, EVs, and electrification are fundamentally changing how distribution systems operate. The utilities managing this transition need analytical tools built by people who understand the grid—and they need them faster than traditional vendor procurement or data science hiring pipelines can deliver.

For years, experienced power engineers have been sidelined from building these tools—not because they lack the ability to define what needs to be built, but because two barriers stood in the way. First, realistic grid data was locked behind NDAs, CEII restrictions, and vendor contracts. Second, translating engineering knowledge into working code required specialized programming skills that took years to develop. Both barriers are now gone. SP&L provides the open data. Modern AI-assisted development tools handle the coding scaffolding. The equation has changed.

An experienced distribution engineer with the SP&L dataset can now:

  • Build an outage prediction model trained on 3,200+ cause-coded events, focusing on feature engineering and domain judgment rather than wrestling with syntax
  • Develop a load forecasting pipeline on five years of realistic hourly data, iterating rapidly on model architectures without getting blocked by implementation details
  • Prototype a hosting capacity screening tool and test it against multiple DER scenarios in a single afternoon
  • Design a voltage optimization strategy on a system with real-world topology constraints, defining the control logic and reward functions while modern tools generate the implementation
  • Publish research on a common, reproducible reference system that others can build on

The best grid analytics won't come from ML specialists guessing at domain context. They'll be built by power engineers who understand the physics, the operations, and the failure modes. The domain expertise is the irreplaceable ingredient. The data is now open. The development tools are now accessible. What's left is what you're already good at—understanding the grid.

"The grid doesn't need more feasibility studies. It needs engineers who understand the problems shipping working solutions against realistic data. The tools to do that exist now. SP&L is the starting line."

Start Building

Clone Sisyphean Power & Light, fire up an AI coding assistant, and start building. No procurement. No NDAs. No specialized team required. Your domain expertise is the irreplaceable ingredient.

Get Early Access
"The boulder will roll back down. But each push teaches you something new, your models get sharper, and your AI tools get better at helping you push. One must imagine the engineer happy."