A realistic, synthetic utility—distribution grid and generation plant—you can clone, query, model, and break. From feeders and transformers to boiler feed pumps and vibration data. Experienced power engineers have always understood the problems worth solving—now, with realistic data and AI-assisted development tools, you can build the ML applications that used to require a dedicated data science team.
Sisyphean Power & Light (SP&L) is a fictional municipal utility serving approximately 48,000 customers across a mixed urban, suburban, and rural service territory, powered in part by a 300 MW 2x1 combined-cycle generating station. The Dynamic Network Model is a fully realized, open-source representation of SP&L's distribution system and generation assets—complete with time-series load data, DER installations, historical outage records, asset metadata, weather correlations, protection device configurations, and plant-side rotating equipment instrumentation.
Every dataset in the repository is synthetic but structurally faithful to what you would encounter inside a real utility's OMS, GIS, SCADA, AMI, and plant historian systems. The distribution topology, load shapes, failure modes, and DER penetration patterns are modeled on documented industry benchmarks from DOE, EPRI, and IEEE sources. The generation data is physics-based—pump affinity laws, H-Q curves, and efficiency models produce time-series that behave like real plant instrumentation.
"Building ML for the grid has always had two barriers: getting realistic data, and needing to become an ML specialist to use it. SP&L removes the first. AI-assisted development removes the second. What's left is the part you're already good at—understanding the grid."
A distribution system large enough to be realistic, small enough to fit on a laptop.
The Dynamic Network Model is organized into logical layers—from raw network topology to pre-built analysis notebooks. Each layer is documented, versioned, and designed to be used independently or composed together.
The physical distribution system, ready for power flow simulation.
Five years of synthetic but statistically faithful demand and DER generation data.
A complete OMS-style outage history with the detail to train real predictive models.
The infrastructure metadata that powers asset management and predictive maintenance models.
Localized weather history aligned to the outage and load records.
Pre-built scenarios and starter notebooks so you can run analysis on day one.
SP&L isn't just a distribution utility—it owns a 300 MW 2x1 combined-cycle generating station. The generation dataset provides plant-side time-series data from the motor-driven Boiler Feed Pump train, the kind of rotating equipment data that predictive maintenance and digital twin applications depend on.
The generation data correlates with the distribution dataset—unit MW output follows the same diurnal and seasonal demand patterns as the SP&L grid. This enables cross-domain analysis: how does plant-side equipment health affect grid reliability?
Two things have kept power engineers from building their own ML tools: data locked behind NDAs and CEII restrictions, and a skills gap between understanding a grid problem and implementing an ML solution. SP&L eliminates the first barrier. AI-assisted development tools eliminate the second. Clone it, load it, train on it—today.
Hands-on tutorials for every use case below—8 distribution guides, 8 advanced guides covering deep learning, reinforcement learning, and survival analysis, plus 4 generation systems guides for rotating equipment diagnostics and digital twins. Run the code, understand the approach, then use AI coding assistants to adapt it to your real-world problems.
Train a classifier on 3,200+ outage events cross-referenced with weather, asset age, vegetation cycles, and time-of-year. The data includes the exact features a utility reliability engineer would use—but without the 18-month procurement cycle to access them.
Start here: Random forest on cause-coded outages. Graduate to temporal convolutional networks for sequence-aware prediction. Benchmark against SP&L's historical SAIFI to validate your model.
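As a concrete sketch of that random-forest starting point: the snippet below trains a classifier on synthetic stand-in features (asset age, wind gust, vegetation-trim age, calendar month). The column choices, distributions, and label logic here are illustrative assumptions, not the dataset's actual schema.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 3200  # roughly the size of SP&L's outage history

# Hypothetical feature columns; real names come from the outage data dictionary.
X = np.column_stack([
    rng.uniform(0, 40, n),   # asset age (years)
    rng.uniform(0, 30, n),   # peak wind gust (m/s)
    rng.integers(0, 48, n),  # months since last vegetation trim
    rng.integers(1, 13, n),  # calendar month
])
# Synthetic label: vegetation-caused outage more likely with wind + overdue trim
p = 1 / (1 + np.exp(-(0.08 * X[:, 1] + 0.05 * X[:, 2] - 3.5)))
y = rng.random(n) < p

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"holdout accuracy: {clf.score(X_te, y_te):.2f}")
```

Swap the synthetic arrays for the repository's cause-coded outage table once you have it loaded, and inspect `clf.feature_importances_` to sanity-check that the model leans on physically plausible drivers.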
Five years of hourly feeder loads and 15-minute AMI data, decomposed by customer class. Add weather inputs and DER generation to build short-term (day-ahead) and long-term (capacity planning) forecasts that account for behind-the-meter solar eroding daytime net load.
Start here: LSTM or Prophet on substation-level load. Then disaggregate to the feeder level. Then tackle net load forecasting with solar as a confounding variable.
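Before reaching for an LSTM or Prophet, it helps to have a lag-feature linear baseline to beat. The sketch below builds one on synthetic hourly load; the load shape and lag choices (1 h, 1 day, 1 week) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
hours = np.arange(24 * 365)
# Synthetic hourly substation load: diurnal + weekly shape plus noise (MW)
load = (60 + 15 * np.sin(2 * np.pi * hours / 24 - np.pi / 2)
        + 5 * np.sin(2 * np.pi * hours / (24 * 7))
        + rng.normal(0, 2, hours.size))

def lag_features(series, lags=(1, 24, 168)):
    """Build a design matrix of lagged loads for one-step-ahead forecasting."""
    m = max(lags)
    X = np.column_stack([series[m - l:-l] for l in lags])
    y = series[m:]
    return X, y

X, y = lag_features(load)
split = -24 * 30  # hold out the last 30 days
model = LinearRegression().fit(X[:split], y[:split])
mape = np.mean(np.abs(model.predict(X[split:]) - y[split:]) / y[split:])
print(f"holdout MAPE: {mape:.1%}")
```

Any sequence model you train afterwards should clear this baseline's error comfortably; if it doesn't, the extra complexity isn't paying for itself.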
Run iterative power flow studies on the full OpenDSS model to determine how much additional DER each feeder can accommodate before hitting thermal or voltage limits. The network model is pre-configured with existing PV, storage, and EV chargers so you start from a realistic baseline.
Start here: Systematic PV injection at each bus. Map voltage violations and thermal overloads. Compare traditional HCA to ML-accelerated screening methods.
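The iteration logic looks roughly like this. A real HCA would re-run the OpenDSS power flow for each injection; this sketch substitutes an assumed linearized voltage-sensitivity model so it runs standalone.

```python
import numpy as np

# Toy linearized feeder: voltage rise at each bus ~ sensitivity * injected PV MW.
# Baseline voltages and sensitivities below are assumed, not SP&L values.
V_BASE = np.array([1.02, 1.01, 1.00, 0.99])    # baseline per-unit voltages
SENS = np.array([0.004, 0.008, 0.015, 0.022])  # pu rise per MW at each bus
V_LIMIT = 1.05                                 # ANSI C84.1 upper bound

def hosting_capacity(v_base, sens, v_limit, step=0.1):
    """Increase PV at one bus at a time until the first voltage violation."""
    caps = []
    for v0, s in zip(v_base, sens):
        mw = 0.0
        while v0 + s * (mw + step) <= v_limit:
            mw += step
        caps.append(round(mw, 1))
    return caps

caps = hosting_capacity(V_BASE, SENS, V_LIMIT)
print(caps)  # MW of PV each bus can host before an overvoltage
```

Replacing the linear model with a call into the OpenDSS solve loop keeps the same structure; the sensitivity shortcut is also the basis of the ML-accelerated screening methods mentioned above.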
Combine asset condition scores, maintenance logs, loading history, and weather exposure to predict which transformers, reclosers, and conductor segments are most likely to fail in the next 30–90 days. SP&L includes the exact features that utility asset management teams struggle to assemble from their own systems.
Start here: Survival analysis on transformer age + loading. Then layer in gradient-boosted models with weather features. Validate against SP&L's historical failure records.
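A minimal entry point for the survival-analysis step is a Kaplan-Meier estimator, sketched here in plain NumPy on hypothetical transformer lifetimes (the lifetimes and censoring flags are invented for illustration).

```python
import numpy as np

def kaplan_meier(durations, failed):
    """Kaplan-Meier survival curve: S(t) = prod over failure times (1 - d_i/n_i),
    where n_i is the population still at risk and d_i the failures at time t_i."""
    durations = np.asarray(durations, float)
    failed = np.asarray(failed, bool)
    times = np.unique(durations[failed])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(durations >= t)
        deaths = np.sum((durations == t) & failed)
        s *= 1 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

# Hypothetical transformer service lives (years); False = still in service (censored)
life = [12, 30, 45, 45, 50, 22, 38, 41, 27, 33]
obs = [True, True, False, True, False, True, True, False, True, True]
t, s = kaplan_meier(life, obs)
print(dict(zip(t.tolist(), s.round(3).tolist())))
```

Handling the censored (still-in-service) units correctly is the whole point: dropping them would bias the failure estimates pessimistic, since the survivors tend to be the healthier assets.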
Replay historical storm events against the network model to simulate how automated Fault Location, Isolation, and Service Restoration would have reduced customer minutes interrupted. Build switching sequence optimization algorithms on a system with realistic topology constraints.
Start here: Counterfactual CMI analysis on the 5 worst storm events. Then build a graph-based optimization model for automated switching. Quantify the avoided CMI to build a business case.
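The counterfactual CMI arithmetic can be sketched as below. The isolation time and per-event figures are assumptions for illustration; substitute numbers from your own switching study.

```python
def cmi(customers_interrupted, minutes):
    """Customer minutes interrupted for a single outage event."""
    return customers_interrupted * minutes

def flisr_counterfactual(events, isolate_minutes=2):
    """Assume FLISR re-energizes upstream customers after `isolate_minutes`
    and only the faulted section waits out the full repair (an assumption;
    tune to your own switching study)."""
    baseline = avoided = 0
    for ev in events:
        baseline += cmi(ev["customers"], ev["minutes"])
        restorable = ev["customers"] - ev["faulted_section_customers"]
        avoided += restorable * (ev["minutes"] - isolate_minutes)
    return baseline, avoided

# Hypothetical storm events standing in for the outage history
storm = [
    {"customers": 1200, "minutes": 180, "faulted_section_customers": 150},
    {"customers": 800, "minutes": 95, "faulted_section_customers": 60},
]
base, saved = flisr_counterfactual(storm)
print(f"baseline CMI {base:,}, avoided with FLISR {saved:,}")
```

The avoided-CMI figure, priced at your jurisdiction's value of lost load, is the core of the business case mentioned above.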
Develop and test voltage optimization strategies using the network model's capacitor banks, voltage regulators, and smart inverter settings. The time-series load and PV generation data lets you evaluate VVO performance across seasons, time-of-day, and DER penetration levels.
Start here: Baseline voltage profile analysis. Then implement rule-based VVO. Then build a reinforcement learning agent that learns optimal tap and VAR dispatch policies.
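The rule-based VVO step can start as small as the sketch below, which switches a single capacitor bank on voltage-band violations. The thresholds are assumed, not SP&L settings, and real VVO coordinates regulators, cap banks, and smart inverters together.

```python
def vvo_rule(voltages_pu, cap_on, v_low=0.975, v_high=1.04):
    """One naive rule-based volt/VAR step: energize the feeder capacitor bank
    when the minimum voltage sags, de-energize when the maximum runs high."""
    v_min, v_max = min(voltages_pu), max(voltages_pu)
    if not cap_on and v_min < v_low:
        return True    # switch the bank in to boost voltage
    if cap_on and v_max > v_high:
        return False   # switch the bank out to relieve overvoltage
    return cap_on      # otherwise hold the current state

# Sagging feeder: the rule energizes the bank
print(vvo_rule([0.97, 0.99, 1.00], cap_on=False))  # prints True
```

A reinforcement learning agent replaces this hand-written policy with a learned one, but this rule is the baseline its tap and VAR dispatch decisions should be judged against.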
Stress-test SP&L's distribution system against aggressive DER adoption futures. The pre-built scenarios include high solar, high EV, and combined penetration configurations. Answer the question every distribution planner asks: "What happens to my system when DER doubles?"
Start here: Run the high-DER scenario and identify reverse power flow feeders. Then build a Monte Carlo simulation for uncertain adoption rates. Map investment triggers to penetration thresholds.
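The Monte Carlo step might look like this sketch, which samples uncertain annual DER growth rates and records when each feeder first crosses an assumed hosting limit. All parameters are illustrative, not SP&L values.

```python
import numpy as np

rng = np.random.default_rng(11)

def mc_adoption(n_feeders=40, years=10, trials=2000,
                base_kw=50.0, growth_mean=0.25, growth_sd=0.10, limit_kw=400.0):
    """Monte Carlo over uncertain annual DER growth: for each trial, compound
    each feeder's DER capacity and record the first year it breaches the limit
    (years + 1 means no breach within the horizon)."""
    growth = rng.normal(growth_mean, growth_sd, (trials, n_feeders)).clip(0)
    first_breach = np.full((trials, n_feeders), years + 1)
    kw = np.full((trials, n_feeders), base_kw)
    for year in range(1, years + 1):
        kw = kw * (1 + growth)
        hit = (kw >= limit_kw) & (first_breach > years)
        first_breach[hit] = year
    return first_breach

breach = mc_adoption()
p_breach = (breach <= 10).mean()
print(f"share of feeder-trials breaching the DER limit within 10 years: {p_breach:.0%}")
```

The distribution of `first_breach` years is exactly the "investment trigger" map: the percentile at which a feeder crosses its limit tells you how early an upgrade decision has to be made under uncertainty.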
Use the AMI and SCADA-style data to build real-time anomaly detection for voltage excursions, phase imbalances, meter tampering patterns, and non-technical losses. The dataset includes realistic noise, data gaps, and meter errors that make anomaly detection genuinely challenging.
Start here: Autoencoders on AMI voltage time series. Then build an isolation forest for multi-dimensional SCADA anomaly detection. Validate against injected fault scenarios.
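An isolation-forest pass over AMI-style features can be sketched as follows, with a few anomalous meters injected so there is something to find. The feature choices and magnitudes are assumptions, not the dataset's actual tags.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# Synthetic AMI snapshot: per-meter voltage, current, and power factor
normal = np.column_stack([
    rng.normal(240, 2.0, 500),    # service voltage (V)
    rng.normal(20, 5.0, 500),     # current (A)
    rng.normal(0.95, 0.02, 500),  # power factor
])
# Inject a handful of anomalous meters (sagging voltage, heavy current, poor PF)
anomalies = np.column_stack([
    rng.normal(215, 2.0, 5),
    rng.normal(60, 5.0, 5),
    rng.normal(0.70, 0.02, 5),
])
X = np.vstack([normal, anomalies])

iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = iso.predict(X)  # -1 marks an anomaly
print(f"{(flags == -1).sum()} meters flagged of {len(X)}")
```

Validating against injected anomalies like this, before trusting the detector on noisy field data, is the same workflow the dataset's injected fault scenarios support.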
The same ML techniques that work on the distribution grid apply to the power plant. SP&L's 300 MW combined-cycle station provides a second domain for the same analytical approaches: anomaly detection, regression, fault classification, and physics-informed modeling—applied to rotating equipment instead of feeders and transformers.
88 process tags from a motor-driven boiler feed pump train—vibration, bearing temperatures, seal leak rates, motor current, discharge pressure—sampled every minute for a full year. Three progressive fault scenarios are embedded in the data: seal degradation, bearing wear, and coupling misalignment. Each fault develops gradually over weeks, and its signature varies with unit load, exactly like real plant failures.
Start here: Rolling statistics and threshold alerts on bearing temps. Then train an Isolation Forest for multivariate anomaly detection. Graduate to multi-class fault classification with SHAP explainability.
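The rolling-statistics starting point can be as simple as z-scoring each sample against a known-healthy baseline window, sketched here on synthetic bearing temperatures with an injected thermal drift (the fault timing, baseline window, and magnitudes are invented for illustration).

```python
import numpy as np

def baseline_zscore(series, baseline_end=480):
    """Z-score each sample against a fixed 'known healthy' baseline window
    (assumed here to be the first 8 hours of 1-minute data)."""
    series = np.asarray(series, float)
    base = series[:baseline_end]
    return (series - base.mean()) / base.std()

rng = np.random.default_rng(5)
temps = rng.normal(78.0, 0.4, 1440)    # one day of 1-min bearing temps (deg C)
temps[900:] += np.linspace(0, 6, 540)  # slow thermal drift, as in bearing wear
z = baseline_zscore(temps)
first_alert = int(np.argmax(z > 5))    # first sample 5 sigma above baseline
print(f"first alert at minute {first_alert} (fault injected at minute 900)")
```

Note the alert fires well before the absolute temperature would trip a fixed high-temp alarm; catching the drift, not the threshold crossing, is the point of the exercise.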
The generation dataset includes OEM pump curves, design parameters, and a heat balance for the BFP system. This is everything needed to build a physics-informed digital twin: compare actual pump performance to the expected curves, detect efficiency decay before it triggers an alarm, and create a composite health index that tracks degradation across all instrumented parameters simultaneously.
Start here: Regress MW output to feedwater flow and motor power to establish baseline correlations. Then build a pump curve model from OEM data and track actual-vs-expected performance. Graduate to a composite health index combining physics residuals with statistical anomaly scores.
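The actual-vs-expected tracking reduces to a residual against the OEM curve. The sketch below assumes a quadratic H-Q approximation with made-up coefficients; the real coefficients come from the dataset's pump-curve files.

```python
import numpy as np

def expected_head(flow_m3h, h0=900.0, k=2.0e-4):
    """OEM H-Q curve approximated as H = H0 - k*Q^2 (coefficients assumed;
    fit them to the dataset's actual pump curve points)."""
    return h0 - k * flow_m3h**2

def head_residual(flow, measured_head):
    """Actual minus expected head: a growing negative residual signals
    internal wear or recirculation before any absolute alarm trips."""
    return np.asarray(measured_head, float) - expected_head(np.asarray(flow, float))

flow = np.array([800.0, 1000.0, 1200.0])
healthy = expected_head(flow)
worn = healthy - np.array([5.0, 12.0, 25.0])  # degraded pump, worse at high flow
res = head_residual(flow, worn)
print(res)  # residual grows more negative with flow
```

Trending this residual over weeks, alongside the statistical anomaly scores, is what the composite health index combines into a single degradation signal.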
The gap between "I understand this problem" and "I can build an ML model for it" has been too wide for too long—blocked by data access on one side and specialized tooling on the other. SP&L provides the data. AI-assisted development provides a co-pilot for the code. Your domain expertise does the rest.
You know which problems matter—which feeders are trouble, which assets are aging out, which load patterns signal something wrong. Now you can build the solutions yourself. AI-assisted development tools translate your engineering intuition into working code, and SP&L gives you a realistic system to build against without touching production data.
You live in vibration spectra, bearing clearances, and pump curves. The generation dataset gives you a realistic BFP train with embedded faults you can detect, classify, and track with the same ML techniques the distribution side uses—anomaly detection, regression, physics-informed models. Build a digital twin on SP&L before deploying against your plant historian.
You have the ML skills but not the domain context. SP&L comes with documented data dictionaries, industry-standard metrics (SAIFI, SAIDI, CAIDI), and notebooks that bridge the gap between general data science and power systems engineering.
You need to prototype analytics use cases before pitching them internally. SP&L plus AI-assisted development lets you build a working demo in days, not months—on realistic data, validated against real engineering constraints, ready to show leadership without waiting for IT to provision access to production systems.
You need a common, open reference system that reviewers and collaborators can reproduce. SP&L provides a citable, versioned, realistic distribution network model with enough complexity to publish meaningful results.
You need to demonstrate feasibility and build proof-of-concept analyses for clients, but accessing their data takes months. SP&L gives you a realistic utility-scale reference system to develop and validate your methodologies before engagement, shortening sales cycles and establishing credibility.
You want a domain-specific playground that isn't MNIST or Kaggle tabular data. Power systems offer time-series, graph topology, spatial data, multi-objective optimization, and real-time control problems—all in one domain. SP&L packages them in a single repo.
The power grid is the largest machine ever built, and it's being reinvented in real time. Solar, storage, EVs, and electrification are fundamentally changing how distribution systems operate. The utilities managing this transition need analytical tools built by people who understand the grid—and they need them faster than traditional vendor procurement or data science hiring pipelines can deliver.
For years, experienced power engineers have been sidelined from building these tools—not because they lack the ability to define what needs to be built, but because two barriers stood in the way. First, realistic grid data was locked behind NDAs, CEII restrictions, and vendor contracts. Second, translating engineering knowledge into working code required specialized programming skills that took years to develop. Both barriers are now gone. SP&L provides the open data. Modern AI-assisted development tools handle the coding scaffolding. The equation has changed.
An experienced distribution engineer with the SP&L dataset can now prototype, validate, and ship every one of the applications described above.
The best grid analytics won't come from ML specialists guessing at domain context. They'll be built by power engineers who understand the physics, the operations, and the failure modes. The domain expertise is the irreplaceable ingredient. The data is now open. The development tools are now accessible. What's left is what you're already good at—understanding the grid.
"The grid doesn't need more feasibility studies. It needs engineers who understand the problems shipping working solutions against realistic data. The tools to do that exist now. SP&L is the starting line."
Clone Sisyphean Power & Light, fire up an AI coding assistant, and start building. No procurement. No NDAs. No specialized team required. Your domain expertise is the irreplaceable ingredient.
Get Early Access

"The boulder will roll back down. But each push teaches you something new, your models get sharper, and your AI tools get better at helping you push. One must imagine the engineer happy."