Per-circuit reliability tracking that tells you which feeders are in trouble, why, and what to do about it—before the annual report lands.
Monthly KPI reports and annual regulatory filings give you system-level averages. A system with 104 feeders can show SAIDI of 150 minutes while 5 feeders are in catastrophic failure—the average hides the crisis. Calendar-based maintenance treats every feeder the same regardless of condition. By the time the annual report reveals a problem, you have lost a year of intervention time.
The real questions are per-feeder: Which circuits are burning through their reliability allowance? Is the burn rate accelerating or stable? Are specific customers bearing disproportionate outage burden? What failure mode is driving the problem? The system average cannot answer any of these.
| Dimension | Traditional (Monthly/Annual) | Error Budget (Continuous) |
|---|---|---|
| Granularity | System-wide averages (SAIDI, SAIFI) | Per-feeder budget with individual SLO tiers |
| Timing | Backward-looking: monthly reports, annual filings | Real-time burn rate tracking with forward projection |
| Maintenance | Calendar-based schedules regardless of condition | Condition-based actions triggered by budget thresholds |
| Response | Reactive: investigate after KPI breach | Proactive: intervene before budget exhaustion |
| Equity | System average masks worst-served customers | CEMI overlay identifies concentrated outage burden |
| Trending | Year-over-year comparison of annual totals | Weibull analysis detects accelerating failure regimes |
The Circuit Error Budget Framework applies Site Reliability Engineering (SRE) error budget methodology—the same approach Google, Amazon, and Netflix use to manage service reliability at scale—to electric distribution circuits. Every feeder gets an annual error budget: a maximum allowable amount of customer-minutes interrupted. When a feeder consumes its budget through outages, the framework triggers increasingly aggressive corrective actions. When a feeder is healthy, you leave it alone and invest elsewhere.
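As a rough illustration of the idea, the sketch below models one feeder's annual budget and an escalating response as the budget is spent. The class name, the feeder values, and the 50% / 25% / 0% thresholds are assumptions made for this example, not the framework's actual policy settings.

```python
from dataclasses import dataclass


@dataclass
class FeederBudget:
    feeder_id: str
    customers: int
    saidi_budget_min: float  # annual SLO: allowed outage minutes per customer
    ytd_saidi: float         # outage minutes per customer consumed year to date

    @property
    def budget_customer_minutes(self) -> float:
        """Annual error budget expressed as customer-minutes interrupted."""
        return self.saidi_budget_min * self.customers

    @property
    def fraction_remaining(self) -> float:
        """Share of the annual budget still unspent (negative once overspent)."""
        return (self.saidi_budget_min - self.ytd_saidi) / self.saidi_budget_min


def escalation_level(feeder: FeederBudget) -> str:
    """Map remaining budget to a response level (thresholds are hypothetical)."""
    remaining = feeder.fraction_remaining
    if remaining <= 0.0:
        return "budget exhausted: immediate action"
    if remaining < 0.25:
        return "high urgency"
    if remaining < 0.50:
        return "moderate urgency"
    return "healthy: leave alone"


# Illustrative feeder: 2,300 customers, 120-minute budget, 96 minutes consumed.
feeder = FeederBudget("F-017", customers=2300, saidi_budget_min=120.0, ytd_saidi=96.0)
print(feeder.budget_customer_minutes)  # 276000.0 customer-minutes per year
print(escalation_level(feeder))        # high urgency (20% of budget left)
```

Keeping the budget in minutes per customer and deriving customer-minutes on demand lets feeders of different sizes share the same thresholds.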
Why per-feeder beats system average: A system with 65 feeders at SAIDI=150 min could have 60 healthy feeders and 5 in catastrophic failure. The system average tells you everything is fine. The error budget tells you exactly where to act. A feeder is your microservice. SAIDI is your latency SLI. The error budget is your reliability contract.
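The arithmetic behind the 65-feeder example can be checked in a few lines. This sketch assumes 1,000 customers on every feeder, per-feeder SAIDI values of 100 and 750 minutes (chosen so the system average lands on 150), and a hypothetical 200-minute per-feeder budget.

```python
# 60 healthy feeders and 5 failing ones, 1,000 customers each (illustrative numbers).
feeders = (
    [{"customers": 1000, "saidi_min": 100.0} for _ in range(60)]
    + [{"customers": 1000, "saidi_min": 750.0} for _ in range(5)]
)


def system_saidi(feeders):
    """Customer-weighted average: total customer-minutes / total customers."""
    total_customer_minutes = sum(f["customers"] * f["saidi_min"] for f in feeders)
    total_customers = sum(f["customers"] for f in feeders)
    return total_customer_minutes / total_customers


SLO_MIN = 200.0  # hypothetical per-feeder budget, minutes per year
over_budget = [f for f in feeders if f["saidi_min"] > SLO_MIN]

print(system_saidi(feeders))  # 150.0
print(len(over_budget))       # 5
```

The system figure looks acceptable, while the per-feeder check flags the five circuits running at several times their budget.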
The table below translates software reliability concepts into their grid reliability equivalents.
| SRE Concept | Utility Equivalent | Framework Implementation |
|---|---|---|
| SLI (Service Level Indicator) | SAIDI minutes per feeder-year | ytd_saidi in SAIDI model output |
| SLO (Service Level Objective) | Target: "this feeder shall not exceed X minutes" | saidi_budget_min from SLO tier config |
| Error Budget | Allowable outage minutes remaining | remaining_budget_min = SLO - YTD consumed |
| Burn Rate | Velocity of budget consumption (1.0x = on pace) | saidi_burn_rate, saifi_burn_rate |
| Error Budget Policy | "When budget drops below X%, take action Y" | PolicyEvaluator generates actions |
| Composite Score | Single 0-100 health metric blending SAIDI + SAIFI | composite_score from Composite model |
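One plausible way to compute the burn rate and the forward projection from the table's fields is sketched below. The formulas are an assumption consistent with "1.0x = on pace" (budget consumed divided by share of the year elapsed); the framework's own `saidi_burn_rate` calculation may differ, and `projected_exhaustion_day` is a name invented for this example.

```python
def saidi_burn_rate(ytd_saidi: float, saidi_budget_min: float,
                    day_of_year: int, days_in_year: int = 365) -> float:
    """Budget consumed relative to time elapsed; 1.0 means exactly on pace."""
    budget_consumed = ytd_saidi / saidi_budget_min
    year_elapsed = day_of_year / days_in_year
    return budget_consumed / year_elapsed


def projected_exhaustion_day(ytd_saidi: float, saidi_budget_min: float,
                             day_of_year: int):
    """Day of year the budget runs out if the year-to-date pace holds.

    Returns None when nothing has been consumed yet. A result above 365
    means the feeder finishes the year inside its budget.
    """
    if ytd_saidi <= 0:
        return None
    return day_of_year * saidi_budget_min / ytd_saidi


# Illustrative feeder: 120-minute budget, 48 minutes consumed by day 73.
remaining_budget_min = 120.0 - 48.0
print(remaining_budget_min)                       # 72.0
print(saidi_burn_rate(48.0, 120.0, 73))           # about 2.0x: twice the sustainable pace
print(projected_exhaustion_day(48.0, 120.0, 73))  # 182.5, i.e. around the end of June
```

A linear projection is the simplest choice; it ignores seasonality such as storm season, so it is best read as an early warning rather than a forecast.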
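Pipeline architecture, from input data to corrective actions:
| Stage | Component | Role |
|---|---|---|
| Input | Outage data | 238,000 customers, 104 feeders, 23 substations; outage history with cause codes, timestamps, affected customers |
| Model | Cause | IEEE 1782 cause-code partitioning |
| Model | CEMI | Worst-served customer concentration |
| Model | SAIFI | Frequency tracking, blip detection |
| Model | SAIDI | Duration tracking with burn rate |
| Model | Composite | 0-100 health score blending SAIDI + SAIFI weights |
| Model | Weibull | Failure regime detection: random vs. aging vs. infant mortality |
| Output | Master table | One row per feeder, all metrics merged → policy evaluation → priority scoring → tiered corrective actions (13 action types, 3 urgency levels) |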
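Failure regime detection is commonly done by fitting a Weibull distribution to the times between failures and reading the shape parameter: below 1 suggests infant mortality, near 1 suggests random failures, above 1 suggests aging. The sketch below is a generic maximum-likelihood fit using only the standard library, not the framework's own model; the function names, the ±0.1 tolerance band, and the synthetic data are assumptions.

```python
import math
import random


def weibull_shape_mle(intervals):
    """Maximum-likelihood estimate of the Weibull shape parameter.

    `intervals` are positive times between failures (for example, days).
    """
    top = max(intervals)
    xs = [t / top for t in intervals]   # rescale so x ** k cannot overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # The MLE of the shape is the root of this increasing function.
        weights = [x ** k for x in xs]
        weighted_log = sum(w * lx for w, lx in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    lo, hi = 0.01, 50.0
    for _ in range(60):                 # plain bisection on the shape
        mid = (lo + hi) / 2.0
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def classify_regime(shape, tolerance=0.1):
    """Label the failure regime; the tolerance band around 1.0 is hypothetical."""
    if shape < 1.0 - tolerance:
        return "infant mortality"
    if shape > 1.0 + tolerance:
        return "aging"
    return "random"


# Synthetic inter-failure times with a known shape of 2.5 (wear-out behavior).
random.seed(7)
aging_intervals = [random.weibullvariate(90.0, 2.5) for _ in range(2000)]
print(classify_regime(weibull_shape_mle(aging_intervals)))  # aging
```

A real feeder has far fewer failures than this synthetic sample, so the estimated shape is noisy and is better treated as a trend indicator than a precise measurement.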
Each feeder is assigned an SLO tier, ranging from Critical to Rural, based on its customer mix and criticality. The tier sets the feeder's annual error budget.
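A tier configuration might look something like the sketch below. Only the Critical and Rural endpoints are named in the text; the middle tier names, every budget value, and the assignment rules (a critical-load flag and customer density) are placeholders invented for illustration.

```python
# Hypothetical tier table: annual SAIDI budget in minutes per customer.
# Only "critical" and "rural" come from the text; everything else is a placeholder.
SLO_TIERS = {
    "critical": 60.0,
    "urban": 120.0,
    "suburban": 180.0,
    "rural": 300.0,
}


def assign_tier(has_critical_load: bool, customers_per_mile: float) -> str:
    """Pick a tier from criticality and customer mix (rules are hypothetical)."""
    if has_critical_load:          # e.g. a hospital or water plant on the feeder
        return "critical"
    if customers_per_mile >= 100:
        return "urban"
    if customers_per_mile >= 30:
        return "suburban"
    return "rural"


def saidi_budget_min(has_critical_load: bool, customers_per_mile: float) -> float:
    """Annual error budget implied by the feeder's tier."""
    return SLO_TIERS[assign_tier(has_critical_load, customers_per_mile)]


print(assign_tier(False, 12.0))       # rural
print(saidi_budget_min(True, 12.0))   # 60.0 (criticality overrides density)
```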
The framework consumes standard utility data exports. If you have outage management and GIS data, you can run a full budget analysis on your circuits. The outage history needs cause codes, timestamps, and affected customer counts.
Cause codes: equipment_failure, vegetation, weather, animal_contact, overload. Map your utility's OMS codes to these five categories. The cause model accepts additional categories, but the policy evaluator's action mapping is tuned for these five.
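The mapping step can be as simple as a lookup table. In the sketch below the OMS codes on the left are made up for illustration, as are the record field names and the "other" fallback bucket; substitute your own OMS vocabulary.

```python
from collections import defaultdict

# Left side: hypothetical OMS codes. Right side: the five framework categories.
OMS_TO_FRAMEWORK = {
    "EQ-XFMR": "equipment_failure",
    "EQ-FUSE": "equipment_failure",
    "TREE": "vegetation",
    "LIGHTNING": "weather",
    "WIND": "weather",
    "SQUIRREL": "animal_contact",
    "OVLD": "overload",
}


def normalize_cause(oms_code: str) -> str:
    """Translate a raw OMS code; unmapped codes fall into an 'other' bucket."""
    return OMS_TO_FRAMEWORK.get(oms_code.strip().upper(), "other")


def customer_minutes_by_cause(outages):
    """Sum customer-minutes interrupted per framework cause category."""
    totals = defaultdict(float)
    for outage in outages:
        cause = normalize_cause(outage["oms_code"])
        totals[cause] += outage["customers_affected"] * outage["duration_min"]
    return dict(totals)


# Illustrative outage records (field names are assumptions).
outages = [
    {"oms_code": "TREE", "customers_affected": 120, "duration_min": 90.0},
    {"oms_code": " wind ", "customers_affected": 300, "duration_min": 45.0},
    {"oms_code": "UNKNOWN", "customers_affected": 10, "duration_min": 30.0},
]
print(customer_minutes_by_cause(outages))
# {'vegetation': 10800.0, 'weather': 13500.0, 'other': 300.0}
```

Tracking how much lands in the fallback bucket is a quick check on mapping coverage before trusting the per-cause breakdown.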
The framework builds on concepts explored in the SRE article series and connects to the interactive calculators for hands-on exploration.
Bring your outage data. We will show you which circuits need attention before the annual report tells you it is too late.