Circuit Error Budget Framework

Per-circuit reliability tracking that tells you which feeders are in trouble, why, and what to do about it—before the annual report lands.

The Problem

Monthly KPI reports and annual regulatory filings give you system-level averages. A system with 104 feeders can show SAIDI of 150 minutes while 5 feeders are in catastrophic failure—the average hides the crisis. Calendar-based maintenance treats every feeder the same regardless of condition. By the time the annual report reveals a problem, you have lost a year of intervention time.

The real questions are per-feeder: Which circuits are burning through their reliability allowance? Is the burn rate accelerating or stable? Are specific customers bearing disproportionate outage burden? What failure mode is driving the problem? The system average cannot answer any of these.

Traditional Reporting vs. Error Budgets

Dimension | Traditional (Monthly/Annual) | Error Budget (Continuous)
Granularity | System-wide averages (SAIDI, SAIFI) | Per-feeder budget with individual SLO tiers
Timing | Backward-looking: monthly reports, annual filings | Real-time burn rate tracking with forward projection
Maintenance | Calendar-based schedules regardless of condition | Condition-based actions triggered by budget thresholds
Response | Reactive: investigate after KPI breach | Proactive: intervene before budget exhaustion
Equity | System average masks worst-served customers | CEMI overlay identifies concentrated outage burden
Trending | Year-over-year comparison of annual totals | Weibull analysis detects accelerating failure regimes

The Approach

The Circuit Error Budget Framework applies Site Reliability Engineering (SRE) error budget methodology—the same approach Google, Amazon, and Netflix use to manage service reliability at scale—to electric distribution circuits. Every feeder gets an annual error budget: a maximum allowable amount of customer-minutes interrupted. When a feeder consumes its budget through outages, the framework triggers increasingly aggressive corrective actions. When a feeder is healthy, you leave it alone and invest elsewhere.

Why per-feeder beats system average: A system with 104 feeders at SAIDI = 150 min could have 99 healthy feeders and 5 in catastrophic failure. The system average suggests everything is fine; the error budget shows exactly where to act. In SRE terms, a feeder is your microservice, SAIDI is your latency SLI, and the error budget is your reliability contract.
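The masking effect is easy to reproduce with arithmetic. The sketch below uses a hypothetical 104-feeder fleet with equal customer counts and made-up SAIDI values chosen so the system figure comes out at 150 minutes; none of these numbers come from SP&L data.

```python
# Hypothetical fleet: 104 feeders with 2,000 customers each (illustrative only).
CUSTOMERS_PER_FEEDER = 2_000
feeder_saidi_min = [60.0] * 99 + [1932.0] * 5  # 99 healthy feeders, 5 failing

# System SAIDI = total customer-minutes interrupted / total customers served.
total_customer_minutes = sum(s * CUSTOMERS_PER_FEEDER for s in feeder_saidi_min)
total_customers = CUSTOMERS_PER_FEEDER * len(feeder_saidi_min)
system_saidi = total_customer_minutes / total_customers

# Compare every feeder against the Standard-tier budget of 262.8 min/year.
STANDARD_BUDGET_MIN = 262.8
over_budget = [i for i, s in enumerate(feeder_saidi_min) if s > STANDARD_BUDGET_MIN]

print(f"System SAIDI: {system_saidi:.1f} min")
print(f"Feeders over budget: {len(over_budget)} of {len(feeder_saidi_min)}")
```

The system number sits comfortably under the Standard-tier budget while five feeders have each consumed more than seven times theirs.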

SRE-to-Utility Rosetta Stone

The translation table between software reliability and grid reliability concepts.

SRE-to-Utility Quick Reference Card

SRE Concept | Utility Equivalent | Framework Implementation
SLI (Service Level Indicator) | SAIDI minutes per feeder-year | ytd_saidi in SAIDI model output
SLO (Service Level Objective) | Target: "this feeder shall not exceed X minutes" | saidi_budget_min from SLO tier config
Error Budget | Allowable outage minutes remaining | remaining_budget_min = SLO - YTD consumed
Burn Rate | Velocity of budget consumption (1.0x = on pace) | saidi_burn_rate, saifi_burn_rate
Error Budget Policy | "When budget drops below X%, take action Y" | PolicyEvaluator generates actions
Composite Score | Single 0-100 health metric blending SAIDI + SAIFI | composite_score from Composite model
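The budget and burn-rate rows can be sketched in a few lines. The field names follow the table; the burn-rate formula (fraction of budget consumed divided by fraction of the year elapsed) is one common SRE definition and is an assumption here, as is the helper name budget_status.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    remaining_budget_min: float
    saidi_burn_rate: float


def budget_status(saidi_budget_min, ytd_saidi, day_of_year, days_in_year=365):
    """Remaining budget and burn rate, where 1.0 means consuming exactly on pace."""
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day_of_year must fall within the year")
    fraction_consumed = ytd_saidi / saidi_budget_min
    fraction_elapsed = day_of_year / days_in_year
    return BudgetStatus(
        remaining_budget_min=saidi_budget_min - ytd_saidi,
        saidi_burn_rate=fraction_consumed / fraction_elapsed,
    )


# Standard-tier feeder that has used half its budget one fifth of the way through the year.
status = budget_status(saidi_budget_min=262.8, ytd_saidi=131.4, day_of_year=73)
print(f"remaining: {status.remaining_budget_min:.1f} min, burn rate: {status.saidi_burn_rate:.2f}x")
```

A burn rate above 1.0x means the feeder will exhaust its budget before year end if the pace holds.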

Framework Architecture

Phase 3: Orchestrator + Policy Evaluator

Master table (one row per feeder, all metrics merged) → Policy evaluation → Priority scoring → Tiered corrective actions (13 action types, 3 urgency levels)
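A threshold-driven policy step might look like the sketch below. The page does not list the 13 action types or name the 3 urgency levels, so the thresholds, urgency labels, and action strings here are all hypothetical placeholders, and evaluate_policy is not the framework's PolicyEvaluator.

```python
def evaluate_policy(remaining_pct, burn_rate):
    """Return (urgency, action) for one feeder; every threshold is illustrative."""
    if remaining_pct <= 0:
        # Budget exhausted: hypothetical highest-urgency response.
        return ("high", "engineering review and corrective work order")
    if remaining_pct < 25 or burn_rate >= 2.0:
        return ("medium", "targeted inspection of the feeder")
    if remaining_pct < 50 or burn_rate >= 1.0:
        return ("low", "add feeder to watch list")
    # Healthy feeder: leave it alone and invest elsewhere.
    return (None, "no action")


for remaining_pct, burn_rate in [(80, 0.6), (40, 0.9), (60, 2.5), (-10, 3.0)]:
    print(remaining_pct, burn_rate, evaluate_policy(remaining_pct, burn_rate))
```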

Phase 2: Derived Models

Weibull Trending: failure regime detection (random vs. aging vs. infant mortality)
Composite Score: 0-100 health score from a weighted blend of SAIDI and SAIFI
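Regime detection with a Weibull model usually reads the fitted shape parameter: below 1 suggests infant mortality, near 1 suggests random failures, and above 1 suggests aging. The sketch below fits the shape by maximum likelihood using bisection. Which samples the framework actually fits is not stated on this page, so the time-to-failure inputs, the 0.1 band around 1.0, and both function names are assumptions.

```python
import math


def weibull_shape_mle(samples, lo=0.05, hi=20.0, tol=1e-6):
    """Maximum-likelihood Weibull shape for positive time-to-failure samples."""
    if len(set(samples)) < 2 or min(samples) <= 0:
        raise ValueError("need at least two distinct positive samples")
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Shape likelihood equation; it increases with k and its root is the MLE.
        weights = [x ** k for x in samples]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def classify_regime(shape, band=0.1):
    """Map a shape estimate to a failure regime; the band is an illustrative choice."""
    if shape < 1.0 - band:
        return "infant mortality"
    if shape > 1.0 + band:
        return "aging"
    return "random"


# Hypothetical samples, in days.
clustered_early = [1, 2, 5, 50, 400]
wearing_out = [40, 70, 100, 130, 160]
print(classify_regime(weibull_shape_mle(clustered_early)))
print(classify_regime(weibull_shape_mle(wearing_out)))
```

With only a handful of samples the estimate is noisy, so a production version would want confidence bounds before triggering any action.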

Phase 1: Independent Models

SAIDI Budget: duration tracking with burn rate
SAIFI Budget: frequency tracking, blip detection
CEMI Equity: worst-served customer concentration
Cause Budget: IEEE 1782 cause-code partitioning
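As an illustration of the CEMI equity idea, the sketch below computes the share of customers who experienced n or more sustained interruptions. Conventions differ between "n or more" and "more than n", so check which one your reporting follows. The per-customer event list is hypothetical, and the cemi helper is not taken from the framework.

```python
from collections import Counter


def cemi(interruptions_by_customer, total_customers, n):
    """Share of served customers with n or more sustained interruptions."""
    hit = sum(1 for count in interruptions_by_customer.values() if count >= n)
    return hit / total_customers


# Hypothetical event log: one customer ID per sustained interruption experienced.
events = ["c1", "c2", "c1", "c3", "c1", "c2", "c1"]
counts = Counter(events)

# Only c1 (4 interruptions) meets the n=3 threshold on a 10-customer feeder.
cemi_3 = cemi(counts, total_customers=10, n=3)
print(f"CEMI-3: {cemi_3:.0%}")
```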

Data Layer: SP&L Dynamic Network Model

238,000 customers | 104 feeders | 23 substations | Outage history with cause codes, timestamps, affected customers

SLO Tiers

Each feeder is assigned a tier based on its customer mix and criticality. The tier sets the annual error budget.

Tier | Availability SLO | Annual error budget
Critical | 99.99% | 52.6 min/year
Commercial | 99.97% | 157.7 min/year
Standard | 99.95% | 262.8 min/year
Rural | 99.90% | 525.6 min/year
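Each budget is the unavailable fraction of a 365-day year (525,600 minutes). A short check of that arithmetic, with SLO_TIERS and annual_budget_min as illustrative names rather than the framework's config:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# Availability targets per tier, in percent.
SLO_TIERS = {"Critical": 99.99, "Commercial": 99.97, "Standard": 99.95, "Rural": 99.90}


def annual_budget_min(availability_pct):
    """Allowed outage minutes per customer per year for an availability target."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


for tier, pct in SLO_TIERS.items():
    print(f"{tier}: {annual_budget_min(pct):.1f} min/year")
```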

By the Numbers

6 models in the pipeline
39 tests passing
4 SLO tiers (Critical to Rural)

Run It On Your Data

The framework consumes standard utility data exports. If you have outage management and GIS data, you can run a full budget analysis on your circuits. Here is what you need:

outages.csv: feeder_id, fault_detected, service_restored, affected_customers, cause_code
feeders.csv: feeder_id, name, customer_count, length_miles, substation_id, peak_load_mw
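To show how these two files combine, here is a minimal sketch that computes year-to-date SAIDI per feeder. It assumes ISO 8601 timestamps and uses small inline sample rows in place of real exports; the feeder IDs, names, and values are made up, and ytd_saidi_by_feeder is a hypothetical helper rather than the framework's loader.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Inline stand-ins for outages.csv and feeders.csv (hypothetical rows).
OUTAGES_CSV = """feeder_id,fault_detected,service_restored,affected_customers,cause_code
F001,2024-03-01T08:00:00,2024-03-01T09:30:00,500,vegetation
F001,2024-06-10T14:00:00,2024-06-10T14:45:00,1000,equipment_failure
F002,2024-07-04T20:00:00,2024-07-04T22:00:00,200,weather
"""
FEEDERS_CSV = """feeder_id,name,customer_count,length_miles,substation_id,peak_load_mw
F001,Elm St,1000,12.5,S01,4.2
F002,Mill Rd,400,30.1,S02,1.1
"""


def ytd_saidi_by_feeder(outages_file, feeders_file):
    """Per-feeder SAIDI: customer-minutes interrupted / customers served."""
    customers = {
        row["feeder_id"]: int(row["customer_count"])
        for row in csv.DictReader(feeders_file)
    }
    customer_minutes = defaultdict(float)
    for row in csv.DictReader(outages_file):
        start = datetime.fromisoformat(row["fault_detected"])
        end = datetime.fromisoformat(row["service_restored"])
        minutes = (end - start).total_seconds() / 60
        customer_minutes[row["feeder_id"]] += minutes * int(row["affected_customers"])
    return {fid: customer_minutes[fid] / n for fid, n in customers.items()}


saidi = ytd_saidi_by_feeder(io.StringIO(OUTAGES_CSV), io.StringIO(FEEDERS_CSV))
print(saidi)
```

With real exports, replace the io.StringIO wrappers with open("outages.csv", newline="") and open("feeders.csv", newline="").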

Cause codes: equipment_failure, vegetation, weather, animal_contact, overload. Map your utility's OMS codes to these categories. The cause model handles additional categories, but the policy evaluator's action mapping is tuned for these five.
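The mapping step can be as simple as a lookup table. In the sketch below the OMS codes on the left are invented examples, and the "other" fallback is an assumed way of carrying unmapped codes through as an additional category.

```python
from collections import defaultdict

# Hypothetical OMS codes mapped to the five framework categories.
OMS_TO_FRAMEWORK = {
    "TREE": "vegetation",
    "LTNG": "weather",
    "WIND": "weather",
    "SQRL": "animal_contact",
    "XFMR": "equipment_failure",
    "OVLD": "overload",
}


def map_cause(oms_code):
    """Normalize an OMS code; unmapped codes fall into an 'other' bucket."""
    return OMS_TO_FRAMEWORK.get(oms_code.strip().upper(), "other")


def customer_minutes_by_cause(outages):
    """Partition customer-minutes interrupted by mapped cause category."""
    totals = defaultdict(float)
    for oms_code, minutes, customers in outages:
        totals[map_cause(oms_code)] += minutes * customers
    return dict(totals)


# (oms_code, duration_min, affected_customers), illustrative values only.
sample = [("TREE", 90, 500), ("wind", 30, 100), ("LTNG", 10, 100), ("DIG", 60, 50)]
by_cause = customer_minutes_by_cause(sample)
print(by_cause)
```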

Book a Discovery Call

Explore Further

The framework builds on concepts explored in the SRE article series and connects to the interactive calculators for hands-on exploration.

Error Budget Calculator | Outage Cost Calculator | SRE Article Series

Your Feeders Have Error Budgets. You Just Can't See Them Yet.

Bring your outage data. We will show you which circuits need attention before the annual report tells you it is too late.
