Google runs search at 99.95% availability. Your utility runs a feeder at 99.97%. The tools to reason about that budget exist, and they weren't invented at utilities. We're porting them over.
The vocabulary differs; the underlying math is the same probability theory. What changes is that utilities historically treated the budget implicitly, averaged annually, and reviewed after the fact.
Every utility publishes SAIDI and SAIFI targets. Most measure against them once a year, in a PUC filing, long after any action could have changed the outcome. That's not how modern reliability engineering works.
A software SRE team running a 99.95% service doesn't wait until December to check the budget. They measure burn rate continuously, alert when a week consumes a month of budget, and treat every error as telemetry rather than blame. The grid version is identical in structure: same math, same cadence, same discipline.
The work isn't inventing new math. It's porting five specific operational practices (budget accounting, burn-rate alerting, blameless postmortems, toil reduction, and change-freeze discipline) into utility reliability workflows. And convincing leadership that their line crews are, and always have been, SREs.
Six open-source modules. Each one ports a specific SRE practice into utility vocabulary, infrastructure, and data models. Use any subset.
Continuously rolls your SAIDI/SAIFI targets into remaining-minutes budgets, sliced by feeder, cause-code, and time window. Pulls from OMS, writes to Postgres.
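The core accounting is simple pro-rating: the annual SAIDI target becomes a to-date allowance, and consumed minutes are subtracted from it. A minimal sketch, assuming a 365-day year and illustrative field names (not the module's actual schema):

```python
# Sketch: roll an annual SAIDI target into a remaining-minutes budget.
# Function and field names are illustrative, not the module's schema.

def remaining_budget(annual_saidi_target_min: float,
                     days_elapsed: int,
                     consumed_min: float) -> dict:
    """Pro-rate the annual target to date and subtract consumed minutes."""
    allowed_to_date = annual_saidi_target_min * days_elapsed / 365.0
    remaining = allowed_to_date - consumed_min
    return {
        "allowed_to_date": round(allowed_to_date, 2),
        "consumed": consumed_min,
        "remaining": round(remaining, 2),
        "burn_ratio": round(consumed_min / allowed_to_date, 2),
    }

# A feeder with a 120-minute annual SAIDI target, 90 days into the year,
# having consumed 40 customer-weighted minutes: remaining goes negative,
# i.e. the feeder is ahead of its budget.
print(remaining_budget(120.0, 90, 40.0))
```

The same arithmetic works per feeder, per cause-code, or per time window; only the target and the consumed-minutes query change.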
Multi-window multi-burn-rate alerts in the Google SRE tradition, ported to SCADA/OMS event streams. Catches fast-burn failures in hours, slow-burn drift in weeks.
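The pattern pairs a long window (confirms the problem is real) with a short window (confirms it is still happening, so the alert resets quickly once the burn stops) and fires only when both exceed the threshold. A sketch with illustrative window sizes and thresholds, adapted to SAIDI minutes rather than request errors:

```python
# Sketch of multi-window, multi-burn-rate alerting on SAIDI minutes.
# Window sizes and thresholds below are illustrative assumptions.

def burn_rate(consumed_min: float, annual_budget_min: float,
              window_days: float) -> float:
    """Minutes consumed in the window / budget allotted to the window."""
    allotted = annual_budget_min * window_days / 365.0
    return consumed_min / allotted

def should_alert(consumed_short: float, consumed_long: float,
                 annual_budget_min: float, short_days: float,
                 long_days: float, threshold: float) -> bool:
    # Both windows must exceed the threshold before paging.
    return (burn_rate(consumed_short, annual_budget_min, short_days) >= threshold
            and burn_rate(consumed_long, annual_budget_min, long_days) >= threshold)

ANNUAL = 120.0  # illustrative annual SAIDI budget, in minutes

# Fast burn: 1-day long window, 2-hour short window, high threshold.
fast = should_alert(1.5, 8.0, ANNUAL, short_days=2/24, long_days=1.0,
                    threshold=14.0)
# Slow burn: 30-day long window, 3-day short window, low threshold.
slow = should_alert(2.5, 25.0, ANNUAL, short_days=3.0, long_days=30.0,
                    threshold=2.0)
```

The fast-burn rule catches a storm-scale event within hours; the slow-burn rule catches a feeder quietly drifting out of budget over weeks.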
Structured postmortem template + review workflow. Built to satisfy NERC/FERC documentation requirements while producing actionable learning, not CYA paperwork.
Quantifies repetitive, automatable work your line crews do. Identifies the top ten toil sources per district; recommends automation candidates ranked by hours-saved.
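The ranking itself is plain arithmetic: projected hours saved per month is event frequency times minutes per event times the automatable fraction. A sketch with hypothetical task records (not the module's data model):

```python
# Sketch: rank toil sources by projected hours saved per month.
# The task records and field names are hypothetical.

tasks = [
    {"task": "manual recloser log transcription", "events_per_month": 60,
     "minutes_each": 15, "automatable": 0.9},
    {"task": "switching-order paperwork", "events_per_month": 40,
     "minutes_each": 25, "automatable": 0.5},
    {"task": "OMS ticket re-entry", "events_per_month": 120,
     "minutes_each": 5, "automatable": 0.8},
]

def hours_saved(t: dict) -> float:
    return t["events_per_month"] * t["minutes_each"] * t["automatable"] / 60.0

for t in sorted(tasks, key=hours_saved, reverse=True):
    print(f'{t["task"]}: {hours_saved(t):.1f} h/month')
```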
Enforces change-window rules when budget is exhausted. Integrates with your work-management system to block non-emergency switching when burn rate exceeds threshold.
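The gate reduces to one rule with one carve-out: emergency work always proceeds; planned work is blocked when the budget is exhausted or the burn rate crosses the freeze threshold. A minimal sketch, with the threshold and work-order shape as illustrative assumptions:

```python
# Sketch: gate non-emergency switching on budget state.
# The priority labels and freeze threshold are illustrative assumptions.

def change_allowed(order_priority: str,
                   remaining_budget_min: float,
                   burn_rate: float,
                   freeze_burn_threshold: float = 2.0) -> bool:
    """Emergency work always proceeds; planned work is blocked when the
    budget is spent or the burn rate exceeds the freeze threshold."""
    if order_priority == "emergency":
        return True           # never block emergency switching
    if remaining_budget_min <= 0:
        return False          # budget exhausted: freeze planned work
    return burn_rate < freeze_burn_threshold

assert change_allowed("emergency", -5.0, 3.1)    # emergencies always pass
assert not change_allowed("planned", -5.0, 0.5)  # budget exhausted
assert not change_allowed("planned", 20.0, 2.4)  # burning too fast
assert change_allowed("planned", 20.0, 0.8)      # healthy: proceed
```

The decision is advisory by design: the result is written to the work-management system as a hold recommendation, not executed against the grid.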
Scheduled N-1 and N-1-1 drill runner. Executes controlled contingencies on a digital twin, measures actual response time, feeds results back into the budget.
Observe → decide → act → review. The same cycle any SRE team runs, adapted to the time constants and stakes of utility operations.
Actual SAIDI minutes consumed, plotted against the month's allowable budget, with short- and long-window burn rates. This is Q3 2025 on Feeder 104-B.
The five questions that come up in every exploratory call with a reliability or operations executive.
Your crews already run something very close to SRE practice; they just don't have the accounting to prove it to rate-setters, investors, or themselves. CEB gives the operational reality a name, a dashboard, and a defensible record. The crew practice doesn't change; the ability to show that practice working does.
The output is a more granular, more documented version of what you already file. We've structured the postmortem module to produce the exact artifacts most PUCs request in rate cases, and the budget ledger maps one-to-one onto annual SAIDI/SAIFI reporting. Regulators like it more, not less.
The modules read from OMS event streams, SCADA historians, and WMS change records. They write nothing back to operational systems; every output is advisory and auditable. The adapters are written against the standard protocols and interfaces that common utility stacks expose; the integration layer is in the repo.
Fair point. Bargaining-unit contracts are the real constraint, not the methodology. The template separates individual actions from systemic factors explicitly, and is structured so reports leaving the bargaining unit don't surface individual names. The practical work is a bargaining letter, not a methodology change. We can help draft the language.
The burn-rate approach is more economically rigorous than calendar-averaged reporting, not less. It produces continuous marginal cost-of-reliability estimates, the exact quantity a prudency review is trying to reconstruct after the fact. The output is the shape of analysis regulators increasingly expect to see in reliability investment filings.
The 12,000-word methodology paper is the theoretical foundation. The one-week reliability diagnostic is the practical starting point. Most utilities do both, in that order.