SRE doesn't replace IEEE 1366. It makes it better.

IEEE 1366 reliability indices — SAIDI, SAIFI, CAIDI — are regulatory requirements that aren't going away. Defined in IEEE Std 1366-2022 and required by 35+ state public utility commissions, they serve as annual report cards for regulators, benchmarking tools for the industry, and accountability mechanisms for utilities. Every reliability engineer in the country knows these numbers.

So when someone starts talking about applying Site Reliability Engineering to grid operations, the first question from any experienced utility professional is fair: "How does this fit with what we already report?"

The answer is straightforward. SRE doesn't replace IEEE 1366. Practices adapted from SRE principles provide the continuous operational feedback loop that IEEE 1366 was never designed to deliver. SAIDI and SAIFI are the report card. SRE-inspired operations give you visibility into the grade before it's finalized. SRE drives daily decisions. IEEE 1366 summarizes annual outcomes.

The problem most utilities face is that they have the report card without the engine. They know last year's SAIDI was 120 minutes (close to the national average of 118.4 minutes excluding major events in 2023, per EIA Electric Power Annual). They don't know, with any confidence, what this year's SAIDI will be until December.

§ 01 The temporal hierarchy

SRE and IEEE 1366 aren't competing frameworks. They operate at entirely different timescales, and understanding the temporal hierarchy is what makes them complementary.

Layer	Timeframe	Purpose	Metrics
SRE operations	Real-time to 30 days	Operational guidance, early warning	SLIs, SLOs, burn rates
SRE tactical	30 to 90 days	Improvement prioritization	Error budget trends, toil analysis
SRE strategic	90 days to 1 year	Investment planning	Reliability forecasting
IEEE 1366	Annual	Regulatory reporting, benchmarking	SAIDI, SAIFI, CAIDI

Fig. 01 · Four timescales stacked. SRE fills the gap between "right now" and "what the commission sees in December."

The relationship flows in one direction. SRE practices run continuously and produce improved operations. Improved operations produce better IEEE 1366 metrics at the annual filing. Real-time Service Level Indicators predict SAIDI and SAIFI trends months in advance, giving operators the ability to intervene proactively rather than explain poor results after the fact.

Part 03 made the case that traditional N-1/N-2 contingency planning gives you a snapshot of capability at a single point. SRE's continuous measurement fills the gap between snapshots. IEEE 1366 provides the annual summary at the end. Together they form a complete picture: real-time awareness, continuous improvement, standardized reporting.

§ 02 How SRE practices drive the indices down

The connection between SRE practices and IEEE 1366 outcomes isn't abstract. Each SRE mechanism maps to specific improvements in specific indices.

The formulas matter. SAIDI measures total outage duration normalized by customers served; SAIFI measures how often the average customer is interrupted; CAIDI is the ratio of SAIDI to SAIFI, that is, customer-minutes interrupted per customer interruption.

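$$\mathrm{SAIDI} = \frac{\sum \text{customer-minutes interrupted}}{\text{total customers served}} \tag{EQ. 01}$$

$$\mathrm{SAIFI} = \frac{\sum \text{customers interrupted per event}}{\text{total customers served}} \tag{EQ. 02}$$

$$\mathrm{CAIDI} = \frac{\mathrm{SAIDI}}{\mathrm{SAIFI}} = \frac{\mathrm{CMI}}{\mathrm{CI}} \tag{EQ. 03}$$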
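To make the three formulas concrete, here is a minimal Python sketch that computes them from a list of outage records. The `Outage` record, the `reliability_indices` function, and the sample events are all hypothetical, and the sketch assumes every customer in an event is restored at the same time (real outage data usually involves step restoration and major-event exclusions).

```python
from dataclasses import dataclass


@dataclass
class Outage:
    customers_interrupted: int  # customers affected by this sustained event
    duration_minutes: float     # time until those customers were restored


def reliability_indices(outages, customers_served):
    """Return (SAIDI, SAIFI, CAIDI) for a list of sustained outages."""
    # CMI: customer-minutes interrupted, summed over all events.
    cmi = sum(o.customers_interrupted * o.duration_minutes for o in outages)
    # CI: customer interruptions, summed over all events.
    ci = sum(o.customers_interrupted for o in outages)
    saidi = cmi / customers_served
    saifi = ci / customers_served
    caidi = cmi / ci if ci else 0.0  # undefined with no interruptions; 0.0 is a choice
    return saidi, saifi, caidi


# Hypothetical sample: three events on a 10,000-customer system.
events = [Outage(1200, 90), Outage(300, 45), Outage(2500, 30)]
saidi, saifi, caidi = reliability_indices(events, customers_served=10_000)
print(f"SAIDI={saidi:.2f} min  SAIFI={saifi:.2f}  CAIDI={caidi:.2f} min")
```

Note that CAIDI comes out the same whether it is computed as SAIDI / SAIFI or CMI / CI, because the customer count cancels.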

SRE practice	Mechanism	Primary index impact
FLISR response-time SLO	Faster fault location and automated restoration	SAIDI (duration), CAIDI (restoration time)
Automated switching	Reduced manual dispatch time	SAIDI (minutes saved per event)
Predictive maintenance	Prevent failures before occurrence	SAIFI (fewer events), SAIDI (fewer outages)
Burn-rate alerting	Early warning of degrading reliability	SAIDI / SAIFI (intervene before trend worsens)
Fault detection SLI	Faster identification of incipient failures	SAIFI (prevent interruptions entirely)
Condition monitoring	Predictive asset replacement	SAIFI (reduce equipment-driven failures)
Toil reduction	More time for preventive work	SAIFI (shift reactive to proactive maintenance)

Fig. 02 · Seven SRE practices, and which index they show up in. This is the translation layer between operations and the annual filing.

These aren't theoretical improvements. Utilities that have deployed distribution automation with measurable SLI tracking have documented real results.

EPB of Chattanooga (DOE Smart Grid Investment Grant): After deploying automated FLISR across its territory (2011-2014), EPB documented approximately 55% SAIDI reduction and 45% SAIFI reduction, with a benefit-to-cost ratio DOE called one of the highest in the SGIG program.
ComEd (Exelon): Documented 60% CAIDI improvement on automated circuits through its FLISR deployment across northern Illinois, with restoration times dropping from hours to minutes on covered feeders.
SMUD (Sacramento): Reported 28% SAIDI improvement through distribution automation and smart switching, concentrated on feeders where automation coverage was highest.

None of these require abandoning existing processes. They augment what reliability teams already do by adding measurement, targets, and feedback loops. In most cases they integrate with existing ADMS and OMS platforms rather than replacing them.

§ 03 A caution on CAIDI in grid modernization

IEEE Std 1366-2022 includes Annex D, an informative appendix titled "Understanding CAIDI" dedicated entirely to cautioning against oversimplifying this index. The annex opens by noting that "CAIDI is a frequently misunderstood electric distribution reliability index" and warns against treating it at face value as a measure of average interruption duration.

The reason matters for any utility deploying automation. CAIDI equals CMI (customer minutes of interruption) divided by CI (customer interruptions). When FLISR automates restoration for a large number of short-duration faults, it reduces CI dramatically because many customers who would have experienced a sustained interruption now experience a momentary one (or none at all). But the remaining interruptions that require manual crew response tend to be longer and harder to fix. If CMI doesn't drop proportionally to CI, CAIDI goes up even though the utility is objectively performing better.

The CAIDI paradox, numerically
Before FLISR: 1,000 customer interruptions averaging 90 minutes each, on a base of 1,000 customers → CAIDI = 90 min.
After FLISR: 400 customer interruptions averaging 120 minutes each (the hard ones are all that's left) → CAIDI = 120 min.
SAIDI dropped from 90 min to 48 min on the same customer base. CAIDI went up. The utility is performing better. The index reads worse.

SAIDI dropped 47% in that example. CAIDI rose 33%. A regulator reading only the CAIDI number would conclude the utility got worse. A regulator reading SAIDI alongside it would see the real story. Utilities deploying automation need to educate both their own leadership and their commission staff on how these indices move together. Otherwise, the reward for shipping automation is a CAIDI headline that doesn't match reality.
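The arithmetic behind the paradox can be checked in a few lines. This is a sketch under one stated assumption: the customer base is 1,000, which is the value that makes SAIDI come out at 90 and 48 minutes in the example. The helper names `indices` and `pct_change` are illustrative only.

```python
CUSTOMERS_SERVED = 1_000  # assumption: the base that yields SAIDI of 90 and 48 min


def indices(customer_interruptions, avg_minutes, customers_served=CUSTOMERS_SERVED):
    """Compute SAIDI, SAIFI, and CAIDI from CI and average duration."""
    cmi = customer_interruptions * avg_minutes  # customer-minutes interrupted
    return {
        "SAIDI": cmi / customers_served,
        "SAIFI": customer_interruptions / customers_served,
        "CAIDI": cmi / customer_interruptions,
    }


def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


before = indices(1_000, 90)  # before FLISR
after = indices(400, 120)    # after FLISR: fewer, harder interruptions

for name in ("SAIDI", "SAIFI", "CAIDI"):
    change = pct_change(before[name], after[name])
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} ({change:+.1f}%)")
# SAIDI: 90.00 -> 48.00 (-46.7%)
# SAIFI: 1.00 -> 0.40 (-60.0%)
# CAIDI: 90.00 -> 120.00 (+33.3%)
```

CI fell 60% while CMI fell only about 47%, so their ratio had to rise. That is the whole mechanism.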

§ 04 SLIs that predict the indices

The highest-leverage SRE practice for IEEE 1366 improvement is continuous SLI measurement. Service Level Indicators are the raw signals operations teams track in real time. When chosen well, they correlate tightly with SAIDI and SAIFI months before the annual filing closes the books. That lead time is the whole point.

Fault detection time SLI. Time between fault occurrence and detection by SCADA or sensor correlation. Target: under 60 seconds for 99% of faults.
Fault location SLI. Time between detection and the dispatch system knowing which segment is faulted. Target: under 60 seconds for automated, under 15 minutes for manual patrol.
Automated restoration SLI. Percentage of faults resolved by FLISR without crew dispatch. Target: 40% or higher on automated circuits.
Protection relay test coverage SLI. Percentage of relays tested within the last 12 months. Target: 100% for tier-1, 95% overall.
DER event response SLI. Percentage of DER frequency and voltage events handled without operator intervention. Target: 95% or higher.

Each of these SLIs has a direct line to SAIDI or SAIFI. Fault detection time improvements cut minutes off every outage. Automated restoration rate reduces both SAIDI and SAIFI. Protection relay test coverage prevents catastrophic failures that would otherwise drive both indices up. Continuous measurement gives you the chance to correct course before December's filing.
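As a rough illustration of how one of these SLIs might be evaluated, the sketch below checks the fault detection time SLI against its target (under 60 seconds for 99% of faults). The function name `sli_compliance` and the sample detection times are hypothetical; a real implementation would pull timestamps from SCADA or OMS records.

```python
def sli_compliance(durations_seconds, threshold_seconds):
    """Fraction of events whose duration is strictly under the threshold."""
    if not durations_seconds:
        return 1.0  # assumption: an empty window counts as compliant
    good = sum(1 for d in durations_seconds if d < threshold_seconds)
    return good / len(durations_seconds)


# Hypothetical detection times (seconds) for ten faults in a reporting window.
detection_times = [12, 8, 45, 130, 22, 59, 61, 15, 30, 9]

SLO_TARGET = 0.99       # from the text: 99% of faults
THRESHOLD_SECONDS = 60  # from the text: detected in under 60 seconds

compliance = sli_compliance(detection_times, THRESHOLD_SECONDS)
meets_slo = compliance >= SLO_TARGET
print(f"compliance={compliance:.0%}, meets SLO: {meets_slo}")
```

In this made-up window, two of ten faults took 60 seconds or longer to detect, so compliance is 80% and the SLO is missed. The same pattern applies to the other SLIs by swapping the threshold and target.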

§ 05 Error budget policy, grid edition

SRE's most useful organizational concept is the error budget policy. You derive the annual error budget from your regulatory target, then spend it with intent. If you overspend it by mid-year, that triggers specific actions: freeze non-essential changes, redirect crews toward preventive work, delay new automation rollouts until reliability trends re-stabilize. If you underspend it, you have slack to push higher-risk improvements that would otherwise be too risky to attempt.
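One way to sketch such a policy is to treat the annual SAIDI target as the budget and compare the share of budget spent with the share of the year elapsed. Everything below is an illustrative assumption rather than a prescribed method: the 120-minute budget, the 1.2 and 0.8 thresholds, and the function names are hypothetical, and a straight-line accrual ignores storm seasonality, which a real policy would need to account for.

```python
def burn_rate(saidi_ytd_minutes, annual_budget_minutes, day_of_year, days_in_year=365):
    """Budget fraction spent divided by year fraction elapsed.

    1.0 means on pace to land exactly on the annual target;
    above 1.0 means the budget is being consumed too quickly.
    """
    budget_spent = saidi_ytd_minutes / annual_budget_minutes
    time_elapsed = day_of_year / days_in_year
    return budget_spent / time_elapsed


def policy_action(rate, freeze_threshold=1.2, invest_threshold=0.8):
    """Map a burn rate to a policy response (thresholds are hypothetical)."""
    if rate >= freeze_threshold:
        return "freeze"  # freeze non-essential changes, shift crews to preventive work
    if rate <= invest_threshold:
        return "invest"  # slack available for higher-risk improvements
    return "steady"      # continue normal operations


# Hypothetical mid-year check: 75 SAIDI minutes used of a 120-minute budget.
rate = burn_rate(saidi_ytd_minutes=75, annual_budget_minutes=120, day_of_year=182)
print(f"burn rate {rate:.2f} -> {policy_action(rate)}")
```

In this example about 62% of the budget is gone with roughly half the year elapsed, giving a burn rate near 1.25 and triggering the freeze branch.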

The policy should be written and reviewed with regulatory affairs before it governs operational decisions. PUC filings that cite an error budget policy demonstrate a discipline beyond "we tried our best"; they show a system for making tradeoffs prudently and transparently. That matters in prudency reviews.

§ 06 What regulators actually want

State commissions want three things from their utilities: reliable service, predictable rates, and defensible explanations when things go wrong. IEEE 1366 indices are the reporting artifact, not the goal. A utility that adopts SRE practices delivers all three faster than one that reports the same indices annually and hopes for the best.

Reliable service: documented 20-55% SAIDI reductions from automation tracked against explicit SLOs. Predictable rates: the economics (Part 06 covers this) show SRE as a low-capital, fast-payback investment with system-wide benefit. Defensible explanations: blameless postmortems and continuous SLI measurement give you the causal chain in days instead of months, and the paper trail holds up in a rate case.

§ 07 Next in the series

Part 06 puts a dollar sign on everything this article described. A 30-minute SAIDI improvement avoids about $10 million per year on a 1,000 MW utility (0.5 hours × 1,000 MW × $20,000/MWh), using a conservative $20,000/MWh value-of-lost-load. A typical SRE program costs about $2 million per year. The math is not complicated. The hard part is believing it, then doing the work.
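For readers who want to plug in their own numbers, the estimate can be reproduced with a short sketch. It assumes the 1,000 MW figure represents the average load that would otherwise go unserved during the avoided outage minutes; the function name `avoided_outage_cost` is hypothetical.

```python
def avoided_outage_cost(saidi_improvement_minutes, average_load_mw, voll_per_mwh):
    """Annual avoided cost of unserved energy, in dollars."""
    hours_avoided = saidi_improvement_minutes / 60
    energy_not_served_mwh = hours_avoided * average_load_mw
    return energy_not_served_mwh * voll_per_mwh


# Figures from the text: 30 minutes, 1,000 MW, $20,000/MWh, $2M program cost.
annual_benefit = avoided_outage_cost(30, 1_000, 20_000)
program_cost = 2_000_000
benefit_to_cost = annual_benefit / program_cost

print(f"avoided cost: ${annual_benefit:,.0f} per year")  # $10,000,000
print(f"benefit-to-cost ratio: {benefit_to_cost:.1f}")   # 5.0
```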

— Adam · adam@sgridworks.com · Apr 1, 2026