IEEE 1366 reliability indices — SAIDI, SAIFI, CAIDI — are regulatory requirements that aren't going away. Defined in IEEE Std 1366-2022 and required by 35+ state public utility commissions, they serve as annual report cards for regulators, benchmarking tools for the industry, and accountability mechanisms for utilities. Every reliability engineer in the country knows these numbers.
So when someone starts talking about applying Site Reliability Engineering to grid operations, the first question from any experienced utility professional is fair: "How does this fit with what we already report?"
The answer is straightforward. SRE doesn't replace IEEE 1366. Practices adapted from SRE principles provide the continuous operational feedback loop that IEEE 1366 was never designed to deliver. SAIDI and SAIFI are the report card. SRE-inspired operations give you visibility into the grade before it's finalized. One drives daily decisions. The other summarizes annual outcomes.
The problem most utilities face is that they have the report card without the feedback loop that shapes the grade. They know last year's SAIDI was 120 minutes (close to the national average of 118.4 minutes excluding major events in 2023, per the EIA Electric Power Annual). They don't know, with any confidence, what this year's SAIDI will be until December.
§ 01 The temporal hierarchy
SRE and IEEE 1366 aren't competing frameworks. They operate at entirely different timescales, and understanding the temporal hierarchy is what makes them complementary.
| Layer | Timeframe | Purpose | Metrics |
|---|---|---|---|
| SRE operations | Real-time to 30 days | Operational guidance, early warning | SLIs, SLOs, burn rates |
| SRE tactical | 30 to 90 days | Improvement prioritization | Error budget trends, toil analysis |
| SRE strategic | 90 days to 1 year | Investment planning | Reliability forecasting |
| IEEE 1366 | Annual | Regulatory reporting, benchmarking | SAIDI, SAIFI, CAIDI |
The relationship flows in one direction. SRE practices run continuously and produce improved operations. Improved operations produce better IEEE 1366 metrics at the annual filing. Real-time Service Level Indicators predict SAIDI and SAIFI trends months in advance, giving operators the ability to intervene proactively rather than explain poor results after the fact.
Part 03 made the case that traditional N-1/N-2 contingency planning gives you a snapshot of capability at a single point. SRE's continuous measurement fills the gap between snapshots. IEEE 1366 provides the annual summary at the end. Together they form a complete picture: real-time awareness, continuous improvement, standardized reporting.
§ 02 How SRE practices drive the indices down
The connection between SRE practices and IEEE 1366 outcomes isn't abstract. Each SRE mechanism maps to specific improvements in specific indices.
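The formulas matter. SAIDI measures total customer outage duration normalized by customers served; SAIFI counts how often the average customer experiences an interruption; CAIDI is the ratio of the two (SAIDI divided by SAIFI), which is the average duration per interruption for customers who actually lost power.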
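For reference, the three indices can be written out as below. The notation is the conventional one: $N_i$ is the number of customers interrupted by sustained interruption event $i$, $r_i$ is that event's restoration time in minutes, and $N_T$ is the total number of customers served. CI and CMI are the customer interruptions and customer minutes of interruption totals.

```latex
\[
\begin{aligned}
\text{SAIFI} &= \frac{\sum_i N_i}{N_T} = \frac{\text{CI}}{N_T} \\[4pt]
\text{SAIDI} &= \frac{\sum_i r_i N_i}{N_T} = \frac{\text{CMI}}{N_T} \\[4pt]
\text{CAIDI} &= \frac{\sum_i r_i N_i}{\sum_i N_i} = \frac{\text{CMI}}{\text{CI}} = \frac{\text{SAIDI}}{\text{SAIFI}}
\end{aligned}
\]
```

Because CAIDI is a quotient of the other two, it does not have to move in the same direction they do.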
| SRE practice | Mechanism | Primary index impact |
|---|---|---|
| FLISR response-time SLO | Faster fault location and automated restoration | SAIDI (duration), CAIDI (restoration time) |
| Automated switching | Reduced manual dispatch time | SAIDI (minutes saved per event) |
| Predictive maintenance | Prevent failures before occurrence | SAIFI (fewer events), SAIDI (fewer outages) |
| Burn-rate alerting | Early warning of degrading reliability | SAIDI / SAIFI (intervene before trend worsens) |
| Fault detection SLI | Faster identification of incipient failures | SAIFI (prevent interruptions entirely) |
| Condition monitoring | Predictive asset replacement | SAIFI (reduce equipment-driven failures) |
| Toil reduction | More time for preventive work | SAIFI (shift reactive to proactive maintenance) |
These aren't theoretical improvements. Utilities that have deployed distribution automation with measurable SLI tracking have documented real results.
- EPB of Chattanooga (DOE Smart Grid Investment Grant): After deploying automated FLISR (fault location, isolation, and service restoration) across its territory (2011-2014), EPB documented an approximately 55% SAIDI reduction and a 45% SAIFI reduction, with a benefit-to-cost ratio DOE called one of the highest in the SGIG program.
- ComEd (Exelon): Documented a 60% CAIDI improvement on automated circuits through its FLISR deployment across northern Illinois, with restoration times dropping from hours to minutes on covered feeders.
- SMUD (Sacramento): Reported a 28% SAIDI improvement through distribution automation and smart switching, concentrated on feeders where automation coverage was highest.
None of these require abandoning existing processes. They augment what reliability teams already do by adding measurement, targets, and feedback loops. In most cases they integrate with existing ADMS and OMS platforms rather than replacing them.
§ 03 A caution on CAIDI in grid modernization
IEEE Std 1366-2022 includes Annex D, an informative appendix titled "Understanding CAIDI," dedicated entirely to cautioning against oversimplifying this index. The annex opens by noting that "CAIDI is a frequently misunderstood electric distribution reliability index" and warns against taking it at face value as a measure of average interruption duration.
The reason matters for any utility deploying automation. CAIDI equals CMI (customer minutes of interruption) divided by CI (customer interruptions). When FLISR automates restoration for a large number of short-duration faults, it reduces CI dramatically because many customers who would have experienced a sustained interruption now experience a momentary one (or none at all). But the remaining interruptions that require manual crew response tend to be longer and harder to fix. If CMI doesn't drop proportionally to CI, CAIDI goes up even though the utility is objectively performing better.
Before FLISR: 1,000 interruptions avg 90 minutes each → CAIDI = 90 min.
After FLISR: 400 interruptions avg 120 minutes each (the hard ones are all that's left) → CAIDI = 120 min.
SAIDI dropped from 90 min to 48 min on the same customer base. CAIDI went up. The utility is performing better. The index reads worse.
SAIDI dropped 47% in that example. CAIDI rose 33%. A regulator reading only the CAIDI number would conclude the utility got worse. A regulator reading SAIDI alongside it would see the real story. Utilities deploying automation need to educate both their own leadership and their commission staff on how these indices move together. Otherwise, the reward for shipping automation is a CAIDI headline that doesn't match reality.
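The arithmetic is easy to check. The sketch below is a minimal illustration, not a reporting tool. The "before" scenario (1,000 customer interruptions averaging 90 minutes on a base of 1,000 customers) is the one implied by the figures quoted here: SAIDI going from 90 to 48 minutes and CAIDI rising 33% to 120 minutes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityYear:
    customers_served: int               # N_T
    customer_interruptions: int         # CI
    avg_minutes_per_interruption: float

    @property
    def cmi(self) -> float:
        """Customer minutes of interruption."""
        return self.customer_interruptions * self.avg_minutes_per_interruption

    @property
    def saidi(self) -> float:
        return self.cmi / self.customers_served

    @property
    def saifi(self) -> float:
        return self.customer_interruptions / self.customers_served

    @property
    def caidi(self) -> float:
        return self.cmi / self.customer_interruptions


def pct_change(old: float, new: float) -> float:
    return (new - old) / old * 100.0


# Assumed scenario, back-solved from the percentages in the text.
before = ReliabilityYear(1000, 1000, 90.0)
after = ReliabilityYear(1000, 400, 120.0)

for name in ("saidi", "saifi", "caidi"):
    old, new = getattr(before, name), getattr(after, name)
    print(f"{name.upper()}: {old:g} -> {new:g} ({pct_change(old, new):+.0f}%)")
```

SAIDI and SAIFI both fall while CAIDI rises, because the denominator (CI) shrank faster than the numerator (CMI).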
§ 04 SLIs that predict the indices
The highest-leverage SRE practice for IEEE 1366 improvement is continuous SLI measurement. Service Level Indicators are the raw signals operations teams track in real time. When chosen well, they correlate tightly with SAIDI and SAIFI months before the annual filing closes the books. That lead time is the whole point.
- Fault detection time SLI. Time between fault occurrence and detection by SCADA or sensor correlation. Target: under 60 seconds for 99% of faults.
- Fault location SLI. Time between detection and the dispatch system knowing which segment is faulted. Target: under 60 seconds for automated, under 15 minutes for manual patrol.
- Automated restoration SLI. Percentage of faults resolved by FLISR without crew dispatch. Target: 40% or higher on automated circuits.
- Protection relay test coverage SLI. Percentage of relays tested within the last 12 months. Target: 100% for tier-1, 95% overall.
- DER event response SLI. Percentage of DER frequency and voltage events handled without operator intervention. Target: 95% or higher.
Each of these SLIs has a direct line to SAIDI or SAIFI. Fault detection time improvements cut minutes off every outage. Automated restoration rate reduces both SAIDI and SAIFI. Protection relay test coverage prevents catastrophic failures that would otherwise drive both indices up. Continuous measurement gives you the chance to correct course before December's filing.
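As a rough illustration of what measuring two of these SLIs could look like, the sketch below scores a batch of fault records against the fault detection target (under 60 seconds for 99% of faults) and the automated restoration target (40% or higher). The `FaultEvent` record, its field names, and the sample data are hypothetical; a real implementation would read events from the utility's own SCADA or OMS exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultEvent:
    detection_seconds: float   # fault occurrence to detection
    restored_by_flisr: bool    # True if no crew dispatch was needed


DETECTION_THRESHOLD_S = 60.0   # target from the list above
DETECTION_SLO = 0.99
AUTOMATED_RESTORATION_SLO = 0.40


def sli_report(events: list[FaultEvent]) -> dict[str, tuple[float, bool]]:
    """Return {sli_name: (measured_fraction, slo_met)} for a batch of faults."""
    if not events:
        raise ValueError("need at least one fault event")
    total = len(events)
    detected_fast = sum(e.detection_seconds < DETECTION_THRESHOLD_S for e in events)
    automated = sum(e.restored_by_flisr for e in events)
    detection = detected_fast / total
    restoration = automated / total
    return {
        "fault_detection": (detection, detection >= DETECTION_SLO),
        "automated_restoration": (restoration, restoration >= AUTOMATED_RESTORATION_SLO),
    }


# Made-up sample: five faults, one detected slowly, two restored by FLISR.
events = [
    FaultEvent(12.0, True),
    FaultEvent(45.0, False),
    FaultEvent(30.0, True),
    FaultEvent(95.0, False),
    FaultEvent(20.0, False),
]
report = sli_report(events)
for name, (value, met) in report.items():
    print(f"{name}: {value:.0%} ({'meets' if met else 'misses'} SLO)")
```

Run over a rolling window rather than a fixed batch, the same calculation gives the trend line that the annual indices cannot.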
§ 05 Error budget policy, grid edition
SRE's most useful organizational concept is the error budget policy. You derive the annual error budget from your regulatory target, then spend it with intent. If you overspend it by mid-year, that triggers specific actions: freeze non-essential changes, redirect crews toward preventive work, delay new automation rollouts until reliability trends re-stabilize. If you underspend it, you have slack to push higher-risk improvements that you would otherwise be too cautious to attempt.
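One way to make that policy concrete is to track a burn rate: the share of the annual SAIDI budget already spent, divided by the share of the year already elapsed. The sketch below assumes a 120-minute annual SAIDI target and uses thresholds of 1.0 and 0.8; all three numbers and both function names are illustrative assumptions, not values from any standard. It also assumes the budget is paced linearly through the year, which ignores storm seasonality.

```python
def burn_rate(saidi_ytd_min: float, annual_budget_min: float,
              day_of_year: int, days_in_year: int = 365) -> float:
    """Budget spent so far divided by the share of the year elapsed."""
    if annual_budget_min <= 0 or not 0 < day_of_year <= days_in_year:
        raise ValueError("budget must be positive and day_of_year within the year")
    budget_spent = saidi_ytd_min / annual_budget_min
    year_elapsed = day_of_year / days_in_year
    return budget_spent / year_elapsed


def policy_action(rate: float, overspend: float = 1.0, underspend: float = 0.8) -> str:
    """Map a burn rate to the policy responses described above."""
    if rate > overspend:
        return "freeze non-essential changes; redirect crews to preventive work"
    if rate < underspend:
        return "slack available for higher-risk improvements"
    return "on track; continue normal operations"


ANNUAL_SAIDI_BUDGET_MIN = 120.0   # hypothetical regulatory target

# Hypothetical mid-year check: 75 SAIDI minutes accrued by day 182.
rate = burn_rate(saidi_ytd_min=75.0,
                 annual_budget_min=ANNUAL_SAIDI_BUDGET_MIN,
                 day_of_year=182)
projected_saidi = ANNUAL_SAIDI_BUDGET_MIN * rate   # naive linear projection

print(f"burn rate {rate:.2f}, projected year-end SAIDI {projected_saidi:.0f} min")
print(policy_action(rate))
```

A burn rate above 1.0 means the year is on pace to miss the target. The linear projection is the crudest possible forecast, but it is available in June rather than December.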
The policy should be written and reviewed with regulatory affairs before it governs operational decisions. PUC filings that cite an error budget policy demonstrate a discipline beyond "we tried our best"; they show a system for making tradeoffs prudently and transparently. That matters in prudency reviews.
§ 06 What regulators actually want
State commissions want three things from their utilities: reliable service, predictable rates, and defensible explanations when things go wrong. IEEE 1366 indices are the reporting artifact, not the goal. A utility that adopts SRE practices delivers all three faster than one that reports the same indices annually and hopes for the best.
Reliable service: documented 20-55% SAIDI reductions from automation tracked against explicit SLOs. Predictable rates: the economics (Part 06 covers this) show SRE as a low-capital, fast-payback investment with system-wide benefit. Defensible explanations: blameless postmortems and continuous SLI measurement give you the causal chain in days instead of months, and the paper trail holds up in a rate case.
§ 07 Next in the series
Part 06 puts a dollar sign on everything this article described. A 30-minute SAIDI improvement avoids about $10 million per year on a 1,000 MW utility using a conservative $20,000/MWh value-of-lost-load. A typical SRE program costs about $2 million per year. The math is not complicated. The hard part is believing it, then doing the work.
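The back-of-envelope version of that math fits in a few lines. This sketch reads "1,000 MW utility" as 1,000 MW of average load and treats every interrupted minute as lost at that average, both of which are simplifying assumptions; the function name is hypothetical.

```python
def avoided_outage_cost(saidi_improvement_min: float,
                        average_load_mw: float,
                        voll_per_mwh: float) -> float:
    """Annual cost avoided by a SAIDI improvement, valued at VOLL."""
    hours_avoided = saidi_improvement_min / 60.0
    unserved_mwh_avoided = hours_avoided * average_load_mw
    return unserved_mwh_avoided * voll_per_mwh


annual_benefit = avoided_outage_cost(
    saidi_improvement_min=30.0,   # figures quoted in the text
    average_load_mw=1000.0,       # assumed to mean average load
    voll_per_mwh=20000.0,
)
program_cost = 2_000_000.0        # typical SRE program cost from the text

print(f"avoided cost: ${annual_benefit:,.0f} per year")
print(f"benefit-to-cost ratio: {annual_benefit / program_cost:.1f}")
```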
— Adam · adam@sgridworks.com · Apr 1, 2026