Guide 08

Anomaly Detection & Grid State Estimation

Prefer not to install anything? Click the badge above to open this guide as a runnable notebook in Google Colab. Sign in with any Google account, then use Runtime → Run all to execute every cell, or step through them one at a time.

What You Will Learn

AMI (Advanced Metering Infrastructure) meters generate millions of data points every day. Hidden in that data are anomalies: unusual voltage readings that signal equipment problems, meter tampering, or phase imbalances. In this guide you will:

  • Load 15-minute AMI voltage data from the SP&L dataset
  • Explore what "normal" voltage patterns look like
  • Train an Isolation Forest model to detect anomalies without labeled data
  • Build a simple autoencoder in PyTorch that learns to reconstruct normal patterns
  • Flag anomalies based on reconstruction error and evaluate both approaches

What is unsupervised anomaly detection? In Guides 01 and 04, we had labels—we knew which events were outages or failures. But anomaly detection often works without labels. The model learns what "normal" looks like and flags anything that deviates significantly. This is powerful because you don't need to have seen every type of anomaly before—the model catches anything unusual.
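The core idea can be illustrated without any ML library at all: learn what "normal" looks like (here, just a mean and a standard deviation), then flag anything far outside it. This is a minimal sketch on synthetic voltage readings, not part of the guide's pipeline:

```python
import numpy as np

# Synthetic voltage readings: mostly normal (~120 V), with two injected anomalies
rng = np.random.default_rng(42)
voltage = rng.normal(120, 0.5, size=1000)
voltage[100] = 131.0   # injected spike
voltage[500] = 108.0   # injected sag

# "Learn normal": estimate mean and spread, then flag points beyond 4 sigma
mu, sigma = voltage.mean(), voltage.std()
z = np.abs(voltage - mu) / sigma
anomalies = np.flatnonzero(z > 4)
print(anomalies)  # the two injected points
```

Note that nothing told the detector what a spike or sag looks like; it only modeled "normal". Isolation Forests and autoencoders apply the same principle to many features at once.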

SP&L Data You Will Use

  • customer_interval_data.csv (load_customer_interval_data()) — 15-minute AMI data for ~500 sampled customers with customer_id, transformer_id, feeder_id, substation_id, timestamp, demand_kw, energy_kwh, voltage_v, and power_factor
  • weather_data.csv (load_weather_data()) — hourly weather for context

Additional Libraries

pip install torch

torch (PyTorch) is used for the autoencoder in the second half. You can complete the Isolation Forest section without it.

Which terminal should I use? On Windows, open Anaconda Prompt from the Start Menu (or PowerShell / Command Prompt if Python is already in your PATH). On macOS, open Terminal from Applications → Utilities. On Linux, open your default terminal. All pip install commands work the same across platforms.

PyTorch on Windows: The command pip install torch installs the CPU-only version, which is all you need for this guide. If you have an NVIDIA GPU and want GPU acceleration, visit pytorch.org/get-started for the platform-specific install command with CUDA support. The CPU version works identically on Windows, macOS, and Linux.

Part A: Isolation Forest

Step 0: Verify Your Setup

Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.

# Step 0: Verify your setup
try:
    import pandas as pd
    import numpy as np
    from demo_data.load_demo_data import load_customer_interval_data

    ami = load_customer_interval_data()
    print(f"Setup OK! Loaded {len(ami):,} AMI records.")
except ModuleNotFoundError as e:
    print(f"Missing library: {e}")
    print("Run: pip install -r requirements.txt")
except FileNotFoundError:
    print("Data files not found. Run from the repo root:")
    print("  cd Dynamic-Network-Model && jupyter lab")
Setup OK! Loaded N records.

Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab

Having trouble? Check our Troubleshooting Guide for solutions to common setup and data loading issues.

Step 1: Load and Explore AMI Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from demo_data.load_demo_data import load_customer_interval_data

# Load AMI data
ami = load_customer_interval_data()
print(f"AMI records: {len(ami):,}")
print(f"Columns: {list(ami.columns)}")
print(f"Customers: {ami['customer_id'].nunique()}")
print(ami.head())
Step 2: Focus on Voltage Readings

# Pick one customer to study in detail
meter = ami[ami["customer_id"] == ami["customer_id"].unique()[0]].copy()
meter["timestamp"] = pd.to_datetime(meter["timestamp"])
meter = meter.sort_values("timestamp")

# Plot voltage over a month
one_month = meter[(meter["timestamp"] >= "2024-06-01") & (meter["timestamp"] < "2024-07-01")]

fig, ax = plt.subplots(figsize=(14, 4))
ax.plot(one_month["timestamp"], one_month["voltage_v"], linewidth=0.5, color="#2D6A7A")
ax.axhline(y=126, color="red", linestyle="--", alpha=0.5, label="ANSI upper (126V)")
ax.axhline(y=114, color="red", linestyle="--", alpha=0.5, label="ANSI lower (114V)")
ax.set_title("AMI Voltage Readings — June 2024")
ax.set_ylabel("Voltage (V)")
ax.legend()
plt.tight_layout()
plt.show()

You should see voltage oscillating in a daily pattern. Occasional spikes or dips are the anomalies we want to detect.
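Before reaching for any model, a quick sanity check is to count how many readings already fall outside the ANSI band plotted above. A self-contained sketch with a synthetic stand-in for one meter (the real data uses the same voltage_v column name):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for ~30 days of one meter's 15-minute readings
rng = np.random.default_rng(0)
meter = pd.DataFrame({"voltage_v": rng.normal(120, 2.0, size=2880)})

# Fraction of readings outside ANSI Range A (114-126 V)
outside = ((meter["voltage_v"] > 126) | (meter["voltage_v"] < 114)).mean()
print(f"Readings outside ANSI Range A: {outside:.2%}")
```

Fixed limits like these catch gross violations, but they miss points that are unusual relative to a meter's own pattern while still inside the band; that is what the models below add.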

Step 3: Engineer Features for Anomaly Detection

# Create features from voltage readings across all customers
ami["timestamp"] = pd.to_datetime(ami["timestamp"])
ami["hour"] = ami["timestamp"].dt.hour

# Aggregate to hourly statistics per customer
hourly = ami.groupby(["customer_id", ami["timestamp"].dt.floor("h")]).agg(
    voltage_mean=("voltage_v", "mean"),
    voltage_std=("voltage_v", "std"),
    voltage_min=("voltage_v", "min"),
    voltage_max=("voltage_v", "max"),
    energy_kwh=("energy_kwh", "sum"),
).reset_index()

# Add voltage range (spread) as a feature
hourly["voltage_range"] = hourly["voltage_max"] - hourly["voltage_min"]

# Fill any NaN values (std is NaN for hours with a single reading)
hourly = hourly.fillna(0)
print(f"Hourly feature rows: {len(hourly):,}")
print(hourly.describe())
Step 4: Train the Isolation Forest

# Select features for anomaly detection
feature_cols = ["voltage_mean", "voltage_std", "voltage_range",
                "voltage_min", "voltage_max", "energy_kwh"]
X = hourly[feature_cols]

# Standardize features (important for distance-based methods)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train Isolation Forest
# contamination = expected % of anomalies (start with 1%)
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.01,
    random_state=42,
)
iso_forest.fit(X_scaled)
print("Isolation Forest training complete.")

# Predict: -1 = anomaly, 1 = normal
hourly["anomaly"] = iso_forest.predict(X_scaled)
hourly["anomaly_score"] = iso_forest.decision_function(X_scaled)

n_anomalies = (hourly["anomaly"] == -1).sum()
print(f"Anomalies detected: {n_anomalies} ({n_anomalies/len(hourly)*100:.2f}%)")

How does Isolation Forest work? It builds random decision trees that try to isolate each data point. Normal points are similar to many others and take many splits to isolate. Anomalies are rare and different, so they get isolated quickly with fewer splits. The "anomaly score" reflects how easy a point was to isolate.
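You can see this behavior directly on toy data: an obvious outlier earns a markedly lower decision_function score (easier to isolate) than the points in the dense cluster. A sketch, separate from the AMI pipeline:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# A dense "normal" cluster plus one obvious outlier appended at the end
rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(500, 2))
X = np.vstack([X, [[8.0, 8.0]]])

iso = IsolationForest(n_estimators=200, random_state=42).fit(X)
scores = iso.decision_function(X)  # lower = easier to isolate = more anomalous
print(f"Outlier score: {scores[-1]:.3f}, median inlier score: {np.median(scores[:-1]):.3f}")
```

The outlier's score sits well below the cluster's, which is exactly the signal the guide thresholds via the contamination parameter.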

Step 5: Visualize the Anomalies

# Plot anomalies in feature space (mean voltage vs. voltage std dev)
anomalies = hourly[hourly["anomaly"] == -1]
normal = hourly[hourly["anomaly"] == 1]

fig, ax = plt.subplots(figsize=(14, 5))
ax.scatter(normal["voltage_mean"], normal["voltage_std"],
           c="#5FCCDB", s=5, alpha=0.3, label="Normal")
ax.scatter(anomalies["voltage_mean"], anomalies["voltage_std"],
           c="red", s=30, marker="x", label="Anomaly")
ax.set_xlabel("Mean Voltage (V)")
ax.set_ylabel("Voltage Std Dev")
ax.set_title("Isolation Forest: Anomaly Detection in AMI Voltage Data")
ax.legend()
plt.tight_layout()
plt.show()

Part B: Autoencoder (PyTorch)

Step 6: Build the Autoencoder

An autoencoder is a neural network that learns to compress data into a small representation and then reconstruct it. If the network is trained on normal data, it will reconstruct normal patterns well but struggle with anomalies—producing high reconstruction error.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Define the autoencoder architecture
class VoltageAutoencoder(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # Encoder: compress from input_dim down to 3
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 3),  # bottleneck layer
        )
        # Decoder: reconstruct from 3 back to input_dim
        self.decoder = nn.Sequential(
            nn.Linear(3, 8),
            nn.ReLU(),
            nn.Linear(8, 16),
            nn.ReLU(),
            nn.Linear(16, input_dim),
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# Create the model
input_dim = len(feature_cols)
model = VoltageAutoencoder(input_dim)
print(model)

What is a bottleneck? The bottleneck layer (size 3) forces the network to compress 6 input features into just 3 numbers. This compression forces the model to learn the most important patterns in the data. When an anomaly comes through, it doesn't fit the learned compression pattern, and the reconstruction will be poor.
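You can verify the compression directly: a batch passed through the encoder alone comes out with 3 numbers per row. A self-contained sketch using the same layer sizes as the guide's encoder (in the notebook you would call model.encoder on your scaled features instead):

```python
import torch
import torch.nn as nn

# Same encoder shape as the guide: 6 features -> 16 -> 8 -> 3 (bottleneck)
encoder = nn.Sequential(
    nn.Linear(6, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 3),
)

x = torch.randn(64, 6)  # a batch of 64 feature rows
z = encoder(x)
print(z.shape)  # torch.Size([64, 3]): each row compressed to 3 numbers
```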

Step 7: Train the Autoencoder on Normal Data

# Use only normal data for training (filter out Isolation Forest anomalies)
normal_data = hourly[hourly["anomaly"] == 1][feature_cols]
normal_scaled = scaler.fit_transform(normal_data)

# Split into train (80%) and validation (20%)
split = int(len(normal_scaled) * 0.8)
train_data = torch.FloatTensor(normal_scaled[:split])
val_data = torch.FloatTensor(normal_scaled[split:])

# Create data loaders
train_loader = DataLoader(TensorDataset(train_data, train_data),
                          batch_size=64, shuffle=True)

# Training setup
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train for 50 epochs
losses = []
for epoch in range(50):
    model.train()
    epoch_loss = 0
    for batch_x, batch_y in train_loader:
        output = model(batch_x)
        loss = criterion(output, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    avg_loss = epoch_loss / len(train_loader)
    losses.append(avg_loss)
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1:>3}/50  Loss: {avg_loss:.6f}")

# Plot training loss
plt.figure(figsize=(8, 4))
plt.plot(losses, color="#5FCCDB")
plt.title("Autoencoder Training Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.tight_layout()
plt.show()
Step 8: Detect Anomalies by Reconstruction Error

# Run ALL data through the autoencoder (normal + anomalous)
model.eval()
all_scaled = scaler.transform(hourly[feature_cols])
all_tensor = torch.FloatTensor(all_scaled)

with torch.no_grad():
    reconstructed = model(all_tensor)
    recon_error = torch.mean((all_tensor - reconstructed) ** 2, dim=1)

hourly["recon_error"] = recon_error.numpy()

# Set threshold at 99th percentile of reconstruction error
threshold = hourly["recon_error"].quantile(0.99)
hourly["ae_anomaly"] = (hourly["recon_error"] > threshold).astype(int)

print(f"Reconstruction error threshold: {threshold:.4f}")
print(f"Autoencoder anomalies: {hourly['ae_anomaly'].sum()}")
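The 99th-percentile threshold is a convenient default, but it hard-codes the anomaly rate at 1% regardless of the data. An alternative worth trying (a sketch, not part of the guide's pipeline, with synthetic stand-in errors) is to set the threshold from the error distribution on held-out normal data:

```python
import numpy as np

# Stand-in for per-row reconstruction errors on held-out NORMAL validation data
rng = np.random.default_rng(1)
val_errors = rng.gamma(shape=2.0, scale=0.05, size=2000)

# Threshold = mean + 3*std of normal-data error: flags only rows the model
# reconstructs much worse than anything it treated as "normal"
threshold = val_errors.mean() + 3 * val_errors.std()
print(f"Threshold: {threshold:.4f}")
print(f"Validation rows above threshold: {(val_errors > threshold).mean():.2%}")
```

With this scheme the number of flagged rows follows from how unusual the data actually is, rather than being fixed in advance.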
Step 9: Compare Both Methods

# Compare Isolation Forest vs Autoencoder detections
hourly["iso_anomaly"] = (hourly["anomaly"] == -1).astype(int)

both = ((hourly["iso_anomaly"] == 1) & (hourly["ae_anomaly"] == 1)).sum()
iso_only = ((hourly["iso_anomaly"] == 1) & (hourly["ae_anomaly"] == 0)).sum()
ae_only = ((hourly["iso_anomaly"] == 0) & (hourly["ae_anomaly"] == 1)).sum()

print(f"Flagged by both methods: {both}")
print(f"Isolation Forest only:   {iso_only}")
print(f"Autoencoder only:        {ae_only}")

# Reconstruction error distribution
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(hourly["recon_error"], bins=100, color="#5FCCDB", edgecolor="white")
ax.axvline(x=threshold, color="red", linestyle="--",
           label=f"Threshold ({threshold:.4f})")
ax.set_xlabel("Reconstruction Error")
ax.set_ylabel("Frequency")
ax.set_title("Autoencoder Reconstruction Error Distribution")
ax.set_yscale("log")
ax.legend()
plt.tight_layout()
plt.show()
Step 10: Investigate the Anomalies

# Look at the top anomalies detected by both methods
high_confidence = hourly[(hourly["iso_anomaly"] == 1) & (hourly["ae_anomaly"] == 1)]
high_confidence = high_confidence.sort_values("recon_error", ascending=False)

print("Top 10 highest-confidence anomalies:\n")
print(high_confidence[["customer_id", "timestamp", "voltage_mean", "voltage_std",
                       "voltage_range", "recon_error"]].head(10).to_string(index=False))

Anomalies flagged by both methods are the most trustworthy. Look for patterns: are they clustered on specific customers (possible equipment issue), specific times (possible load event), or specific voltages (possible tap changer malfunction)?
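For the "clustered on specific customers" check, a simple value_counts on the high-confidence rows is enough. A sketch with a tiny synthetic stand-in for the hourly table (same column names as the guide):

```python
import pandas as pd

# Stand-in for the guide's hourly table: which customers own the flagged rows?
flags = pd.DataFrame({
    "customer_id": ["C001", "C002", "C001", "C003", "C001", "C002"],
    "iso_anomaly": [1, 1, 1, 1, 1, 1],
    "ae_anomaly":  [1, 0, 1, 1, 1, 1],
})

high_conf = flags[(flags["iso_anomaly"] == 1) & (flags["ae_anomaly"] == 1)]
per_customer = high_conf["customer_id"].value_counts()
print(per_customer)
# A customer with many flags suggests a persistent equipment issue
# rather than a one-off event
```

The same pattern works for the other groupings: count by hour of day for load events, or bin by voltage_mean for tap-changer issues.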

What You Built and Next Steps

  1. Loaded and explored 15-minute AMI voltage data from ~500 sampled customers
  2. Engineered hourly statistical features from raw voltage readings
  3. Trained an Isolation Forest for unsupervised anomaly detection
  4. Built and trained a PyTorch autoencoder on normal voltage patterns
  5. Flagged anomalies using reconstruction error thresholds
  6. Compared both methods and investigated high-confidence detections

Ideas to Try Next

  • Correlate with outages: Check whether detected anomalies preceded actual outage events in the outage history (load_outage_history())
  • Meter tampering detection: Look for meters with sudden consumption drops but normal voltage (possible bypass)
  • Phase imbalance detection: Compare voltage patterns across phases to detect phase-level issues
  • Real-time sliding window: Implement a streaming version that processes data in 1-hour windows
  • Variational autoencoder: Replace the basic autoencoder with a VAE for probabilistic anomaly scoring
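As a starting point for the first idea, here is a sketch of the join logic for linking anomalies to later outages. The outage column name (start_time) and the 6-hour window are assumptions for illustration; check the actual output of load_outage_history() before adapting this:

```python
import pandas as pd

# Hypothetical stand-ins: anomaly timestamps and outage start times
anomalies = pd.DataFrame({"timestamp": pd.to_datetime(["2024-06-01 10:00", "2024-06-03 02:00"])})
outages = pd.DataFrame({"start_time": pd.to_datetime(["2024-06-01 13:30"])})

# Count anomalies that occurred within 6 hours BEFORE any outage start
window = pd.Timedelta(hours=6)
precursors = anomalies["timestamp"].apply(
    lambda t: ((outages["start_time"] > t) & (outages["start_time"] <= t + window)).any()
)
print(f"Anomalies preceding an outage within 6h: {precursors.sum()} of {len(anomalies)}")
```

In practice you would also match on feeder_id or transformer_id so that an anomaly only counts as a precursor for outages on its own part of the network.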

Key Terms Glossary

  • Anomaly detection — identifying data points that deviate significantly from normal patterns
  • Isolation Forest — an algorithm that detects anomalies by how easily data points can be isolated with random splits
  • Autoencoder — a neural network that compresses data and reconstructs it; anomalies have high reconstruction error
  • Reconstruction error — the difference between the input and the autoencoder's output; higher = more anomalous
  • Unsupervised learning — learning from data without labels; the model discovers structure on its own
  • AMI — Advanced Metering Infrastructure; smart meters that report voltage and consumption at 15-minute intervals
  • Bottleneck layer — the smallest hidden layer in an autoencoder, forcing information compression

Ready to Level Up?

In the advanced guide, you'll build a Variational Autoencoder for probabilistic anomaly scoring and implement real-time streaming detection.

Go to Advanced Anomaly Detection →