Guide 06

Volt-VAR Optimization with Reinforcement Learning

What You Will Learn

Voltage on a distribution feeder is not constant: it drops as you move farther from the substation, and it rises when rooftop solar pushes power back toward it. Volt-VAR Optimization (VVO) coordinates capacitor banks, voltage regulators, and smart inverters to keep voltage within the acceptable range while minimizing energy losses; a quick back-of-the-envelope sketch of why reactive power support helps follows the list below. In this guide you will:

  • Understand how voltage varies along a feeder and why VVO matters
  • Analyze the SP&L network's voltage profile using OpenDSS
  • Build a simple rule-based Volt-VAR controller
  • Implement a basic Q-learning agent that learns to control capacitor switching
  • Compare the rule-based and RL approaches on SP&L feeder data
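
Why do capacitor banks raise voltage? A useful rule of thumb is that the voltage drop across a line segment is roughly (P*R + Q*X) / V, so supplying reactive power locally reduces the Q that must flow through the line and shrinks the drop. The numbers below are invented purely for illustration:

# Back-of-the-envelope voltage drop across one line segment (all values in per-unit)
# delta_V is approximately (P*R + Q*X) / V
P, Q = 0.04, 0.02   # active and reactive power flowing through the segment
R, X = 0.05, 0.10   # segment resistance and reactance
V = 1.0             # sending-end voltage

drop_no_cap = (P * R + Q * X) / V
drop_with_cap = (P * R + (Q - 0.02) * X) / V  # a local capacitor supplies 0.02 p.u. of Q
print(f"Voltage drop without capacitor: {drop_no_cap:.4f} p.u.")
print(f"Voltage drop with capacitor:    {drop_with_cap:.4f} p.u.")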

What is reinforcement learning? Unlike supervised learning (Guides 01–04), reinforcement learning doesn't learn from labeled examples. Instead, an agent takes actions in an environment and receives rewards. Over many episodes, it discovers which actions lead to the best outcomes. Think of it as learning by trial and error, like training a dog with treats.
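
To make that concrete before touching the grid, here is a tiny, self-contained toy that is not part of the SP&L workflow: a stateless agent that learns which of two actions pays more, purely from noisy rewards, using the same epsilon-greedy idea we apply later.

import numpy as np

# Toy trial-and-error learner: two actions with hidden average payoffs
rng = np.random.default_rng(0)
true_means = [0.2, 0.8]        # hidden payoff of action 0 and action 1
estimates = [0.0, 0.0]         # the agent's running estimate of each action's value
counts = [0, 0]

for step in range(200):
    # Explore 10% of the time, otherwise take the action that looks best so far
    if rng.random() < 0.1:
        action = int(rng.integers(2))
    else:
        action = int(np.argmax(estimates))
    reward = rng.normal(true_means[action], 0.1)  # noisy reward from the "environment"
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]  # running average

print("Learned action values:", [round(v, 2) for v in estimates])  # close to [0.2, 0.8]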

SP&L Data You Will Use

  • network/master.dss — the full OpenDSS model including capacitors and regulators
  • network/capacitors.dss — capacitor bank placements and kVAR ratings
  • network/regulators.dss — voltage regulator settings and tap positions
  • timeseries/substation_load_hourly.parquet — hourly load profiles for time-series simulation (a quick peek at this file follows below)
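
Once the libraries in the next section are installed, you can take a quick look at the load file with pandas. This is just a sanity check; the feeder_id and total_load_mw columns printed here are the ones used later in this guide.

import pandas as pd

# Peek at the hourly load profile used throughout this guide
load_profile = pd.read_parquet("sisyphean-power-and-light/timeseries/substation_load_hourly.parquet")
print(load_profile.head())
print(load_profile["feeder_id"].unique())  # expect feeder IDs such as "F03"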

Additional Libraries

pip install opendssdirect.py pyarrow

Which terminal should I use? On Windows, open Anaconda Prompt from the Start Menu (or PowerShell / Command Prompt if Python is already in your PATH). On macOS, open Terminal from Applications → Utilities. On Linux, open your default terminal. All pip install commands work the same across platforms.

OpenDSS on Windows vs. macOS/Linux: On Windows, OpenDSS also has a standalone installer from EPRI (the COM interface), but you do not need it for this guide—pip install opendssdirect.py works on all platforms. If you already have the Windows COM version installed, opendssdirect.py will still work fine alongside it.
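
To confirm the install worked, run a quick check from Python. If Basic.Version() is not available in your version of the package, a clean import with no errors is an equally good sign.

import opendssdirect as dss

# Should print the bundled OpenDSS engine version string without raising an error
print(dss.Basic.Version())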

Step 1: Analyze the Voltage Profile

import opendssdirect as dss
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Point this to your local clone of the SP&L repo
# Windows example: "C:/Users/YourName/Documents/sisyphean-power-and-light/"
# macOS example: "/Users/YourName/Documents/sisyphean-power-and-light/"
# Tip: Python on Windows accepts forward slashes; no backslashes needed
DATA_DIR = "sisyphean-power-and-light/"

# Load the network model
dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
dss.Solution.Solve()

# Collect voltage at every bus along Feeder F03
coords = pd.read_csv(DATA_DIR + "network/coordinates.csv")
feeder_buses = coords[coords["bus_name"].str.startswith("f03")].sort_values("x")

voltages = []
for _, row in feeder_buses.iterrows():
    bus = row["bus_name"]
    dss.Circuit.SetActiveBus(bus)
    v_pu = dss.Bus.puVmagAngle()[0]
    voltages.append({"bus": bus, "distance": row["x"], "voltage_pu": v_pu})

vdf = pd.DataFrame(voltages)

# Plot the voltage profile
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(vdf["distance"], vdf["voltage_pu"], "o-", color="#5FCCDB", markersize=4)
ax.axhline(y=1.05, color="red", linestyle="--", alpha=0.7, label="Upper limit")
ax.axhline(y=0.95, color="red", linestyle="--", alpha=0.7, label="Lower limit")
ax.axhspan(0.95, 1.05, alpha=0.1, color="green", label="ANSI range")
ax.set_xlabel("Distance from Substation")
ax.set_ylabel("Voltage (p.u.)")
ax.set_title("Voltage Profile Along Feeder F03")
ax.legend()
plt.tight_layout()
plt.show()
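
To turn that plot into a number you can track over time, here is a short follow-up check on the vdf DataFrame built above (a sketch; the thresholds are the same ANSI limits shown in the plot).

# Flag buses outside the 0.95-1.05 p.u. ANSI band and summarize
vdf["in_band"] = vdf["voltage_pu"].between(0.95, 1.05)
n_violations = int((~vdf["in_band"]).sum())
worst_bus = vdf.loc[vdf["voltage_pu"].idxmin(), "bus"]
print(f"{n_violations} of {len(vdf)} buses are outside the ANSI band")
print(f"Lowest voltage: {vdf['voltage_pu'].min():.4f} p.u. at {worst_bus}")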

Step 2: Build a Rule-Based Controller

Before using ML, build a simple rule: "If the voltage at the end of the feeder drops below 0.97 p.u., switch on a capacitor bank. If it rises above 1.03 p.u., switch it off."

def rule_based_vvo(voltage_pu, cap_on):
    """Simple rule-based capacitor control.

    Returns: new cap_on state (True/False)
    """
    if voltage_pu < 0.97 and not cap_on:
        return True   # switch ON to boost voltage
    elif voltage_pu > 1.03 and cap_on:
        return False  # switch OFF to reduce voltage
    return cap_on     # no change

# Simulate 24 hours with varying load
load_profile = pd.read_parquet(DATA_DIR + "timeseries/substation_load_hourly.parquet")
feeder_load = load_profile[load_profile["feeder_id"] == "F03"].head(24)

cap_on = False
rule_results = []

for i, (_, row) in enumerate(feeder_load.iterrows()):
    # Set load level in OpenDSS
    dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
    load_mult = row["total_load_mw"] / feeder_load["total_load_mw"].mean()
    dss.Solution.LoadMult(load_mult)  # LoadMult is set with a function call in opendssdirect.py

    # Apply capacitor state
    if cap_on:
        dss.Text.Command("Capacitor.cap_f03.States=[1]")
    else:
        dss.Text.Command("Capacitor.cap_f03.States=[0]")

    dss.Solution.Solve()

    # Read voltage at end of feeder
    dss.Circuit.SetActiveBus("f03_bus_12")
    v = dss.Bus.puVmagAngle()[0]

    # Apply rule-based control
    cap_on = rule_based_vvo(v, cap_on)
    rule_results.append({"hour": i, "voltage_pu": v, "cap_on": cap_on})

rule_df = pd.DataFrame(rule_results)
print(rule_df)
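
Before moving on, it helps to summarize how the rule-based controller did. A short sketch using the rule_df just built; the switching count matters because every capacitor operation causes mechanical wear.

# Fraction of hours inside the ANSI band and number of capacitor switching operations
within_band = rule_df["voltage_pu"].between(0.95, 1.05).mean()
n_switches = rule_df["cap_on"].astype(int).diff().abs().sum()  # first diff is NaN and is ignored
print(f"Hours within ANSI band: {within_band:.0%}")
print(f"Capacitor switching operations: {int(n_switches)}")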

Step 3: Define the RL Environment

Now let's set up a reinforcement learning environment. The agent observes the current voltage and decides whether to switch the capacitor on or off. It gets a positive reward when voltage is within the ANSI range and a negative reward (penalty) when it's outside.

class VoltVAREnv:
    """Simple Volt-VAR environment for Q-learning."""

    def __init__(self, load_profile, data_dir):
        self.load_profile = load_profile
        self.data_dir = data_dir
        self.hour = 0
        self.cap_on = False

    def reset(self):
        self.hour = 0
        self.cap_on = False
        return self._get_state()

    def _get_state(self):
        # Discretize voltage into buckets for the Q-table
        v = self._solve_voltage()
        if v < 0.95:
            bucket = 0  # too low
        elif v < 0.97:
            bucket = 1  # low-normal
        elif v < 1.00:
            bucket = 2  # normal
        elif v < 1.03:
            bucket = 3  # high-normal
        else:
            bucket = 4  # too high
        return (bucket, int(self.cap_on))

    def _solve_voltage(self):
        dss.Text.Command(f"Compile [{self.data_dir}network/master.dss]")
        row = self.load_profile.iloc[self.hour % len(self.load_profile)]
        load_mult = row["total_load_mw"] / self.load_profile["total_load_mw"].mean()
        dss.Solution.LoadMult(load_mult)  # function-call setter in opendssdirect.py
        if self.cap_on:
            dss.Text.Command("Capacitor.cap_f03.States=[1]")
        dss.Solution.Solve()
        dss.Circuit.SetActiveBus("f03_bus_12")
        return dss.Bus.puVmagAngle()[0]

    def step(self, action):
        # Action 0 = cap OFF, Action 1 = cap ON
        self.cap_on = bool(action)
        voltage = self._solve_voltage()

        # Reward: +1 if within ANSI, -10 if outside
        if 0.95 <= voltage <= 1.05:
            reward = 1.0
        else:
            reward = -10.0
        # Bonus for staying near 1.0
        reward += max(0, 1 - abs(voltage - 1.0) * 20)

        self.hour += 1
        done = self.hour >= len(self.load_profile)
        return self._get_state(), reward, done, {"voltage": voltage}
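
A quick smoke test catches wiring mistakes before you spend time training. This just runs two hand-picked actions through the environment and prints what comes back.

# Sanity-check the environment with a couple of manual steps
env = VoltVAREnv(feeder_load, DATA_DIR)
state = env.reset()
print("Initial state (voltage bucket, cap state):", state)

next_state, reward, done, info = env.step(1)  # force the capacitor ON
print(f"cap ON : state={next_state}, reward={reward:.2f}, voltage={info['voltage']:.4f} p.u.")

next_state, reward, done, info = env.step(0)  # force the capacitor OFF
print(f"cap OFF: state={next_state}, reward={reward:.2f}, voltage={info['voltage']:.4f} p.u.")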

Step 4: Train the Q-Learning Agent

# Q-table: states (5 voltage buckets x 2 cap states) x 2 actions
q_table = np.zeros((5, 2, 2))  # [voltage_bucket, cap_state, action]

# Hyperparameters
alpha = 0.1            # learning rate
gamma = 0.95           # discount factor
epsilon = 1.0          # exploration rate
epsilon_min = 0.05
epsilon_decay = 0.995
n_episodes = 100

env = VoltVAREnv(feeder_load, DATA_DIR)
episode_rewards = []

for ep in range(n_episodes):
    state = env.reset()
    total_reward = 0

    while True:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = np.random.randint(2)  # explore
        else:
            action = np.argmax(q_table[state[0], state[1]])  # exploit

        next_state, reward, done, info = env.step(action)

        # Q-learning update
        old_q = q_table[state[0], state[1], action]
        best_next = np.max(q_table[next_state[0], next_state[1]])
        q_table[state[0], state[1], action] = old_q + alpha * (
            reward + gamma * best_next - old_q
        )

        total_reward += reward
        state = next_state
        if done:
            break

    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    episode_rewards.append(total_reward)

    if (ep + 1) % 20 == 0:
        print(f"Episode {ep+1:>3}/{n_episodes}  "
              f"Reward: {total_reward:.1f}  Epsilon: {epsilon:.3f}")
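
With only ten states and two actions, the learned policy is small enough to print and read directly. Keep in mind that any state the agent never visited still holds all-zero Q-values, so its greedy action is arbitrary.

# Inspect the learned policy: greedy action for every (voltage bucket, cap state)
bucket_labels = ["too low", "low-normal", "normal", "high-normal", "too high"]
for bucket in range(5):
    for cap_state in range(2):
        action = np.argmax(q_table[bucket, cap_state])
        print(f"voltage {bucket_labels[bucket]:<12} cap {'ON ' if cap_state else 'OFF'}"
              f" -> {'switch ON' if action == 1 else 'switch OFF'}")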

Step 5: Test and Compare Both Approaches

# Plot training progress
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(episode_rewards, color="#5FCCDB", alpha=0.6)
ax.plot(pd.Series(episode_rewards).rolling(10).mean(),
        color="#1C4855", linewidth=2, label="10-episode avg")
ax.set_xlabel("Episode")
ax.set_ylabel("Total Reward")
ax.set_title("Q-Learning Training Progress")
ax.legend()
plt.tight_layout()
plt.show()

# Run the trained agent for one day and compare with rule-based
env = VoltVAREnv(feeder_load, DATA_DIR)
state = env.reset()
rl_results = []

while True:
    action = np.argmax(q_table[state[0], state[1]])
    state, reward, done, info = env.step(action)
    rl_results.append({"hour": env.hour - 1,
                       "voltage_pu": info["voltage"],
                       "cap_on": bool(action)})
    if done:
        break

rl_df = pd.DataFrame(rl_results)

# Side-by-side comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
ax1.plot(rule_df["hour"], rule_df["voltage_pu"], "o-", color="#2D6A7A")
ax1.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax1.set_title("Rule-Based Controller")
ax1.set_xlabel("Hour")
ax1.set_ylabel("Voltage (p.u.)")
ax2.plot(rl_df["hour"], rl_df["voltage_pu"], "o-", color="#5FCCDB")
ax2.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax2.set_title("Q-Learning Controller")
ax2.set_xlabel("Hour")
plt.suptitle("VVO Controller Comparison")
plt.tight_layout()
plt.show()
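
The plots tell most of the story, but two summary numbers make the comparison concrete: the share of hours inside the ANSI band and the average deviation from 1.0 p.u. A sketch using the two result DataFrames:

# Compare voltage compliance and average deviation for both controllers
for name, df in [("Rule-based", rule_df), ("Q-learning", rl_df)]:
    compliance = df["voltage_pu"].between(0.95, 1.05).mean()
    deviation = (df["voltage_pu"] - 1.0).abs().mean()
    print(f"{name:<11} within band: {compliance:.0%}   mean |V - 1.0|: {deviation:.4f} p.u.")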

What You Built and Next Steps

  1. Analyzed the voltage profile along an SP&L distribution feeder
  2. Built a rule-based Volt-VAR controller with simple threshold logic
  3. Defined an RL environment that rewards voltage compliance
  4. Trained a Q-learning agent to learn capacitor switching strategies
  5. Compared both approaches and visualized the results

Ideas to Try Next

  • Control multiple devices: Add voltage regulators and smart inverters to the action space
  • Add loss minimization: Reward the agent for reducing total circuit losses, not just voltage compliance (see the sketch after this list)
  • Deep RL: Replace the Q-table with a neural network (DQN) for continuous state spaces
  • Multi-feeder coordination: Train agents that coordinate across multiple feeders
  • Include solar variability: Add PV generation from timeseries/pv_generation.parquet for realistic cloud transients
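
For the loss-minimization idea above, one possible starting point is to read the total circuit losses after each power flow and subtract a scaled penalty from the reward inside VoltVAREnv.step(). The snippet below is a sketch, not a drop-in change: it assumes dss.Circuit.Losses() returns [watts, vars] for the solved circuit, and loss_weight is a made-up tuning knob you would need to calibrate.

# Sketch: a loss-aware reward to use inside VoltVAREnv.step() after dss.Solution.Solve()
def loss_aware_reward(voltage, loss_weight=0.001):  # loss_weight is a hypothetical tuning knob
    reward = 1.0 if 0.95 <= voltage <= 1.05 else -10.0   # same compliance term as before
    reward += max(0, 1 - abs(voltage - 1.0) * 20)        # same shaping bonus as before
    losses_kw = dss.Circuit.Losses()[0] / 1000.0         # total active losses in kW
    return reward - loss_weight * losses_kw              # penalize losses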

Key Terms Glossary

  • Volt-VAR Optimization (VVO) — coordinating voltage and reactive power control devices to maintain power quality
  • Capacitor bank — a device that injects reactive power to boost voltage
  • Voltage regulator — a transformer with adjustable taps that raises or lowers voltage
  • Q-learning — a model-free RL algorithm that builds a table of state-action values
  • Epsilon-greedy — exploration strategy: take a random action with probability epsilon, best known action otherwise
  • Reward shaping — designing the reward function to guide the agent toward desired behavior

Ready to Level Up?

In the advanced guide, you'll build a Deep Q-Network in PyTorch for multi-device voltage control, complete with experience replay and target networks.

Go to Advanced Volt-VAR →