Guide 06

Volt-VAR Optimization with Reinforcement Learning

What You Will Learn

Voltage on a distribution feeder is not constant: it drops as you move farther from the substation, and it rises when rooftop solar pushes power back toward it. Volt-VAR Optimization (VVO) coordinates capacitor banks, voltage regulators, and smart inverters to keep voltage within the acceptable range while minimizing energy losses; a quick back-of-the-envelope sketch of why reactive power support helps follows the list below. In this guide you will:

  • Understand how voltage varies along a feeder and why VVO matters
  • Analyze the SP&L network's voltage profile using OpenDSS
  • Build a simple rule-based Volt-VAR controller
  • Implement a basic Q-learning agent that learns to control capacitor switching
  • Compare the rule-based and RL approaches on SP&L feeder data
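
Why do capacitor banks raise voltage? A useful rule of thumb is that the voltage drop across a line segment is roughly (P*R + Q*X) / V, so supplying reactive power locally reduces the Q that must flow through the line and shrinks the drop. The numbers below are invented purely for illustration:

# Back-of-the-envelope voltage drop across one line segment (all values in per-unit)
# delta_V is approximately (P*R + Q*X) / V
P, Q = 0.04, 0.02   # active and reactive power flowing through the segment
R, X = 0.05, 0.10   # segment resistance and reactance
V = 1.0             # sending-end voltage

drop_no_cap = (P * R + Q * X) / V
drop_with_cap = (P * R + (Q - 0.02) * X) / V  # a local capacitor supplies 0.02 p.u. of Q
print(f"Voltage drop without capacitor: {drop_no_cap:.4f} p.u.")
print(f"Voltage drop with capacitor:    {drop_with_cap:.4f} p.u.")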

What is reinforcement learning? Unlike supervised learning (Guides 01–04), reinforcement learning doesn't learn from labeled examples. Instead, an agent takes actions in an environment and receives rewards. Over many episodes, it discovers which actions lead to the best outcomes. Think of it as learning by trial and error, like training a dog with treats.
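
To make that concrete before touching the grid, here is a tiny, self-contained toy that is not part of the SP&L workflow: a stateless agent that learns which of two actions pays more, purely from noisy rewards, using the same epsilon-greedy idea we apply later.

import numpy as np

# Toy trial-and-error learner: two actions with hidden average payoffs
rng = np.random.default_rng(0)
true_means = [0.2, 0.8]        # hidden payoff of action 0 and action 1
estimates = [0.0, 0.0]         # the agent's running estimate of each action's value
counts = [0, 0]

for step in range(200):
    # Explore 10% of the time, otherwise take the action that looks best so far
    if rng.random() < 0.1:
        action = int(rng.integers(2))
    else:
        action = int(np.argmax(estimates))
    reward = rng.normal(true_means[action], 0.1)  # noisy reward from the "environment"
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]  # running average

print("Learned action values:", [round(v, 2) for v in estimates])  # close to [0.2, 0.8]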

SP&L Data You Will Use

  • network/master.dss — the full OpenDSS model including capacitors and regulators
  • network/capacitors.dss — capacitor bank placements and kVAR ratings
  • network/regulators.dss — voltage regulator settings and tap positions
  • timeseries/substation_load_hourly.parquet — hourly load profiles for time-series simulation (a quick peek at this file follows below)
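
Once the libraries in the next section are installed, you can take a quick look at the load file with pandas. This is just a sanity check; the feeder_id and total_load_mw columns printed here are the ones used later in this guide.

import pandas as pd

# Peek at the hourly load profile used throughout this guide
load_profile = pd.read_parquet("sisyphean-power-and-light/timeseries/substation_load_hourly.parquet")
print(load_profile.head())
print(load_profile["feeder_id"].unique())  # expect feeder IDs such as "F03"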

Additional Libraries

pip install opendssdirect.py pyarrow

Which terminal should I use? On Windows, open Anaconda Prompt from the Start Menu (or PowerShell / Command Prompt if Python is already in your PATH). On macOS, open Terminal from Applications → Utilities. On Linux, open your default terminal. All pip install commands work the same across platforms.

OpenDSS on Windows vs. macOS/Linux: On Windows, OpenDSS also has a standalone installer from EPRI (the COM interface), but you do not need it for this guide—pip install opendssdirect.py works on all platforms. If you already have the Windows COM version installed, opendssdirect.py will still work fine alongside it.
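
To confirm the install worked, run a quick check from Python. If Basic.Version() is not available in your version of the package, a clean import with no errors is an equally good sign.

import opendssdirect as dss

# Should print the bundled OpenDSS engine version string without raising an error
print(dss.Basic.Version())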

Step 1: Analyze the Voltage Profile

import opendssdirect as dss
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Point this to your local clone of the SP&L repo
# Windows example: "C:/Users/YourName/Documents/sisyphean-power-and-light/"
# macOS example: "/Users/YourName/Documents/sisyphean-power-and-light/"
# Tip: Python on Windows accepts forward slashes; no backslashes needed
DATA_DIR = "sisyphean-power-and-light/"

# Load the network model
dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
dss.Solution.Solve()

# Collect voltage at every bus along Feeder F03
coords = pd.read_csv(DATA_DIR + "network/coordinates.csv")
feeder_buses = coords[coords["bus_name"].str.startswith("f03")].sort_values("x")

voltages = []
for _, row in feeder_buses.iterrows():
    bus = row["bus_name"]
    dss.Circuit.SetActiveBus(bus)
    v_pu = dss.Bus.puVmagAngle()[0]
    voltages.append({"bus": bus, "distance": row["x"], "voltage_pu": v_pu})

vdf = pd.DataFrame(voltages)

# Plot the voltage profile
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(vdf["distance"], vdf["voltage_pu"], "o-", color="#5FCCDB", markersize=4)
ax.axhline(y=1.05, color="red", linestyle="--", alpha=0.7, label="Upper limit")
ax.axhline(y=0.95, color="red", linestyle="--", alpha=0.7, label="Lower limit")
ax.axhspan(0.95, 1.05, alpha=0.1, color="green", label="ANSI range")
ax.set_xlabel("Distance from Substation")
ax.set_ylabel("Voltage (p.u.)")
ax.set_title("Voltage Profile Along Feeder F03")
ax.legend()
plt.tight_layout()
plt.show()
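
To turn that plot into a number you can track over time, here is a short follow-up check on the vdf DataFrame built above (a sketch; the thresholds are the same ANSI limits shown in the plot).

# Flag buses outside the 0.95-1.05 p.u. ANSI band and summarize
vdf["in_band"] = vdf["voltage_pu"].between(0.95, 1.05)
n_violations = int((~vdf["in_band"]).sum())
worst_bus = vdf.loc[vdf["voltage_pu"].idxmin(), "bus"]
print(f"{n_violations} of {len(vdf)} buses are outside the ANSI band")
print(f"Lowest voltage: {vdf['voltage_pu'].min():.4f} p.u. at {worst_bus}")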

Step 2: Build a Rule-Based Controller

Before using ML, build a simple rule: "If the voltage at the end of the feeder drops below 0.97 p.u., switch on a capacitor bank. If it rises above 1.03 p.u., switch it off."

def rule_based_vvo(voltage_pu, cap_on):
    """Simple rule-based capacitor control.

    Returns: new cap_on state (True/False)
    """
    if voltage_pu < 0.97 and not cap_on:
        return True   # switch ON to boost voltage
    elif voltage_pu > 1.03 and cap_on:
        return False  # switch OFF to reduce voltage
    return cap_on     # no change

# Simulate 24 hours with varying load
load_profile = pd.read_parquet(DATA_DIR + "timeseries/substation_load_hourly.parquet")
feeder_load = load_profile[load_profile["feeder_id"] == "F03"].head(24)

cap_on = False
rule_results = []

for i, (_, row) in enumerate(feeder_load.iterrows()):
    # Set load level in OpenDSS
    dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
    load_mult = row["total_load_mw"] / feeder_load["total_load_mw"].mean()
    dss.Solution.LoadMult(load_mult)  # LoadMult is set with a function call in opendssdirect.py

    # Apply capacitor state
    if cap_on:
        dss.Text.Command("Capacitor.cap_f03.States=[1]")
    else:
        dss.Text.Command("Capacitor.cap_f03.States=[0]")

    dss.Solution.Solve()

    # Read voltage at end of feeder
    dss.Circuit.SetActiveBus("f03_bus_12")
    v = dss.Bus.puVmagAngle()[0]

    # Apply rule-based control
    cap_on = rule_based_vvo(v, cap_on)
    rule_results.append({"hour": i, "voltage_pu": v, "cap_on": cap_on})

rule_df = pd.DataFrame(rule_results)
print(rule_df)
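
Before moving on, it helps to summarize how the rule-based controller did. A short sketch using the rule_df just built; the switching count matters because every capacitor operation causes mechanical wear.

# Fraction of hours inside the ANSI band and number of capacitor switching operations
within_band = rule_df["voltage_pu"].between(0.95, 1.05).mean()
n_switches = rule_df["cap_on"].astype(int).diff().abs().sum()  # first diff is NaN and is ignored
print(f"Hours within ANSI band: {within_band:.0%}")
print(f"Capacitor switching operations: {int(n_switches)}")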

Step 3: Define the RL Environment

Now let's set up a reinforcement learning environment. The agent observes the current voltage and decides whether to switch the capacitor on or off. It gets a positive reward when voltage is within the ANSI range and a negative reward (penalty) when it's outside.

class VoltVAREnv:
    """Simple Volt-VAR environment for Q-learning."""

    def __init__(self, load_profile, data_dir):
        self.load_profile = load_profile
        self.data_dir = data_dir
        self.hour = 0
        self.cap_on = False

    def reset(self):
        self.hour = 0
        self.cap_on = False
        return self._get_state()

    def _get_state(self):
        # Discretize voltage into buckets for the Q-table
        v = self._solve_voltage()
        if v < 0.95:
            bucket = 0  # too low
        elif v < 0.97:
            bucket = 1  # low-normal
        elif v < 1.00:
            bucket = 2  # normal
        elif v < 1.03:
            bucket = 3  # high-normal
        else:
            bucket = 4  # too high
        return (bucket, int(self.cap_on))

    def _solve_voltage(self):
        dss.Text.Command(f"Compile [{self.data_dir}network/master.dss]")
        row = self.load_profile.iloc[self.hour % len(self.load_profile)]
        load_mult = row["total_load_mw"] / self.load_profile["total_load_mw"].mean()
        dss.Solution.LoadMult(load_mult)  # function-call setter in opendssdirect.py
        if self.cap_on:
            dss.Text.Command("Capacitor.cap_f03.States=[1]")
        dss.Solution.Solve()
        dss.Circuit.SetActiveBus("f03_bus_12")
        return dss.Bus.puVmagAngle()[0]

    def step(self, action):
        # Action 0 = cap OFF, Action 1 = cap ON
        self.cap_on = bool(action)
        voltage = self._solve_voltage()

        # Reward: +1 if within ANSI, -10 if outside
        if 0.95 <= voltage <= 1.05:
            reward = 1.0
        else:
            reward = -10.0
        # Bonus for staying near 1.0
        reward += max(0, 1 - abs(voltage - 1.0) * 20)

        self.hour += 1
        done = self.hour >= len(self.load_profile)
        return self._get_state(), reward, done, {"voltage": voltage}
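
A quick smoke test catches wiring mistakes before you spend time training. This just runs two hand-picked actions through the environment and prints what comes back.

# Sanity-check the environment with a couple of manual steps
env = VoltVAREnv(feeder_load, DATA_DIR)
state = env.reset()
print("Initial state (voltage bucket, cap state):", state)

next_state, reward, done, info = env.step(1)  # force the capacitor ON
print(f"cap ON : state={next_state}, reward={reward:.2f}, voltage={info['voltage']:.4f} p.u.")

next_state, reward, done, info = env.step(0)  # force the capacitor OFF
print(f"cap OFF: state={next_state}, reward={reward:.2f}, voltage={info['voltage']:.4f} p.u.")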

Step 4: Train the Q-Learning Agent

# Q-table: states (5 voltage buckets x 2 cap states) x 2 actions
q_table = np.zeros((5, 2, 2))  # [voltage_bucket, cap_state, action]

# Hyperparameters
alpha = 0.1            # learning rate
gamma = 0.95           # discount factor
epsilon = 1.0          # exploration rate
epsilon_min = 0.05
epsilon_decay = 0.995
n_episodes = 100

env = VoltVAREnv(feeder_load, DATA_DIR)
episode_rewards = []

for ep in range(n_episodes):
    state = env.reset()
    total_reward = 0

    while True:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = np.random.randint(2)  # explore
        else:
            action = np.argmax(q_table[state[0], state[1]])  # exploit

        next_state, reward, done, info = env.step(action)

        # Q-learning update
        old_q = q_table[state[0], state[1], action]
        best_next = np.max(q_table[next_state[0], next_state[1]])
        q_table[state[0], state[1], action] = old_q + alpha * (
            reward + gamma * best_next - old_q
        )

        total_reward += reward
        state = next_state
        if done:
            break

    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    episode_rewards.append(total_reward)

    if (ep + 1) % 20 == 0:
        print(f"Episode {ep+1:>3}/{n_episodes}  "
              f"Reward: {total_reward:.1f}  Epsilon: {epsilon:.3f}")
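
With only ten states and two actions, the learned policy is small enough to print and read directly. Keep in mind that any state the agent never visited still holds all-zero Q-values, so its greedy action is arbitrary.

# Inspect the learned policy: greedy action for every (voltage bucket, cap state)
bucket_labels = ["too low", "low-normal", "normal", "high-normal", "too high"]
for bucket in range(5):
    for cap_state in range(2):
        action = np.argmax(q_table[bucket, cap_state])
        print(f"voltage {bucket_labels[bucket]:<12} cap {'ON ' if cap_state else 'OFF'}"
              f" -> {'switch ON' if action == 1 else 'switch OFF'}")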

Step 5: Test and Compare Both Approaches

# Plot training progress
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(episode_rewards, color="#5FCCDB", alpha=0.6)
ax.plot(pd.Series(episode_rewards).rolling(10).mean(),
        color="#1C4855", linewidth=2, label="10-episode avg")
ax.set_xlabel("Episode")
ax.set_ylabel("Total Reward")
ax.set_title("Q-Learning Training Progress")
ax.legend()
plt.tight_layout()
plt.show()

# Run the trained agent for one day and compare with rule-based
env = VoltVAREnv(feeder_load, DATA_DIR)
state = env.reset()
rl_results = []

while True:
    action = np.argmax(q_table[state[0], state[1]])
    state, reward, done, info = env.step(action)
    rl_results.append({"hour": env.hour - 1,
                       "voltage_pu": info["voltage"],
                       "cap_on": bool(action)})
    if done:
        break

rl_df = pd.DataFrame(rl_results)

# Side-by-side comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
ax1.plot(rule_df["hour"], rule_df["voltage_pu"], "o-", color="#2D6A7A")
ax1.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax1.set_title("Rule-Based Controller")
ax1.set_xlabel("Hour")
ax1.set_ylabel("Voltage (p.u.)")
ax2.plot(rl_df["hour"], rl_df["voltage_pu"], "o-", color="#5FCCDB")
ax2.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax2.set_title("Q-Learning Controller")
ax2.set_xlabel("Hour")
plt.suptitle("VVO Controller Comparison")
plt.tight_layout()
plt.show()
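
The plots tell most of the story, but two summary numbers make the comparison concrete: the share of hours inside the ANSI band and the average deviation from 1.0 p.u. A sketch using the two result DataFrames:

# Compare voltage compliance and average deviation for both controllers
for name, df in [("Rule-based", rule_df), ("Q-learning", rl_df)]:
    compliance = df["voltage_pu"].between(0.95, 1.05).mean()
    deviation = (df["voltage_pu"] - 1.0).abs().mean()
    print(f"{name:<11} within band: {compliance:.0%}   mean |V - 1.0|: {deviation:.4f} p.u.")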

What You Built and Next Steps

  1. Analyzed the voltage profile along an SP&L distribution feeder
  2. Built a rule-based Volt-VAR controller with simple threshold logic
  3. Defined an RL environment that rewards voltage compliance
  4. Trained a Q-learning agent to learn capacitor switching strategies
  5. Compared both approaches and visualized the results

Ideas to Try Next

  • Control multiple devices: Add voltage regulators and smart inverters to the action space
  • Add loss minimization: Reward the agent for reducing total circuit losses, not just voltage compliance (see the sketch after this list)
  • Deep RL: Replace the Q-table with a neural network (DQN) for continuous state spaces
  • Multi-feeder coordination: Train agents that coordinate across multiple feeders
  • Include solar variability: Add PV generation from timeseries/pv_generation.parquet for realistic cloud transients
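
For the loss-minimization idea above, one possible starting point is to read the total circuit losses after each power flow and subtract a scaled penalty from the reward inside VoltVAREnv.step(). The snippet below is a sketch, not a drop-in change: it assumes dss.Circuit.Losses() returns [watts, vars] for the solved circuit, and loss_weight is a made-up tuning knob you would need to calibrate.

# Sketch: a loss-aware reward to use inside VoltVAREnv.step() after dss.Solution.Solve()
def loss_aware_reward(voltage, loss_weight=0.001):  # loss_weight is a hypothetical tuning knob
    reward = 1.0 if 0.95 <= voltage <= 1.05 else -10.0   # same compliance term as before
    reward += max(0, 1 - abs(voltage - 1.0) * 20)        # same shaping bonus as before
    losses_kw = dss.Circuit.Losses()[0] / 1000.0         # total active losses in kW
    return reward - loss_weight * losses_kw              # penalize losses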

Key Terms Glossary

  • Volt-VAR Optimization (VVO) — coordinating voltage and reactive power control devices to maintain power quality
  • Capacitor bank — a device that injects reactive power to boost voltage
  • Voltage regulator — a transformer with adjustable taps that raises or lowers voltage
  • Q-learning — a model-free RL algorithm that builds a table of state-action values
  • Epsilon-greedy — exploration strategy: take a random action with probability epsilon, best known action otherwise
  • Reward shaping — designing the reward function to guide the agent toward desired behavior

Ready to Level Up?

In the advanced guide, you'll build a Deep Q-Network in PyTorch for multi-device voltage control, complete with experience replay and target networks.

Go to Advanced Volt-VAR →