What You Will Learn
Voltage on a distribution feeder is not constant. It drops as you move farther from the substation, and it rises when rooftop solar pushes power back toward it. Volt-VAR Optimization (VVO) coordinates capacitor banks, voltage regulators, and smart inverters to keep voltage inside the acceptable band while minimizing energy losses. (A short back-of-the-envelope sketch of why reactive power moves voltage follows the list below.) In this guide you will:
- Understand how voltage varies along a feeder and why VVO matters
- Analyze the SP&L network's voltage profile using OpenDSS
- Build a simple rule-based Volt-VAR controller
- Implement a basic Q-learning agent that learns to control capacitor switching
- Compare the rule-based and RL approaches on SP&L feeder data
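Why does switching a capacitor change voltage at all? Over a short line segment the voltage drop is roughly ΔV ≈ (R·P + X·Q) / V, so supplying reactive power locally shrinks the Q that flows down the line and with it the X·Q term. The snippet below is a back-of-the-envelope illustration with made-up impedance and load values, not SP&L data:
# Rough per-unit voltage-drop estimate for one line segment (illustrative numbers only).
# delta_v ~ (R*P + X*Q) / V, with everything in per-unit on a common base.
R, X = 0.02, 0.05          # assumed segment resistance and reactance (p.u.)
P, Q = 0.80, 0.30          # assumed active / reactive power flowing through it (p.u.)
V = 1.0                    # sending-end voltage (p.u.)

drop_without_cap = (R * P + X * Q) / V
drop_with_cap = (R * P + X * (Q - 0.30)) / V   # capacitor supplies the 0.30 p.u. of kvar locally
print(f"Drop without capacitor: {drop_without_cap:.3f} p.u.")
print(f"Drop with capacitor:    {drop_with_cap:.3f} p.u.")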
What is reinforcement learning? Unlike the supervised learning in Guides 01–04, reinforcement learning doesn't learn from labeled examples. Instead, an agent takes actions in an environment and receives rewards. Over many episodes, it discovers which actions lead to the best outcomes. Think of it as learning by trial and error, like training a dog with treats.
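The whole interaction boils down to a loop: observe a state, pick an action, collect a reward, repeat. Here is a minimal sketch of that loop with a toy environment standing in for the OpenDSS one you will build later (the names here are placeholders, not part of the SP&L code):
import random

class ToyEnv:
    """Placeholder environment: the 'right' action is 1 whenever the state is 'low'."""
    def reset(self):
        self.state = random.choice(["low", "high"])
        return self.state
    def step(self, action):
        reward = 1.0 if (self.state == "low") == (action == 1) else -1.0
        self.state = random.choice(["low", "high"])
        return self.state, reward

env = ToyEnv()
state = env.reset()
total = 0.0
for _ in range(10):                      # one short "episode"
    action = random.randint(0, 1)        # a real agent would learn this choice
    state, reward = env.step(action)
    total += reward
print(f"Total reward this episode: {total:.1f}")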
SP&L Data You Will Use
- network/master.dss — the full OpenDSS model including capacitors and regulators
- network/capacitors.dss — capacitor bank placements and kVAR ratings
- network/regulators.dss — voltage regulator settings and tap positions
- timeseries/substation_load_hourly.parquet — hourly load profiles for time-series simulation (a quick way to peek at this file is shown below)
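Before running anything, it can help to confirm the load profile reads cleanly. This is a minimal peek, assuming pandas and pyarrow are installed (see Additional Libraries just below) and the file layout matches the paths above; feeder_id and total_load_mw are the columns used later in this guide:
import pandas as pd

DATA_DIR = "sisyphean-power-and-light/"
load_profile = pd.read_parquet(DATA_DIR + "timeseries/substation_load_hourly.parquet")
print(load_profile.columns.tolist())     # expect feeder_id and total_load_mw among them
print(load_profile[load_profile["feeder_id"] == "F03"].head())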
Additional Libraries
pip install opendssdirect.py pyarrow
Which terminal should I use? On Windows, open Anaconda Prompt from the Start Menu (or PowerShell / Command Prompt if Python is already in your PATH). On macOS, open Terminal from Applications → Utilities. On Linux, open your default terminal. All pip install commands work the same across platforms.
OpenDSS on Windows vs. macOS/Linux: On Windows, OpenDSS also has a standalone installer from EPRI (the COM interface), but you do not need it for this guide—pip install opendssdirect.py works on all platforms. If you already have the Windows COM version installed, opendssdirect.py will still work fine alongside it.
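To confirm the install worked, a quick sanity check from Python is to print the engine version (Basic.Version is the OpenDSSDirect.py call assumed here; if the import fails, re-run the pip install in the same environment):
import opendssdirect as dss
print(dss.Basic.Version())   # should print the OpenDSS engine version string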
import opendssdirect as dss
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
DATA_DIR = "sisyphean-power-and-light/"
dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
dss.Solution.Solve()
coords = pd.read_csv(DATA_DIR + "network/coordinates.csv")
feeder_buses = coords[coords["bus_name"].str.startswith("f03")].sort_values("x")
voltages = []
for _, row in feeder_buses.iterrows():
    bus = row["bus_name"]
    dss.Circuit.SetActiveBus(bus)
    v_pu = dss.Bus.puVmagAngle()[0]
    voltages.append({"bus": bus, "distance": row["x"], "voltage_pu": v_pu})
vdf = pd.DataFrame(voltages)
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(vdf["distance"], vdf["voltage_pu"], "o-", color="#5FCCDB", markersize=4)
ax.axhline(y=1.05, color="red", linestyle="--", alpha=0.7, label="Upper limit")
ax.axhline(y=0.95, color="red", linestyle="--", alpha=0.7, label="Lower limit")
ax.axhspan(0.95, 1.05, alpha=0.1, color="green", label="ANSI range")
ax.set_xlabel("Distance from Substation")
ax.set_ylabel("Voltage (p.u.)")
ax.set_title("Voltage Profile Along Feeder F03")
ax.legend()
plt.tight_layout()
plt.show()
Before using ML, build a simple rule: "If the voltage at the end of the feeder drops below 0.97 p.u., switch on a capacitor bank. If it rises above 1.03 p.u., switch it off."
def rule_based_vvo(voltage_pu, cap_on):
    """Simple rule-based capacitor control.
    Returns: new cap_on state (True/False)
    """
    if voltage_pu < 0.97 and not cap_on:
        return True
    elif voltage_pu > 1.03 and cap_on:
        return False
    return cap_on
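The two thresholds form a deadband: once the capacitor switches on at 0.97 p.u., it stays on until voltage climbs past 1.03 p.u., which avoids rapid on/off cycling (hunting). A quick check of that behavior with a few hand-picked voltages:
cap = False
for v in [0.98, 0.96, 1.00, 1.02, 1.04]:
    cap = rule_based_vvo(v, cap)
    print(f"voltage={v:.2f} p.u. -> capacitor {'ON' if cap else 'OFF'}")
# 0.98: stays OFF, 0.96: switches ON, 1.00 and 1.02: stay ON (deadband), 1.04: switches OFF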
load_profile = pd.read_parquet(DATA_DIR + "timeseries/substation_load_hourly.parquet")
feeder_load = load_profile[load_profile["feeder_id"] == "F03"].head(24)
cap_on = False
rule_results = []
for i, (_, row) in enumerate(feeder_load.iterrows()):
    dss.Text.Command(f"Compile [{DATA_DIR}network/master.dss]")
    load_mult = row["total_load_mw"] / feeder_load["total_load_mw"].mean()
    dss.Solution.LoadMult(load_mult)
    if cap_on:
        dss.Text.Command("Capacitor.cap_f03.States=[1]")
    else:
        dss.Text.Command("Capacitor.cap_f03.States=[0]")
    dss.Solution.Solve()
    dss.Circuit.SetActiveBus("f03_bus_12")
    v = dss.Bus.puVmagAngle()[0]
    cap_on = rule_based_vvo(v, cap_on)
    rule_results.append({"hour": i, "voltage_pu": v, "cap_on": cap_on})
rule_df = pd.DataFrame(rule_results)
print(rule_df)
Now let's set up a reinforcement learning environment. The agent observes the current voltage and decides whether to switch the capacitor on or off. It gets a positive reward when voltage is within the ANSI range and a negative reward (penalty) when it's outside.
class VoltVAREnv:
    """Simple Volt-VAR environment for Q-learning."""

    def __init__(self, load_profile, data_dir):
        self.load_profile = load_profile
        self.data_dir = data_dir
        self.hour = 0
        self.cap_on = False

    def reset(self):
        self.hour = 0
        self.cap_on = False
        return self._get_state()

    def _get_state(self):
        # Discretize voltage into 5 buckets so the Q-table stays small
        v = self._solve_voltage()
        if v < 0.95: bucket = 0
        elif v < 0.97: bucket = 1
        elif v < 1.00: bucket = 2
        elif v < 1.03: bucket = 3
        else: bucket = 4
        return (bucket, int(self.cap_on))

    def _solve_voltage(self):
        dss.Text.Command(f"Compile [{self.data_dir}network/master.dss]")
        row = self.load_profile.iloc[self.hour % len(self.load_profile)]
        load_mult = row["total_load_mw"] / self.load_profile["total_load_mw"].mean()
        dss.Solution.LoadMult(load_mult)
        if self.cap_on:
            dss.Text.Command("Capacitor.cap_f03.States=[1]")
        else:
            dss.Text.Command("Capacitor.cap_f03.States=[0]")
        dss.Solution.Solve()
        dss.Circuit.SetActiveBus("f03_bus_12")
        return dss.Bus.puVmagAngle()[0]

    def step(self, action):
        self.cap_on = bool(action)
        voltage = self._solve_voltage()
        if 0.95 <= voltage <= 1.05:
            reward = 1.0
        else:
            reward = -10.0
        # Shaping term: small bonus for staying close to 1.0 p.u.
        reward += max(0, 1 - abs(voltage - 1.0) * 20)
        self.hour += 1
        done = self.hour >= len(self.load_profile)
        return self._get_state(), reward, done, {"voltage": voltage}
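The training loop below fills in a Q-table indexed by (voltage bucket, capacitor state, action) using the standard tabular update Q(s,a) ← Q(s,a) + α·(r + γ·max over a' of Q(s',a') - Q(s,a)). Here is that update worked once by hand with made-up numbers (not SP&L results), just to show how an entry drifts toward the bootstrapped target:
alpha, gamma = 0.1, 0.95   # same learning rate and discount used below
q_sa = 0.0                 # current estimate of Q(s, a)
reward = 1.0               # reward observed after taking action a in state s
best_next = 2.0            # max over actions of Q(s', a') in the next state

target = reward + gamma * best_next        # 1.0 + 0.95 * 2.0 = 2.9
q_sa = q_sa + alpha * (target - q_sa)      # moves 10% of the way toward 2.9
print(q_sa)                                # 0.29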
q_table = np.zeros((5, 2, 2))   # indexed by (voltage bucket, capacitor state, action)
alpha = 0.1                     # learning rate
gamma = 0.95                    # discount factor
epsilon = 1.0                   # initial exploration rate
epsilon_min = 0.05
epsilon_decay = 0.995
n_episodes = 100
env = VoltVAREnv(feeder_load, DATA_DIR)
episode_rewards = []
for ep in range(n_episodes):
    state = env.reset()
    total_reward = 0
    while True:
        # Epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = np.random.randint(2)
        else:
            action = np.argmax(q_table[state[0], state[1]])
        next_state, reward, done, info = env.step(action)
        # Tabular Q-learning update
        old_q = q_table[state[0], state[1], action]
        best_next = np.max(q_table[next_state[0], next_state[1]])
        q_table[state[0], state[1], action] = old_q + alpha * (
            reward + gamma * best_next - old_q
        )
        total_reward += reward
        state = next_state
        if done:
            break
    epsilon = max(epsilon_min, epsilon * epsilon_decay)
    episode_rewards.append(total_reward)
    if (ep + 1) % 20 == 0:
        print(f"Episode {ep+1:>3}/{n_episodes} "
              f"Reward: {total_reward:.1f} Epsilon: {epsilon:.3f}")
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(episode_rewards, color="#5FCCDB", alpha=0.6)
ax.plot(pd.Series(episode_rewards).rolling(10).mean(),
color="#1C4855", linewidth=2, label="10-episode avg")
ax.set_xlabel("Episode")
ax.set_ylabel("Total Reward")
ax.set_title("Q-Learning Training Progress")
ax.legend()
plt.tight_layout()
plt.show()
With training done, run the learned policy greedily (always take the best-known action, no exploration) over the same 24-hour profile:
env = VoltVAREnv(feeder_load, DATA_DIR)
state = env.reset()
rl_results = []
while True:
    action = np.argmax(q_table[state[0], state[1]])   # greedy: no exploration
    state, reward, done, info = env.step(action)
    rl_results.append({"hour": env.hour - 1, "voltage_pu": info["voltage"],
                       "cap_on": bool(action)})
    if done:
        break
rl_df = pd.DataFrame(rl_results)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
ax1.plot(rule_df["hour"], rule_df["voltage_pu"], "o-", color="#2D6A7A")
ax1.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax1.set_title("Rule-Based Controller")
ax1.set_xlabel("Hour")
ax1.set_ylabel("Voltage (p.u.)")
ax2.plot(rl_df["hour"], rl_df["voltage_pu"], "o-", color="#5FCCDB")
ax2.axhspan(0.95, 1.05, alpha=0.1, color="green")
ax2.set_title("Q-Learning Controller")
ax2.set_xlabel("Hour")
plt.suptitle("VVO Controller Comparison")
plt.tight_layout()
plt.show()
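The plots show the shape of each voltage trajectory; if you also want a single number to compare, a simple metric is the share of hours each controller kept the monitored bus inside the ANSI band. A minimal calculation using the two result frames built above:
def ansi_compliance(df, lo=0.95, hi=1.05):
    """Fraction of hours with voltage inside [lo, hi] p.u."""
    return df["voltage_pu"].between(lo, hi).mean()

print(f"Rule-based compliance: {ansi_compliance(rule_df):.0%}")
print(f"Q-learning compliance: {ansi_compliance(rl_df):.0%}")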
What You Did
- Analyzed the voltage profile along an SP&L distribution feeder
- Built a rule-based Volt-VAR controller with simple threshold logic
- Defined an RL environment that rewards voltage compliance
- Trained a Q-learning agent to learn capacitor switching strategies
- Compared both approaches and visualized the results
Ideas to Try Next
- Control multiple devices: Add voltage regulators and smart inverters to the action space
- Add loss minimization: Reward the agent for reducing total circuit losses, not just voltage compliance (see the sketch after this list)
- Deep RL: Replace the Q-table with a neural network (DQN) for continuous state spaces
- Multi-feeder coordination: Train agents that coordinate across multiple feeders
- Include solar variability: Add PV generation from timeseries/pv_generation.parquet for realistic cloud transients
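As a starting point for the loss-minimization idea, here is a small sketch of a reward term based on total circuit losses. It assumes a circuit has already been compiled and solved, the 0.01-per-kW weight is purely illustrative, and dss.Circuit.Losses() in OpenDSSDirect.py is taken to report total losses as [watts, vars]:
import opendssdirect as dss

def loss_penalty(weight_per_kw=0.01):
    """Negative reward proportional to total circuit losses (kW).

    Call after dss.Solution.Solve(); add the result to the voltage reward
    inside VoltVAREnv.step() to trade off compliance against losses.
    """
    loss_kw = dss.Circuit.Losses()[0] / 1000.0
    return -weight_per_kw * loss_kw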
Key Terms Glossary
- Volt-VAR Optimization (VVO) — coordinating voltage and reactive power control devices to maintain power quality
- Capacitor bank — a device that injects reactive power to boost voltage
- Voltage regulator — a transformer with adjustable taps that raises or lowers voltage
- Q-learning — a model-free RL algorithm that builds a table of state-action values
- Epsilon-greedy — exploration strategy: take a random action with probability epsilon, best known action otherwise
- Reward shaping — designing the reward function to guide the agent toward desired behavior
Ready to Level Up?
In the advanced guide, you'll build a Deep Q-Network with PyTorch for multi-device voltage control with experience replay and target networks.
Go to Advanced Volt-VAR →