What You Will Learn
Utilities need to know how much electricity their customers will use tomorrow so they can schedule generation, manage equipment, and avoid overloads. In this guide you will:
- Load 15-minute feeder load profiles from the SP&L dataset
- Visualize load patterns by hour, day, and season
- Build a simple "persistence" baseline forecast
- Train a Gradient Boosting regression model that beats the baseline
- Evaluate forecast accuracy using standard error metrics
What is Gradient Boosting? Gradient Boosting builds many small decision trees one at a time, where each new tree tries to correct the mistakes of the previous ones. It is one of the most popular algorithms in applied machine learning because it handles tabular data extremely well and requires minimal tuning to produce good results.
SP&L Data You Will Use
- load_profiles.csv (load_load_profiles()) — feeder-level 15-minute load profiles with representative seasonal weeks (~2,688 intervals per feeder)
- weather_data.csv (load_weather_data()) — 52,608 hourly records (6 years) with temperature, humidity, wind, and storm flags
Load profiles are 15-minute intervals (not hourly).
Beyond the base prerequisites, this guide needs nothing extra.
Verify Your Setup
Before starting, verify that your environment is configured correctly. Run this cell first to confirm all dependencies are installed and data files are accessible.
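A minimal check might look like the following. The data/ paths below are an assumption about where your copy of the dataset lives; adjust them to match your repository layout.

```python
# Environment check: confirm the core dependencies import, and report whether
# the SP&L data files are where we expect them (assumed paths).
import importlib.util
from pathlib import Path

required = ["pandas", "numpy", "sklearn", "matplotlib"]
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
print("Missing packages:", missing or "none")

for name in ["data/load_profiles.csv", "data/weather_data.csv"]:  # assumed paths
    status = "found" if Path(name).exists() else "NOT FOUND"
    print(f"{name}: {status}")
```

If anything is reported missing, install it (or fix the paths) before continuing.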
Working directory: All guides assume your working directory is the repository root (Dynamic-Network-Model/). Start Jupyter Lab from there: cd Dynamic-Network-Model && jupyter lab
Having trouble? Check our Troubleshooting Guide for solutions to common setup and data loading issues.
Load the Data
Weather rows: 52,608
Load columns: ['feeder_id', 'substation_id', 'timestamp', 'load_mw', 'load_mvar', 'voltage_pu', 'power_factor']
Pick a Feeder and Explore
The SP&L dataset contains 65 feeders. To keep things simple, pick one feeder and work with it throughout this guide. You can repeat the process for other feeders later.
You should see a clear daily cycle: load dips at night and peaks in the afternoon, especially on hot days. This pattern is the foundation of our forecast.
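One quick way to see that cycle is to average the load by hour of day. This sketch uses a synthetic stand-in for one feeder's 15-minute profile (the real data comes from load_load_profiles()); the groupby line is the part to reuse.

```python
import numpy as np
import pandas as pd

# Two synthetic days of 15-minute load with an afternoon peak around 3 PM.
idx = pd.date_range("2024-07-01", periods=192, freq="15min")
load = pd.DataFrame({
    "timestamp": idx,
    "load_mw": 3.0 + 1.5 * np.sin((idx.hour + idx.minute / 60 - 9) * np.pi / 12),
})

# Average load by hour of day reveals the daily shape: nighttime dip,
# afternoon peak.
hourly_shape = load.groupby(load["timestamp"].dt.hour)["load_mw"].mean()
print("Peak hour:", hourly_shape.idxmax())
```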
Build Time Features
The load pattern depends heavily on the time of day, day of week, and season. Let's extract those from the timestamp.
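A sketch of the feature extraction, shown on a toy frame; apply the same lines to your real load table:

```python
import pandas as pd

# Extract calendar features from the timestamp column.
df = pd.DataFrame({"timestamp": pd.date_range("2024-01-05", periods=4, freq="15min")})
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
df["month"] = df["timestamp"].dt.month
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
print(df)
```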
Merge Weather Data
Temperature is the single biggest driver of electricity demand. On hot days, air conditioners run at full blast. On cold days, electric heating spikes. Let's join weather data to our load table.
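Because the weather is hourly and the load is 15-minute, a plain join won't line up. One approach is pandas merge_asof, which assigns each 15-minute row the most recent hourly reading. The column names below mirror the SP&L files; the data here is synthetic.

```python
import pandas as pd

# Hourly weather joined onto 15-minute load rows.
load = pd.DataFrame({
    "timestamp": pd.date_range("2024-07-01", periods=8, freq="15min"),
    "load_mw": [3.1, 3.0, 3.2, 3.3, 3.5, 3.6, 3.8, 3.9],
})
weather = pd.DataFrame({
    "timestamp": pd.date_range("2024-07-01", periods=2, freq="h"),
    "temperature_f": [78.0, 81.0],
})

# direction="backward": each load row takes the last weather reading at or
# before its timestamp. Both frames must be sorted on the join key.
merged = pd.merge_asof(load.sort_values("timestamp"),
                       weather.sort_values("timestamp"),
                       on="timestamp", direction="backward")
print(merged[["timestamp", "load_mw", "temperature_f"]])
```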
Add Lag Features
What was the load 24 hours ago? That is often the best predictor of what load will be now. These "lag" features give the model a sense of recent history.
What is a lag feature? A lag feature is simply a past value of the target variable, shifted forward in time. load_lag_24h is "what was the load exactly 24 hours ago." This helps the model because electricity demand is strongly autocorrelated—today's pattern usually looks a lot like yesterday's.
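At 15-minute resolution, 24 hours is 96 intervals and 7 days is 672, so the lags are just shifts by those step counts. A sketch on a synthetic series:

```python
import numpy as np
import pandas as pd

# Eight synthetic days of 15-minute load (a simple ramp, for illustration).
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-07-01", periods=8 * 96, freq="15min"),
    "load_mw": np.arange(8 * 96, dtype=float),
})

df["load_lag_24h"] = df["load_mw"].shift(96)    # 24 h = 96 intervals
df["load_lag_7d"] = df["load_mw"].shift(672)    # 7 d = 672 intervals
df["load_roll_24h"] = df["load_mw"].rolling(96).mean()  # trailing 24 h average

# The first 7 days have no 7-day lag, so drop those rows before training.
df = df.dropna()
print(len(df), "usable rows")
```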
Build a Baseline Forecast
Before training an ML model, build a simple baseline. A "persistence" forecast says: "Tomorrow's load at 2 PM will be the same as today's load at 2 PM." This gives you a bar to beat.
What is MAE? Mean Absolute Error is the average of the absolute differences between predicted and actual values. If MAE = 0.5 MW, it means the forecast is off by 0.5 MW on average. Lower is better. Every ML model should beat the baseline MAE to be considered useful.
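The baseline and its MAE fit in a few lines. Here a short synthetic series stands in for the feeder load; the shift-by-96 is the persistence forecast itself.

```python
import numpy as np
import pandas as pd

# Three synthetic days of 15-minute load around 3 MW.
rng = np.random.default_rng(0)
s = pd.Series(3.0 + rng.normal(0, 0.3, 3 * 96))

# Persistence: predict each interval with the value 24 hours (96 steps) earlier.
persistence = s.shift(96)

# MAE over the rows where a prediction exists.
mask = persistence.notna()
baseline_mae = (s[mask] - persistence[mask]).abs().mean()
print(f"Persistence MAE: {baseline_mae:.3f} MW")
```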
Train the Gradient Boosting Model
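A minimal training-and-evaluation sketch, using scikit-learn's GradientBoostingRegressor on synthetic features. In the real workflow, X would hold the time, weather, and lag columns built above, and the train/test split must be chronological (no shuffling) so the model never sees the future.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic feature table: hour, temperature, and a 24-hour lag that
# (by construction) carries most of the signal, as in real load data.
rng = np.random.default_rng(42)
n = 2000
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "temperature_f": rng.uniform(40, 100, n),
})
X["load_lag_24h"] = 2.0 + 0.03 * X["temperature_f"] + rng.normal(0, 0.1, n)
y = X["load_lag_24h"] + 0.01 * X["hour"] + rng.normal(0, 0.05, n)

# Chronological split: first 80% for training, last 20% held out.
split = int(n * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)
gb_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Gradient Boosting MAE: {gb_mae:.4f} MW")
```

The hyperparameters here (200 shallow trees, learning rate 0.05) are sensible defaults, not tuned values; the exact numbers you see on the SP&L data will differ.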
Test and Compare
Gradient Boosting MAE: 0.2134 MW
Gradient Boosting RMSE: 0.2987 MW
Improvement over baseline: 55.7%
Visualize the Forecast
Let's plot one week of predictions against actual load to see how the model performs visually.
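A plotting sketch with synthetic series standing in for y_test and the model's predictions; in the notebook, pass your real arrays and drop the Agg backend line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One synthetic week of actual vs. predicted 15-minute load.
idx = pd.date_range("2024-07-01", periods=7 * 96, freq="15min")
actual = 3.0 + 1.5 * np.sin((idx.hour - 9) * np.pi / 12)
predicted = actual + np.random.default_rng(1).normal(0, 0.1, len(idx))

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(idx, actual, label="Actual")
ax.plot(idx, predicted, label="Predicted", alpha=0.7)
ax.set_ylabel("Load (MW)")
ax.set_title("Day-ahead forecast vs. actual load (one week)")
ax.legend()
fig.tight_layout()
```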
Feature Importance
You will likely see that load_lag_24h and temperature_f dominate, followed by hour. This makes intuitive sense: yesterday's load at the same hour is the best starting point, adjusted for today's weather.
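Tree ensembles expose this ranking directly via feature_importances_. A sketch on synthetic data where load_lag_24h is constructed to dominate the target, so it should rank first:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic features; the target is mostly the 24-hour lag plus a small
# temperature effect and noise.
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "load_lag_24h": rng.uniform(2, 6, 1000),
    "temperature_f": rng.uniform(40, 100, 1000),
    "hour": rng.integers(0, 24, 1000),
})
y = X["load_lag_24h"] + 0.005 * X["temperature_f"] + rng.normal(0, 0.05, 1000)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
ranked = pd.Series(model.feature_importances_,
                   index=X.columns).sort_values(ascending=False)
print(ranked)
```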
What You Built and Next Steps
You just built a day-ahead load forecasting model that beat a persistence baseline by over 50%. Here's what you did:
- Loaded 15-minute feeder load profiles and weather data from the SP&L dataset
- Explored daily and seasonal load patterns
- Engineered time features (hour, day, month, weekend flag)
- Added lag features (24-hour, 7-day, rolling average)
- Built a simple persistence baseline and measured its error
- Trained a Gradient Boosting model that significantly outperformed the baseline
- Visualized actual vs. predicted load and identified the most important features
Ideas to Try Next
- Forecast all 65 feeders: Wrap your code in a loop and build a separate model for each feeder
- Add AMI data: Use the customer interval data (load_customer_interval_data()) for finer-grained per-customer forecasts
- Try an LSTM: Replace Gradient Boosting with a recurrent neural network using PyTorch or TensorFlow
- Incorporate solar generation: Subtract solar generation from load_solar_profiles() to forecast net load
- Evaluate peak accuracy: Utilities care most about peak-hour accuracy—filter to hours 14–18 and measure error separately
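As a starting point for the last idea, the peak-hour metric is just the same MAE restricted to an hour window. A sketch on synthetic data, where "predicted" is deliberately flat so the peak error stands out:

```python
import numpy as np
import pandas as pd

# Two synthetic days; the flat prediction misses the afternoon peak badly.
idx = pd.date_range("2024-07-01", periods=2 * 96, freq="15min")
df = pd.DataFrame({
    "actual": 3.0 + 1.5 * np.sin((idx.hour - 9) * np.pi / 12),
    "predicted": 3.0,
}, index=idx)

# Restrict the evaluation to the 14:00-18:00 window, then score it.
peak = df[(df.index.hour >= 14) & (df.index.hour <= 18)]
peak_mae = (peak["actual"] - peak["predicted"]).abs().mean()
print(f"Peak-hour MAE: {peak_mae:.3f} MW")
```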
Key Terms Glossary
- Gradient Boosting — builds trees sequentially; each new tree corrects errors from the previous ones
- Regression — predicting a continuous number (load in MW) rather than a category
- MAE (Mean Absolute Error) — average of |predicted − actual|; lower is better
- RMSE (Root Mean Squared Error) — like MAE but penalizes large errors more heavily
- Lag feature — a past value of the target shifted forward in time
- Persistence forecast — the simplest baseline: "tomorrow = today"
Ready to Level Up?
In the advanced guide, you'll build an LSTM neural network in PyTorch for multi-step ahead load forecasting.
Go to Advanced Load Forecasting →