📅 November 15, 2024 ⏱️ 12 min read 🏷️ Reinforcement Learning

Uncertainty-Aware Reinforcement Learning System with Blended Surrogate Models for Electromagnetic Structure Optimization

Designing high-performance electromagnetic (EM) structures — antennas, resonators, filters, and related components — has always been a computationally expensive challenge. Traditional EM solvers like CST or HFSS provide highly accurate results, but each simulation can take minutes or hours. Exploring large parameter spaces or optimizing multi-parameter geometries becomes slow, manual, and often infeasible without heavy compute resources.

To overcome these limitations, I built an uncertainty-aware reinforcement learning (RL) system powered by blended surrogate models. This pipeline accelerates EM structure optimization by replacing most simulation calls with a fast, learned approximation while ensuring safety, fabrication feasibility, and confidence-aware decision making.

This post describes the motivation, architecture, training methodology, and performance of the system — as well as the unique engineering choices that helped it scale to real-world antenna optimization tasks.

Why Surrogate-Assisted RL for Electromagnetics?

RL is well-suited for optimization problems with continuous design parameters. However:

This creates a bottleneck: RL needs fast evaluations, but EM solvers are slow.

The solution is a surrogate model — a machine learning approximation of the EM simulation — that predicts key metrics like resonant frequency, return loss, and bandwidth rapidly while exposing uncertainty about each prediction. By combining this with RL, the agent can explore and optimize designs at scale.

System Architecture Overview

The full pipeline consists of six major components:

1. Dataset Validation & Physics Checks

Before training any model, the system runs a full validation suite over the dataset:

This ensures the surrogate model never learns from corrupted, impossible, or numerically unstable samples.
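As a rough illustration of this kind of validation, the sketch below screens one simulated sample for non-finite values and a few basic physical constraints. The field names (`patch_length_mm`, `s11_db`, and so on) and the specific rules are assumptions chosen for the example, not the checks the actual pipeline runs.

```python
import math

# Hypothetical field names for one simulated design sample.
REQUIRED_FIELDS = (
    "patch_length_mm", "patch_width_mm",
    "s11_db", "resonant_freq_ghz", "bandwidth_mhz",
)

def validate_sample(sample):
    """Return a list of problems for one sample; an empty list means it passed."""
    problems = []
    # Every required field must be a finite number (no NaN/inf from a failed run).
    for key in REQUIRED_FIELDS:
        value = sample.get(key)
        if (isinstance(value, bool) or not isinstance(value, (int, float))
                or not math.isfinite(value)):
            problems.append(f"{key}: missing or non-finite")
    if problems:
        return problems
    # Geometry must be realizable: strictly positive dimensions.
    for key in ("patch_length_mm", "patch_width_mm"):
        if sample[key] <= 0:
            problems.append(f"{key}: must be positive")
    # A passive structure cannot reflect more power than it receives.
    if sample["s11_db"] > 0:
        problems.append("s11_db: above 0 dB is non-physical for a passive structure")
    if sample["resonant_freq_ghz"] <= 0:
        problems.append("resonant_freq_ghz: must be positive")
    if sample["bandwidth_mhz"] < 0:
        problems.append("bandwidth_mhz: must be non-negative")
    return problems
```

Samples that return a non-empty list would be dropped or flagged before training, so a single failed solver run cannot distort the surrogate.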

2. Blended Surrogate Modeling (LightGBM + MLP + RidgeCV)

To approximate the EM solver, I built a three-model ensemble:

Each model predicts three target metrics:

A RidgeCV blender combines the three base predictions into a final output, producing smooth, stable predictions. The surrogate also estimates prediction uncertainty from the variance across the ensemble, which is later used as a penalty in the RL reward and for candidate ranking.
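The blending and uncertainty steps can be sketched for a single target as follows. This is a simplified stand-in: the weights are fixed, hypothetical numbers rather than coefficients fitted by RidgeCV, and the model names in the dictionary are placeholders.

```python
from statistics import pvariance

# Hypothetical blender coefficients; in practice a fitted RidgeCV model
# would supply the weights and intercept for each target.
BLEND_WEIGHTS = {"lightgbm": 0.5, "mlp": 0.3, "third_model": 0.2}
BLEND_INTERCEPT = 0.0

def blend_with_uncertainty(base_predictions):
    """Combine base-model predictions for one target.

    base_predictions maps a model name to its predicted value.
    Returns (blended prediction, variance across the base models).
    """
    blended = BLEND_INTERCEPT + sum(
        BLEND_WEIGHTS[name] * value for name, value in base_predictions.items()
    )
    # Disagreement between the base models serves as the uncertainty estimate.
    uncertainty = pvariance(list(base_predictions.values()))
    return blended, uncertainty

# Example: three base models predicting S11 in dB for one geometry.
s11, s11_var = blend_with_uncertainty(
    {"lightgbm": -20.0, "mlp": -22.0, "third_model": -21.0}
)
```

When the base models agree the variance is near zero, and it grows as they diverge, which tends to happen for geometries far from the training data.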

The surrogate achieved:

More importantly, the predictions were stable and monotonic — ideal for RL training.

3. Gymnasium-Compatible Environment Generation

Once trained, the surrogate is automatically wrapped in a custom RL environment:

This environment uses:

Everything is packaged into a standalone env_surrogate.py file, generated automatically.
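To give a feel for what such an environment involves, here is a dependency-free sketch that mirrors the Gymnasium `reset`/`step` return conventions. It is not the generated env_surrogate.py: the class name, the action scaling, the episode length, and the `toy_predict` stand-in for the surrogate are all assumptions, and a real environment would subclass `gymnasium.Env` and declare its observation and action spaces.

```python
import random

class SurrogateEnvSketch:
    """Plain-Python sketch following Gymnasium's reset/step return shapes."""

    def __init__(self, predict, low, high, max_steps=50, step_scale=0.05):
        self.predict = predict          # callable: params -> (score, uncertainty)
        self.low, self.high = low, high # per-parameter geometry bounds
        self.max_steps = max_steps
        self.step_scale = step_scale    # fraction of each range moved per unit action
        self.params = None
        self.steps = 0

    def reset(self, seed=None, options=None):
        # Start each episode from a random geometry inside the bounds.
        rng = random.Random(seed)
        self.params = [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]
        self.steps = 0
        return list(self.params), {}

    def step(self, action):
        # Each action component in [-1, 1] nudges one parameter; results are clipped.
        updated = []
        for p, a, lo, hi in zip(self.params, action, self.low, self.high):
            a = max(-1.0, min(1.0, a))
            updated.append(max(lo, min(hi, p + a * self.step_scale * (hi - lo))))
        self.params = updated
        self.steps += 1
        score, uncertainty = self.predict(self.params)
        terminated = False                       # no terminal state in this sketch
        truncated = self.steps >= self.max_steps # episode ends on the step budget
        return list(self.params), score, terminated, truncated, {"uncertainty": uncertainty}

def toy_predict(params):
    # Hypothetical surrogate stand-in: best score at the centre of a unit box.
    return -sum((p - 0.5) ** 2 for p in params), 0.0

env = SurrogateEnvSketch(toy_predict, low=[0.0, 0.0], high=[1.0, 1.0])
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step([0.5, -0.5])
```

Clipping the parameters to fixed bounds is one simple way to keep every visited geometry inside the range the surrogate was trained on.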

4. Uncertainty-Aware Reward Shaping

A key innovation in this system is uncertainty-aware RL.

The reward includes:

This prevents the RL agent from exploiting surrogate blind spots and steers the search toward regions where the surrogate's predictions can be trusted.
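One simple way to express this idea is a task term minus a penalty proportional to the surrogate's uncertainty. The sketch below assumes a single S11 objective; the target value, the penalty weight, and the function name are illustrative, not the reward terms used in the actual system.

```python
def shaped_reward(s11_db, uncertainty, target_s11_db=-20.0, penalty_weight=1.0):
    """Reward deeper S11 up to a target, minus an uncertainty penalty.

    All parameter values here are hypothetical defaults for illustration.
    """
    # Task term: more negative S11 is better, with no bonus past the target,
    # so the agent gains nothing from implausibly deep predicted nulls.
    task_term = min(-s11_db, -target_s11_db)
    # Penalty term: predictions the ensemble disagrees on are worth less.
    return task_term - penalty_weight * uncertainty

confident = shaped_reward(s11_db=-18.0, uncertainty=0.1)
uncertain = shaped_reward(s11_db=-18.0, uncertainty=4.0)
```

With the same predicted S11, the design in the uncertain region earns a lower reward, so the policy has less incentive to drift into parts of the design space the surrogate has not seen.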

5. SAC Training with VecNormalize & Robust Checkpointing

Soft Actor-Critic (SAC) was chosen due to:

The training loop includes:

This lets the agent explore the geometry space reliably, without normalization issues destabilizing training.

6. Candidate Export & Continual Learning Loop

After training:

This forms a fully automated closed-loop EM optimization engine.
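The export step could look roughly like the following, assuming each candidate carries a predicted score and an uncertainty estimate. The record fields, the ranking rule, and the CSV layout are assumptions for illustration; the top-ranked designs would then be verified in the full EM solver and the results fed back into the dataset.

```python
import csv
import io

def export_top_candidates(candidates, k=3, penalty_weight=1.0):
    """Rank candidates by score minus an uncertainty penalty; return top k as CSV text."""
    ranked = sorted(
        candidates,
        key=lambda c: c["score"] - penalty_weight * c["uncertainty"],
        reverse=True,
    )[:k]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["name", "score", "uncertainty"],
        extrasaction="ignore",  # tolerate extra keys such as geometry parameters
    )
    writer.writeheader()
    writer.writerows(ranked)
    return buffer.getvalue()

# Hypothetical candidates collected from trained-policy rollouts.
csv_text = export_top_candidates(
    [
        {"name": "a", "score": 20.0, "uncertainty": 5.0},
        {"name": "b", "score": 18.0, "uncertainty": 0.5},
        {"name": "c", "score": 10.0, "uncertainty": 0.1},
    ],
    k=2,
)
```

Here candidate "b" ranks above "a" despite its lower raw score, because the surrogate is far less certain about "a".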

Key Engineering Innovations

Results

Figure 1: Comparison of S11 parameters between baseline and RL-optimized designs.

This framework transforms EM design from slow manual iteration into a scalable automated optimization workflow.

Conclusion

Building this uncertainty-aware RL optimization system showed me how powerful machine learning can be when combined with domain physics. Surrogate models allow RL to operate at close to EM-solver fidelity while remaining thousands of times faster. Uncertainty modeling keeps the system grounded and reliable. And reinforcement learning provides a flexible way to explore large, nonlinear design spaces.

The result is a reproducible, customizable, and fully automated EM optimization pipeline.