Designing high-performance electromagnetic (EM) structures, such as antennas, resonators, filters, and related components, has always been a computationally expensive challenge. Traditional EM solvers like CST or HFSS provide highly accurate results, but each simulation can take minutes to hours. Exploring large parameter spaces or optimizing multi-parameter geometries becomes slow, manual, and often infeasible without heavy compute resources.
To overcome these limitations, I built an uncertainty-aware reinforcement learning (RL) system powered by blended surrogate models. This pipeline accelerates EM structure optimization by replacing most simulation calls with a fast, learned approximation while ensuring safety, fabrication feasibility, and confidence-aware decision making.
This post describes the motivation, architecture, training methodology, and performance of the system, as well as the unique engineering choices that helped it scale to real-world antenna optimization tasks.
Why Surrogate-Assisted RL for Electromagnetics?
RL is well-suited for optimization problems with continuous design parameters. However:
- RL requires millions of interactions.
- EM simulations are slow and expensive.
- Poor actions can yield invalid geometries or non-physical results.
This creates a bottleneck: RL needs fast evaluations, but EM solvers are slow.
The solution is a surrogate model, a machine learning approximation of the EM simulation, that rapidly predicts key metrics like resonant frequency, return loss, and bandwidth while exposing uncertainty about each prediction. By combining this with RL, the agent can explore and optimize designs at scale.
System Architecture Overview
The full pipeline consists of six major components:
1. Dataset Validation & Physics Checks
Before training any model, the system runs a full validation suite over the dataset:
- NaN/Inf checks
- Geometry constraints
- Ground-plane margin checks
- Non-physical RF values
- Duplicate geometry detection
- Outlier detection using robust MAD thresholds
- Automatic cleaning & non-destructive preview generation
This ensures the surrogate model never learns from corrupted, impossible, or numerically unstable samples.
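As an illustration, the robust outlier check can be implemented with a modified z-score based on the median absolute deviation (MAD). This is a minimal sketch; the column names (`f0_ghz`, `s11_db`, `bw_frac`) and thresholds are hypothetical stand-ins for the actual dataset schema:

```python
import numpy as np

def mad_outlier_mask(values, threshold=3.5):
    """Flag outliers via the modified z-score (median absolute deviation).

    The 0.6745 factor makes the MAD consistent with the standard deviation
    for normally distributed data, so `threshold` behaves like a z-score cut.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

def validate_samples(df):
    """Run basic physics/sanity checks; returns a boolean keep-mask."""
    keep = np.ones(len(df), dtype=bool)
    # Reject NaN/Inf rows across the target metrics.
    keep &= np.isfinite(df[["f0_ghz", "s11_db", "bw_frac"]]).all(axis=1).to_numpy()
    keep &= (df["s11_db"] < 0).to_numpy()    # return loss must be negative in dB
    keep &= (df["f0_ghz"] > 0).to_numpy()    # resonant frequency must be positive
    keep &= ~mad_outlier_mask(df["f0_ghz"])  # robust outlier rejection
    return keep
```

Because the cleaning is non-destructive, a mask like this can drive a preview of what would be dropped before any rows are actually removed.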
2. Blended Surrogate Modeling (LightGBM + MLP + RidgeCV)
To approximate the EM solver, I built a three-model ensemble:
- LightGBM #1
- LightGBM #2
- Deep MLP Regressor
Each model predicts three target metrics:
- Resonant frequency
- Minimum S11
- Fractional bandwidth
A RidgeCV blender combines the three base predictions into a final output, producing smooth, stable estimates. The model also computes prediction uncertainty from the variance across the ensemble, which is later used for RL reward penalties and candidate ranking.
The surrogate achieved:
- f0 MAE: ~0.02 GHz
- S11 MAE: ~0.72 dB
- Bandwidth MAE: ~0.13%
More importantly, the predictions were stable and monotonic, which is ideal for RL training.
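The blending pattern can be sketched as follows, using scikit-learn regressors as stand-ins for the two LightGBM models and the deep MLP (one such blended model is trained per target metric; all hyperparameters here are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.neural_network import MLPRegressor

class BlendedSurrogate:
    """Three base regressors blended by RidgeCV, for one target metric.

    Uncertainty is the standard deviation of the base predictions: where
    the models disagree, the surrogate is less trustworthy.
    """
    def __init__(self):
        # Stand-ins for LightGBM #1, LightGBM #2, and the deep MLP.
        self.bases = [
            GradientBoostingRegressor(n_estimators=200, random_state=0),
            GradientBoostingRegressor(n_estimators=200, max_depth=5, random_state=1),
            MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=2),
        ]
        self.blender = RidgeCV(alphas=np.logspace(-3, 3, 13))

    def fit(self, X, y):
        # Stack the base predictions and fit the linear blender on top.
        preds = np.column_stack([m.fit(X, y).predict(X) for m in self.bases])
        self.blender.fit(preds, y)
        return self

    def predict(self, X):
        preds = np.column_stack([m.predict(X) for m in self.bases])
        mean = self.blender.predict(preds)
        sigma = preds.std(axis=1)  # ensemble disagreement as uncertainty
        return mean, sigma
```

In practice the blender would be fit on held-out predictions rather than training-set predictions to avoid optimistic weights; the in-sample version above keeps the sketch short.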
3. Gymnasium-Compatible Environment Generation
Once trained, the surrogate automatically transforms into a custom RL environment:
- 10-dimensional geometry action space
- Normalized observation vector including predicted metrics
- Fabrication feasibility rules
- Uncertainty-aware penalties
- Termination when design meets EM constraints
This environment uses:
- Configurable reset distributions
- Reward shaping for frequency, S11, and bandwidth
- Penalization for high uncertainty
- Constraints for manufacturability (clearances, ratios, minimum trace widths)
Everything is packaged into a standalone env_surrogate.py file, generated automatically.
4. Uncertainty-Aware Reward Shaping
A key innovation in this system is uncertainty-aware RL.
The reward includes:
- Normalized performance reward (frequency accuracy, good S11, wide bandwidth)
- Surrogate uncertainty penalty
- Fabrication penalty
- Baseline subtraction to stabilize SAC updates
This prevents the RL agent from exploiting surrogate blind spots and encourages exploration in areas where the model is confident.
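The shaping can be written as a single scalar function. The weights, the -10 dB S11 threshold, and the target frequency below are illustrative placeholders, not the tuned values used in the system:

```python
def shaped_reward(f0, s11, bw, sigma, fab_violation=0.0, target_f0=2.45,
                  w_freq=1.0, w_s11=0.5, w_bw=2.0, w_unc=1.0, w_fab=5.0):
    """Sketch of the uncertainty-aware reward.

    - frequency term: peaks at the target resonance
    - S11 term: rewards return loss deeper than a -10 dB threshold
    - bandwidth term: rewards wider fractional bandwidth
    - uncertainty term: penalizes designs where the surrogate is unsure
    - fabrication term: penalizes constraint violations (clearances, widths)
    """
    r_freq = -w_freq * abs(f0 - target_f0)
    r_s11 = w_s11 * max(-10.0 - s11, 0.0)   # zero until S11 drops below -10 dB
    r_bw = w_bw * bw
    r_unc = -w_unc * sigma
    r_fab = -w_fab * fab_violation
    return r_freq + r_s11 + r_bw + r_unc + r_fab
```

Because sigma enters the reward directly, two designs with identical predicted performance are ranked by how much the surrogate trusts its own prediction, which is what keeps the agent out of blind spots.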
5. SAC Training with VecNormalize & Robust Checkpointing
Soft Actor-Critic (SAC) was chosen due to:
- Continuous actions
- High sample efficiency
- Entropy-based exploration
The training loop includes:
- Multi-env parallelism (SubprocVecEnv or DummyVecEnv)
- VecNormalize with atomic loading/saving
- Replay buffer persistence
- Best-model snapshots
- TensorBoard logging
This lets the agent reliably explore the geometry space without destabilizing due to normalization issues.
6. Candidate Export & Continual Learning Loop
After training:
- The agent generates a large batch of candidate EM structures.
- Candidates are ranked by performance × confidence.
- Top designs are exported for CST simulation.
- Verified CST results are added to the trusted dataset.
- The surrogate is retrained, improving accuracy iteratively.
This forms a fully automated closed-loop EM optimization engine.
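The ranking step can be sketched as a confidence-discounted acquisition over the candidate batch; the function name, scoring convention (higher is better), and the `alpha` trade-off parameter are illustrative assumptions:

```python
import numpy as np

def rank_candidates(scores, sigmas, k=10, alpha=1.0):
    """Rank candidate designs by performance discounted by uncertainty.

    `scores` are scalar performance estimates (higher is better) and
    `sigmas` the surrogate's uncertainty for each candidate; a lower-
    confidence-bound style acquisition keeps designs that look good AND
    that the surrogate is confident about.
    """
    scores = np.asarray(scores, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    acquisition = scores - alpha * sigmas   # confidence-discounted score
    order = np.argsort(acquisition)[::-1]   # best first
    return order[:k]                        # indices of the top-k candidates
```

The top-k indices select which geometries are exported for CST verification; once their simulated metrics come back, those rows join the trusted dataset and the surrogate is refit.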
Key Engineering Innovations
- Uncertainty-aware optimization: By integrating uncertainty into the reward and ranking logic, the RL agent stays away from unreliable surrogate regions.
- Blended surrogate models: Combining LightGBM, MLP, and RidgeCV delivered strong accuracy and stability across multiple EM targets.
- Physics-informed environment: Constraints ensure the agent never proposes non-fabricable or non-physical geometries.
- Reproducible end-to-end pipeline: Deterministic seeds, fixed CPU threading, and versioned checkpoints ensure experiments remain reproducible.
- Continual learning capability: Each CST simulation expands the trusted dataset, improving the surrogate and future RL performance.
Results
- High-fidelity surrogate predictions enable millions of fast RL interactions.
- SAC converges to stable, fabrication-ready geometries.
- RL-discovered structures match or exceed many human-designed baselines.
- Bandwidth-optimized designs demonstrate the power of blended surrogate-assisted RL.
Figure 1: Comparison of S11 parameters between baseline and RL-optimized designs.
This framework transforms EM design from slow manual iteration into a scalable automated optimization workflow.
Conclusion
Building this uncertainty-aware RL optimization system showed me how powerful machine learning can be when combined with domain physics. Surrogate models allow RL to operate at EM-solver-level fidelity while remaining thousands of times faster. Uncertainty modeling ensures the system remains grounded and reliable. And reinforcement learning provides a flexible way to explore large, nonlinear design spaces.
The result is a reproducible, customizable, and fully automated EM optimization pipeline.