Abstract
Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, an important task in AI-driven scientific discovery. Yet the exponential growth of the expression search space renders the task computationally challenging. A promising yet underexplored direction for reducing the effective search space and accelerating training lies in symbolic equivalence: many expressions, although syntactically different, define the same function. For example, \( \log(x_1^2 x_2^3) \), \( \log(x_1^2)+\log(x_2^3) \), and \( 2\log(x_1)+3\log(x_2) \) are all equivalent. Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs). EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by: (1) pruning redundant subtree exploration in EGG-MCTS, (2) aggregating rewards across equivalence classes in EGG-DRL, and (3) enriching feedback prompts in EGG-LLM. Under mild assumptions, we show that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator. Empirically, EGG-SR consistently enhances multiple baselines across challenging benchmarks, discovering equations with lower normalized mean squared error than state-of-the-art methods. Code is available at: github.com/jiangnanhugo/egg-sr.
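To make the equivalence-class idea concrete, below is a minimal sketch of an e-graph in Python. It is illustrative only, not the EGG module from the paper: the EGraph class, the operator names ("mul", "pow", "log"), and the hand-applied rewrite unions are all assumptions made for this example. A production engine would pattern-match rewrite rules automatically; here the two rules log(ab) = log(a) + log(b) and log(a^n) = n log(a) are applied by hand, and congruence closure then collapses the three syntactic variants from the abstract into a single e-class.

from itertools import count

class EGraph:
    """Minimal e-graph: e-nodes are (op, child_eclass_ids) tuples,
    hash-consing dedups e-nodes, and a union-find merges e-classes."""

    def __init__(self):
        self.parent = {}    # union-find parent pointer per e-class id
        self.hashcons = {}  # canonical e-node -> e-class id
        self._ids = count()

    def find(self, a):
        # union-find lookup with path halving
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def _canon(self, op, kids):
        # canonicalize an e-node by replacing children with their class roots
        return (op, tuple(self.find(k) for k in kids))

    def add(self, op, *kids):
        # insert an e-node, reusing its e-class if it already exists
        node = self._canon(op, kids)
        if node not in self.hashcons:
            cid = next(self._ids)
            self.parent[cid] = cid
            self.hashcons[node] = cid
        return self.find(self.hashcons[node])

    def union(self, a, b):
        # merge two e-classes (assert their expressions are equal)
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def rebuild(self):
        # congruence closure: re-canonicalize e-nodes until a fixpoint,
        # merging classes whose e-nodes have become identical
        changed = True
        while changed:
            changed = False
            fresh = {}
            for (op, kids), cid in self.hashcons.items():
                node, root = self._canon(op, kids), self.find(cid)
                if node in fresh and self.find(fresh[node]) != root:
                    self.union(fresh[node], root)
                    changed = True
                fresh.setdefault(node, root)
            self.hashcons = fresh

eg = EGraph()
x1, x2, two, three = (eg.add(s) for s in ("x1", "x2", "2", "3"))
p1, p2 = eg.add("pow", x1, two), eg.add("pow", x2, three)        # x1^2, x2^3

e_a = eg.add("log", eg.add("mul", p1, p2))                       # log(x1^2 * x2^3)
e_b = eg.add("add", eg.add("log", p1), eg.add("log", p2))        # log(x1^2) + log(x2^3)
e_c = eg.add("add", eg.add("mul", two, eg.add("log", x1)),
                    eg.add("mul", three, eg.add("log", x2)))     # 2*log(x1) + 3*log(x2)

# Apply the two rewrites by hand as unions:
eg.union(e_a, e_b)                                               # log(a*b) = log(a)+log(b)
eg.union(eg.add("log", p1), eg.add("mul", two, eg.add("log", x1)))    # log(a^n) = n*log(a)
eg.union(eg.add("log", p2), eg.add("mul", three, eg.add("log", x2)))
eg.rebuild()  # congruence closure then merges e_b with e_c

assert eg.find(e_a) == eg.find(e_b) == eg.find(e_c)              # one e-class, three syntaxes

Because each e-class stands for every expression provably equal to its members, a search algorithm that scores e-classes instead of individual trees never pays twice for the same function. This is the intuition behind the subtree pruning, reward aggregation, and prompt enrichment described in the abstract.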
Figure 2: Execution pipeline of classic MCTS and our EGG-MCTS.
Figure 3: Framework of classic DRL and our EGG-DRL.
Figure 4: Pipeline of LLM-SR and EGG-LLM.
BibTeX
@article{nan2025,
  title={EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph},
  author={Jiang, Nan and Wang, Ziyi and Xue, Yexiang},
  journal={arXiv preprint},
  year={2025},
  url={https://nan-jiang-group.github.io/egg-sr}
}