EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph

1University of Texas at El Paso, TX, USA. 2Purdue University, IN, USA.
Emails: njiang@utep.edu, {wang4538, yexiang}@purdue.edu
Main diagram

Figure 1: Applying the rewrite rule \( \log(a\times b) \leadsto \log(a) + \log(b)\) to an e-graph representing expression \(\log(x_1^3x_2^2)\). (a) The initialized e-graph consists of e-classes (dashed boxes), each containing equivalent e-nodes (solid boxes). Edges connect e-nodes to their child e-classes. (b) The matching step identifies the e-nodes that match the \( \mathtt{LHS} \) of the rule. (c) The substitution step adds new e-classes and edges corresponding to the \( \mathtt{RHS} \) to the e-graph. (d) The merging step consolidates equivalent e-classes. The final e-graph in (d) compactly represents two equivalent expressions.

Abstract

Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, which is an important task in AI-driven scientific discovery. Yet the exponential growth of the search space of expression renders the task computationally challenging. A promising yet underexplored direction for reducing the effective search space and accelerating training lies in symbolic equivalence: many expressions, although syntactically different, define the same function - for example, \( \log(x_1^2x_2^3) \), \( \log(x_1^2)+\log(x_2^3) \), and \( 2\log(x_1)+3\log(x_2) \). Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs). EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by: (1) pruning redundant subtree exploration in EGG-MCTS, (2) aggregating rewards across equivalence classes in EGG-DRL, and (3) enriching feedback prompts in EGG-LLM. Under mild assumptions, we show that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator. Empirically, EGG-SR consistently enhances multiple baselines across challenging benchmarks, discovering equations with lower normalized mean squared error than state-of-the-art methods. Code implementation is available at: github.com/jiangnanhugo/egg-sr.

BibTeX

@article{nan2025,
  title={EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph},
  author={Nan Jiang, Ziyi Wang, Yexiang Xue.},
  journal={Arxiv},
  year={2025},
  url={https://nan-jiang-group.github.io/egg-sr}
}