Convergence of distributed symbolic regression using metaheuristics
Creators
Description
Symbolic regression (SR) fits a symbolic expression to a set of expected values. Among its advantages over other techniques is that a practitioner can interpret the resulting expression, determine important features by their usage in the expression, and gain insight into the behavior of the resulting model, such as continuity, derivatives, and extrema. SR combines a discrete combinatorial problem, combining base functions, with the continuous optimization problem of selecting and mutating real-valued constants. One of the main algorithms used in SR is Genetic Programming (GP). The convergence characteristics of SR using GP are still an open issue. The continuous aspect of the problem has traditionally been an issue in GP-based symbolic regression. This paper studies the convergence of a GP-SR implementation on selected benchmarks known for poor convergence characteristics. We introduce a cooling schedule on the mutation operator and observe the computational savings. The constant optimization problem is studied using a two-phase approach. We apply a variation on constant folding and evaluate its effects. The hybridization of GP with three metaheuristics (Differential Evolution, Artificial Bee Colony, Particle Swarm Optimization) is evaluated. We use a distributed GP-SR implementation to evaluate the effect of topologies on the convergence characteristics of the algorithm and the difference in communication overhead and speedup. We introduce and evaluate a topology with the aim of finding a new balance between diffusion on the one hand and communication and synchronization overhead on the other. We introduce a variation of k-fold cross validation to estimate how accurately a generated solution predicts unseen data points. This validation technique is implemented in parallel in the algorithm, combining the advantages of cross validation with increased coverage of the search space. Our tool offers a wide array of statistics describing the convergence characteristics of the algorithm over time, offering practitioners nuanced insights into the algorithm as it approximates the symbolic regression problem. We combine our incremental support with a design-of-experiments technique applied on a simulator and evaluate the impact on the convergence characteristics in combination with our constant optimization approach on the one hand and the distributed algorithm on the other hand.
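As an illustration of the constant folding mentioned in the description, the following is a minimal sketch of plain constant folding on an expression tree. It is not taken from the thesis or from the CSRM repository: the tuple-based tree encoding, the `OPS` table, and the function names `fold` and `size` are assumptions made for this example, and the specific variation the thesis applies is not described here.

```python
import math
import operator

# Hypothetical tree encoding (an assumption for this sketch, not CSRM's):
# a leaf is a number (constant) or a string (feature name);
# an inner node is a tuple (op_name, child, ...).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "sin": math.sin,
    "cos": math.cos,
}


def fold(node):
    """Return an equivalent tree with constant-only subtrees collapsed."""
    if not isinstance(node, tuple):
        return node  # leaf: constant or feature, nothing to fold
    op, *children = node
    children = [fold(child) for child in children]
    fn = OPS.get(op)
    # Collapse only when the operator is known and every child is a constant.
    if fn is not None and all(isinstance(c, (int, float)) for c in children):
        return fn(*children)
    return (op, *children)


def size(node):
    """Count the nodes in a tree."""
    if not isinstance(node, tuple):
        return 1
    return 1 + sum(size(child) for child in node[1:])


# (2 * 3) + (x0 * (5 - 1)): two subtrees contain only constants.
tree = ("+", ("*", 2.0, 3.0), ("*", "x0", ("-", 5.0, 1.0)))
folded = fold(tree)
print(folded, size(tree), size(folded))
```

Folding leaves the value of the expression unchanged for every input while shrinking the tree, so each fitness evaluation visits fewer nodes and the constant optimizer has fewer constants to tune.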
Other
Master Thesis, University of Antwerp, Computer Science.
Files
Name | Size |
---|---|
MsCThesisBenCardoen.pdf (md5:c33d28866d24c86966b881106e028150) | 2.7 MB |
Additional details
Software
- Repository URL
- https://github.com/bencardoen/CSRM.git
- Programming language
- Python
- Development Status
- Active