index b1ed811..24e21ab 100644
@@ -1,7 +1,9 @@
---
status: raw
tags:
-- search
+- research
+- ml
+- math
title: symbolic regression
type: idea
updated: 2026-04-11
@@ -10,4 +12,8 @@ visibility: public
# symbolic regression
-discovering mathematical expressions from data.
\ No newline at end of file
+symbolic regression is the problem of discovering a mathematical expression that fits a dataset: not just fitting parameters, but finding the functional form itself. instead of saying "the relationship is linear, fit the slope," it asks "what is the relationship?" and outputs something like `f(x) = 3x² + sin(x/2)`. this is scientifically interesting because it can recover interpretable physical laws from data, whereas a neural network only approximates the input-output mapping without exposing a readable formula. PySR is a widely used modern tool; it runs an evolutionary search over expression trees and is surprisingly good at recovering known physics equations.
+
+the research angle is what kinds of hidden structure can be discovered this way. the interesting applications are finding simplified approximations for complex ML models (symbolic distillation), discovering conservation laws in physical simulations, and compressing expensive neural network components into cheap algebraic forms. there's also a connection to mechanistic interpretability: understanding what a neural network is computing by finding a symbolic approximation of its behavior. the challenge is that the search space grows combinatorially with expression size and the number of input variables, so most symbolic regression methods scale poorly to high-dimensional inputs or complex expressions.
+
+connects to [[llm-physical-intuition|LLM physical intuition]], which asks whether LLMs can reason about physical space; symbolic regression is one tool for probing what structure LLMs have learned about physics. [[eeg-artifact-rejection|EEG artifact rejection]] and [[ppg-biomarker-wearable|PPG biomarker wearable]] are data-driven signal processing problems where discovering interpretable features would be valuable. [[flapping-airplanes|flapping airplanes]] is adjacent as another ML research direction. [[simulink-alternative|simulink alternative]] overlaps on the signal-processing-as-math framing. the deeper connection is to [[idea-extraction-system|core idea extraction]]: both are about finding the abstract structure underneath messy data.
\ No newline at end of file