ICLR 2025, Mathur: MIND Adaptive Thinking with Dynamic Computation
Motivation: standard computation doesn't adapt; every input gets the same fixed compute budget.

Method (Fixed-Point Iteration for Adaptation, CNN variant):
- For every layer, perform fixed-point iteration until convergence, then mask out (what exactly?).
- Also supervise an "introspection model" that learns to skip the entire fixed-point loop (see the sketch below).
- Loss: LM loss + supervision loss for the introspection model.

Method (MIND-transformer variant):
- For every layer, perform fixed-point iteration until the attention activations converge.
- Introspection model: ditto, as above.
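The notes leave the per-layer mechanics open (including what exactly gets masked), so here is a minimal PyTorch sketch of the core idea as I read it: a layer applied by fixed-point iteration until its activations stop changing, plus an introspection head that can skip the loop and that is supervised by the observed iteration count. Everything here (FixedPointLayer, the tanh update z <- tanh(f(z) + x), the 0.5 skip threshold, tol, max_iters, the "converged in <= 2 steps" target) is an assumption for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedPointLayer(nn.Module):
    """One layer applied by fixed-point iteration, with an introspection
    head that can skip the loop entirely (hypothetical reconstruction)."""

    def __init__(self, dim: int, max_iters: int = 20, tol: float = 1e-3):
        super().__init__()
        self.f = nn.Linear(dim, dim)         # stand-in for the conv / attention block
        self.introspect = nn.Linear(dim, 1)  # predicts "safe to skip the loop"
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x: torch.Tensor, use_skip: bool = True):
        skip_logit = self.introspect(x)  # (B, 1)
        if use_skip and torch.sigmoid(skip_logit).mean() > 0.5:
            # Introspection predicts trivial convergence: apply the
            # update once and skip the whole loop.
            return torch.tanh(self.f(x) + x), skip_logit, 1

        # Iterate z <- tanh(f(z) + x) until the activations stop changing
        # (relative change below tol) or the iteration cap is hit.
        # Gradients flow through the unrolled loop; the paper may well use
        # implicit differentiation instead.
        z = torch.zeros_like(x)
        steps = self.max_iters
        for step in range(1, self.max_iters + 1):
            z_next = torch.tanh(self.f(z) + x)
            if (z_next - z).norm() <= self.tol * (z.norm() + 1e-8):
                steps = step
                z = z_next
                break
            z = z_next
        return z, skip_logit, steps


# Assumed training signal: main task/LM loss plus binary supervision telling
# the introspection head whether the loop in fact converged quickly.
layer = FixedPointLayer(dim=16)
x = torch.randn(4, 16)
z, skip_logit, steps = layer(x, use_skip=False)  # always run the loop in training

task_loss = z.pow(2).mean()  # placeholder for the real LM/task loss
skip_target = torch.full_like(skip_logit, float(steps <= 2))
loss = task_loss + F.binary_cross_entropy_with_logits(skip_logit, skip_target)
loss.backward()
```

Design note, still under the same assumptions: during training the loop always runs so the observed iteration count can supervise the introspection head; only at inference does the head gate whether the loop executes at all. The "converged in <= 2 steps" cutoff is arbitrary, and the transformer variant would swap the linear block for attention and test convergence on the attention activations.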