# Alignment Problem

Autonomous systems will do exactly what we tell them to do, so we need to give them good instructions. Failing to do so is the **Alignment Problem**, which can arise from:

- **Imperfect objective** — the objective is underspecified
- **Imperfect model** — the system's understanding of the world is underspecified
- **Imperfect optimization** — the model simply did not solve the problem correctly

# Validation Framework

High-level structure: `validation_algorithm(system, spec)`

## System

- **Environment**: the state of the world, with transition model $T(s' \mid s, a)$
- **Sensor**: observation model $O(o \mid s)$
- **Agent**: policy $\pi(a \mid o)$

Example: inverted pendulum

- State: $(\theta, \omega)$ of the pendulum
- Observation: $O(o \mid s) = \mathcal{N}(o \mid s, \Sigma)$, i.e. the state corrupted by Gaussian noise
- Policy: the following proportional controller
\begin{equation}
\pi(a \mid o) =
\begin{cases}
1 & \text{if } a = -15\theta - 8\omega \\
0 & \text{otherwise}
\end{cases}
\end{equation}
- Environment: a transition model $T(s' \mid s, a)$ given by physics

## Specification $\psi$

The rules of the system, e.g. "do not let the pendulum tip over". Specifications are usually written in a formal specification language such as Linear Temporal Logic (LTL) or Signal Temporal Logic (STL).

## Validation algorithm

Given a system and a specification as input, a validation algorithm produces one of the following kinds of output:

**Failure analysis**
- Falsification: search for failures of a particular system
- Failure distribution: identify the distribution over failures
- Failure probability estimation: estimate the probability of failure

**Formal methods** ("reachability")
- Linear system reachability
- Nonlinear system reachability
- Discrete system reachability

**Other**
- Explanation
- Runtime assurance
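The framework above can be sketched in code: an environment $T(s' \mid s, a)$, a noisy sensor $O(o \mid s)$, the proportional-controller agent $\pi(a \mid o)$, and a falsification-style validation loop that searches for rollouts violating the specification "do not tip over". This is a minimal illustrative sketch, not the notes' reference implementation — the time step, pendulum constants, noise scale, failure threshold, and all function names are assumptions.

```python
import math
import random

# Illustrative constants (assumed, not from the notes)
DT = 0.05          # integration time step
GRAVITY = 9.81
LENGTH = 1.0       # pendulum length
NOISE_STD = 0.05   # sensor noise standard deviation

def step(state, a):
    """Environment T(s'|s,a): simple pendulum dynamics; a is the applied torque."""
    theta, omega = state
    omega = omega + (GRAVITY / LENGTH * math.sin(theta) + a) * DT
    theta = theta + omega * DT
    return (theta, omega)

def observe(state, rng):
    """Sensor O(o|s): the true state plus independent Gaussian noise."""
    theta, omega = state
    return (theta + rng.gauss(0, NOISE_STD), omega + rng.gauss(0, NOISE_STD))

def policy(obs):
    """Agent pi(a|o): deterministic proportional controller a = -15*theta - 8*omega."""
    theta, omega = obs
    return -15.0 * theta - 8.0 * omega

def rollout(initial_state, horizon, rng):
    """Simulate one trajectory through the sensor -> agent -> environment loop."""
    state = initial_state
    traj = [state]
    for _ in range(horizon):
        a = policy(observe(state, rng))
        state = step(state, a)
        traj.append(state)
    return traj

def satisfies_spec(traj, limit=math.pi / 4):
    """Specification psi: |theta| stays below the limit at every step."""
    return all(abs(theta) < limit for theta, _ in traj)

def falsify(num_rollouts=100, horizon=100, seed=0):
    """Falsification: search initial conditions for a rollout that violates psi."""
    rng = random.Random(seed)
    for _ in range(num_rollouts):
        s0 = (rng.uniform(-0.3, 0.3), rng.uniform(-0.5, 0.5))
        if not satisfies_spec(rollout(s0, horizon, rng)):
            return s0  # a counterexample to the specification
    return None        # no failure found within this sampling budget
```

Failure probability estimation fits the same skeleton: instead of returning the first counterexample, count the fraction of sampled rollouts with `satisfies_spec(...) == False`.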