NLP

Complex System
Coherence Generative REVOLUTION
Why probability maximization sucks
It's expensive!
Beam Search
Take k candidates
Expand k expansions for each of the k candidates
Choose the highest probability k candidates
k should be small: trying to maximize
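The three beam search steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `next_log_probs` is a hypothetical callback standing in for a real language model, and the `TOY` bigram table is made up for the example.

```python
import math

def beam_search(next_log_probs, start, k, steps):
    """Keep the k highest-scoring partial sequences at every step.

    next_log_probs(seq) is a hypothetical callback that returns
    {token: log_probability} for the next token given seq.
    """
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens so far)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            # Expand each surviving candidate with its k best continuations.
            options = sorted(next_log_probs(seq).items(),
                             key=lambda kv: kv[1], reverse=True)[:k]
            for token, log_p in options:
                candidates.append((score + log_p, seq + [token]))
        # Keep only the k highest-probability candidates overall.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams

# Made-up bigram table standing in for a real model (assumption).
TOY = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
}

def toy_next_log_probs(seq):
    return {tok: math.log(p) for tok, p in TOY[seq[-1]].items()}

best = beam_search(toy_next_log_probs, "<s>", k=2, steps=2)
print(best[0][1], math.exp(best[0][0]))
```

In this toy table, greedy decoding would commit to "the" first and end with probability 0.3, while the beam of size 2 keeps "a" alive and finds "a cat" at 0.36.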
Branch and Bound
See Branch and Bound
Challenges of Direct Sampling
Direct sampling sucks. Just sampling from the distribution sucks. This has to do with the fact that assigning a slightly lower score (“being less confident”) is exponentially worse.
The model therefore has to be VERY conservative about giving low confidences; so it is overconfident about the worst tokens.
Top-K
Top-k is too broad, and top
Nucleus Sampling
Find the smallest set of tokens whose probabilities add up to at least p.
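A minimal sketch of that idea, assuming the next-token distribution is given as a plain dict of probabilities. The helper names `nucleus` and `sample_nucleus` and the example distribution are made up for illustration.

```python
import random

def nucleus(probs, p):
    """Return the smallest set of tokens whose total probability
    reaches p, renormalized so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}

def sample_nucleus(probs, p, rng=random):
    """Sample one token from the renormalized nucleus."""
    trimmed = nucleus(probs, p)
    tokens = list(trimmed)
    weights = [trimmed[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up next-token distribution (assumption).
probs = {"cat": 0.5, "dog": 0.3, "eel": 0.15, "zzz": 0.05}
trimmed = nucleus(probs, p=0.9)
print(sorted(trimmed), sample_nucleus(probs, p=0.9))
```

With p = 0.9 the low-probability tail token "zzz" is cut off, so it can never be sampled, while the size of the kept set still adapts to how peaked the distribution is.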
Correctness
The highest probability answer isn’t always right
Generative models consider every answer, so we want another model to compute the correct answer
Surface Form Competition
The Surface Form Competition problem results when the top-probability token “steals” probability from the other tokens.
The predicted frequency of a possible string is a main confounder. And so we can use models to decompose their own predictions:
Turns out:
\begin{equation} P(answer|question) \approx P(answer\ is\ valid)P(answer|domain) \end{equation}
So…

\begin{equation} P(answer\ is\ valid) \approx \frac{P(answer|question)}{P(answer|domain)} \end{equation}

The ratio above is better. Further reading: (Holtzman et al. 2021)
Domain
Domain is the context in which the text may occur.
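As a rough illustration of scoring by the ratio P(answer|question) / P(answer|domain) instead of the raw probability, here is a small sketch. The probabilities are made-up numbers, not model outputs, and `validity_score` is a hypothetical helper name.

```python
import math

def validity_score(p_answer_given_question, p_answer_given_domain):
    """Log of the ratio P(answer|question) / P(answer|domain)."""
    return math.log(p_answer_given_question) - math.log(p_answer_given_domain)

# Made-up numbers (assumption): (P(answer|question), P(answer|domain)).
# The common phrasing is likely in this domain regardless of the question.
candidates = {
    "common phrasing": (0.30, 0.25),
    "rare phrasing": (0.10, 0.01),
}

by_raw = max(candidates, key=lambda a: candidates[a][0])
by_ratio = max(candidates, key=lambda a: validity_score(*candidates[a]))
print(by_raw, by_ratio)
```

Under these numbers the raw probability prefers the string that is frequent in the domain anyway, while the ratio prefers the answer whose probability the question actually raised.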
Coverage
Why aren’t models controllable?
Hallucination
Language models predict what’s most likely
We hope to control them with natural-language semantics
In-Context Learning
If we show the model some context containing example input-output pairs, it can produce the output for a new input (models are few-shot learners).
Correct Scoring
We can reverse the direction and use the output to predict the input, which prevents the model from losing information, and use that score to rerank the outputs. Of course, if the model can’t regenerate the desired input, the output is probably missing information.
Smaller models can be made better because of this reranking.
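A minimal sketch of the reranking idea, assuming we have some scorer for how well the input can be predicted back from each candidate output. Here `toy_backward_log_prob` is a crude word-overlap stand-in for a real model's log P(input | output), and the example strings are made up.

```python
import math

def toy_backward_log_prob(source, output):
    """Stand-in for log P(source | output): the smoothed fraction of
    source words that can be recovered from the output."""
    src = source.lower().split()
    out = set(output.lower().split())
    covered = sum(1 for word in src if word in out)
    return math.log((covered + 1) / (len(src) + 1))  # add-one avoids log(0)

def rerank(candidates, source, backward_log_prob):
    """Order candidates by how well the source is predicted back from them."""
    return sorted(candidates,
                  key=lambda out: backward_log_prob(source, out),
                  reverse=True)

source = "meeting moved to friday at noon"
candidates = [
    "meeting moved",
    "meeting moved to friday",
    "meeting moved to friday at noon",
]
ranked = rerank(candidates, source, toy_backward_log_prob)
print(ranked[0])
```

Candidates that drop information from the input score lower in the backward direction, so they sink in the ranking even if the forward model found them likely.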
The Degenerative Discriminative Gap.
Future Work
Even a single comma shifts the input. What we need is a language to control language behavior.
The Ability to Control a Model Is the Goal of Understanding the Model
We should only claim to understand a model when we can make a theory map about it: “when X is fed into the model, we get Y.”
So:
we should look at what the model is biased about (Surface Form Competition, for instance)
we would be closer to prime behaviors such that they mimic the human behavior (in pieces, not just “complete these tokens”) in completion
We see success as the actual evaluation metrics; we can use machines vs. other machines as the results
Questions
ahai@uw.edu
Marcel Just
anthropic ai papers
percy liang