
Dialogue State Architecture

The Dialogue State Architecture uses dialogue acts, rather than simple frame filling, to drive generation; it is currently used more in research than in deployed systems.
NLU: extracts slot fillers from the user's utterance, using ML
Dialogue State Tracker: maintains the current state of the dialogue
Dialogue policy: decides what to do next (think GUS's policy: ask, fill, respond), though nowadays the dynamics are more complex
NLG: generates a response from dialogue acts

Dialogue acts combine speech acts with underlying states.
Slot filling
We typically do this with BIO tagging using BERT, just as in NER tagging, but the tags mark frame slots.
The final <cls> token may also be used to classify domain + intent.
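
As a minimal sketch of what happens after BIO tagging, the snippet below turns per-token tags into a slot dictionary. The tagger itself (e.g. a BERT model) is not shown; the tag names `DES` and `DEPTIME` and the helper `bio_to_slots` are hypothetical, chosen only for illustration.

```python
def bio_to_slots(tokens, tags):
    """Collect contiguous B-/I- spans into a {slot_name: text} dict."""
    slots = {}
    current_slot, current_words = None, []

    def flush():
        # Store the span collected so far (if any) and reset the buffer.
        nonlocal current_slot, current_words
        if current_slot is not None:
            slots[current_slot] = " ".join(current_words)
        current_slot, current_words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span
            flush()
    flush()
    return slots


# Hypothetical tagger output for one utterance.
tokens = ["fly", "to", "San", "Francisco", "on", "Monday"]
tags = ["O", "O", "B-DES", "I-DES", "O", "B-DEPTIME"]
slots = bio_to_slots(tokens, tags)
print(slots)
```

If the same slot is tagged twice in one utterance, this sketch keeps the later span; a real tracker would need an explicit policy for that case.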
Corrections are hard
People sometimes use hyperarticulation ("exaggerated prosody") when correcting, which trips up ASR.
Correction acts may need to be detected explicitly as a speech act.
Dialogue policy
We can choose the next action conditioned on the last frame and the last agent and user utterances:

\begin{equation} \hat{A}_i = \arg\max_{a} P(a \mid F_{i-1}, A_{i-1}, U_{i-1}) \end{equation}

We can probably use a neural architecture to estimate this probability.
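
The argmax over actions can be sketched as follows. This is only an illustration: `toy_score` is a hypothetical hand-written stand-in for a learned model of the conditional probability, the action set `ACTIONS` borrows the GUS-style ask/respond vocabulary plus a confirm action, and the frame layout is assumed.

```python
ACTIONS = ["ask", "confirm", "respond"]  # hypothetical action inventory


def toy_score(action, frame, last_agent_act, last_user_act):
    """Hypothetical stand-in for P(a | F, A, U); returns an unnormalized score."""
    missing = [slot for slot, value in frame.items() if value is None]
    if last_user_act == "correct":
        # After a user correction, prefer confirming what we heard.
        return 1.0 if action == "confirm" else 0.0
    if missing:
        # Some slot is still empty, so prefer asking for it.
        return 1.0 if action == "ask" else 0.0
    return 1.0 if action == "respond" else 0.0


def choose_action(frame, last_agent_act, last_user_act, score=toy_score):
    """Pick the action with the highest score given the previous state."""
    return max(ACTIONS, key=lambda a: score(a, frame, last_agent_act, last_user_act))


frame = {"destination": "San Francisco", "date": None}
print(choose_action(frame, "ask", "inform"))
```

Swapping `toy_score` for a trained classifier's output would leave `choose_action` unchanged, since the argmax only needs a score per action.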
Whether to confirm can be decided from the ASR confidence:
< \alpha: reject
\geq \alpha: confirm explicitly
\geq \beta: confirm implicitly
\geq \gamma: no need to confirm

NLG
Once the speech act is determined, we need to actually generate it: 1) choose some attributes 2) generate the utterance
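
Returning to the ASR confidence tiers used for confirmation, a minimal sketch of the decision is shown here. The numeric defaults for `alpha`, `beta`, and `gamma` are made-up placeholders, and the function name `confirmation_strategy` is hypothetical; the notes only fix the ordering of the tiers.

```python
def confirmation_strategy(confidence, alpha=0.3, beta=0.6, gamma=0.9):
    """Map an ASR confidence score to a confirmation strategy.

    Assumes alpha < beta < gamma; the default values are placeholders.
    """
    if not alpha < beta < gamma:
        raise ValueError("thresholds must satisfy alpha < beta < gamma")
    if confidence < alpha:
        return "reject"
    if confidence < beta:
        return "confirm explicitly"
    if confidence < gamma:
        return "confirm implicitly"
    return "no need to confirm"


for c in (0.1, 0.45, 0.75, 0.95):
    print(c, confirmation_strategy(c))
```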
We typically want to delexicalize the keywords (Henry serves French food => [restaurant] serves [cuisine] food), run the result through NLG, then relexicalize ("rehydrate") the output with values from the frame.
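
The delexicalize/rehydrate round trip can be sketched with plain string substitution. This is a naive illustration, not a production approach: the helper names and the frame keys `restaurant` and `cuisine` are assumptions, the NLG model that would sit between the two steps is omitted, and exact substring replacement will misfire when one slot value appears inside another word.

```python
def delexicalize(utterance, frame):
    """Replace each slot value in the utterance with a [slot] placeholder."""
    for slot, value in frame.items():
        utterance = utterance.replace(value, f"[{slot}]")
    return utterance


def relexicalize(template, frame):
    """Fill [slot] placeholders in a generated template with frame values."""
    for slot, value in frame.items():
        template = template.replace(f"[{slot}]", value)
    return template


frame = {"restaurant": "Henry", "cuisine": "French"}
template = delexicalize("Henry serves French food", frame)
print(template)

# The same template can be rehydrated with a different frame.
other_frame = {"restaurant": "Au Midi", "cuisine": "Basque"}
print(relexicalize(template, other_frame))
```

Training the generator on templates rather than surface strings means it never has to learn individual restaurant names, which is the point of the delexicalization step.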
