Take X-Rays and generate clinical reports. Method encoder decoder architectures Encoder ConViT: convolutional vision transformer. Special thing: we swap out the attention Double Weighted Multi-Head Attention We want to force the model to focus on one thing, so we modulate the model based on the weights of other: if one head is big, we make the other head small. where w_{\cos i} = \frac{\sum_{i}^{} \cos \left(att_{a}, att_{base}\right)}{N}

\begin{equation} w = w_{a} \cdot (1- w_{\cos i}) \end{equation}

meaning:

\begin{equation} att_{dwma} = w \cdot att \end{equation}

Decoding Goood ol’ Hierarchical-Decoder

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?