PEFT is parameter efficient fine-tuning. LoRA Consider some matrix:
\begin{equation} W_0 \in \mathbb{R}^{d \times k} \end{equation}
Key intuition: gradient matricies have low intrinsic rank. We consider the following update:
\begin{equation} W_0 + \Delta W = W_0 + \alpha BA \end{equation}
where B \in \mathbb{R}^{d \times r}, A \in \mathbb{R}^{r \times k}, and r \ll \min(d,k). where \alpha is the trade off between pre-trained knowledge and task specific knowledge.