ICLR2025 Yue: Inference Scaling for Long-Context RAG
“RAG performance can scale almost linearly w.r.t. log inference FLOPs”
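One way to read the quoted claim is as a fit of the form score ≈ a + b · log10(FLOPs). The sketch below fits that line by ordinary least squares using only the standard library. The function name `fit_log_linear` and the numbers are made up for illustration and are not results from the paper.

```python
import math

def fit_log_linear(flops, scores):
    """Least-squares fit of score ~ a + b * log10(flops); returns (a, b)."""
    xs = [math.log10(f) for f in flops]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factor cancels.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data points (NOT from the paper): each 10x increase in
# compute adds the same number of points, which is what "linear in
# log FLOPs" means.
example_flops = [1e12, 1e13, 1e14, 1e15]
example_scores = [30.0, 40.0, 50.0, 60.0]
a, b = fit_log_linear(example_flops, example_scores)
print(f"score ~ {a:.1f} + {b:.1f} * log10(FLOPs)")
```

Under such a fit, the slope b is the gain per order of magnitude of inference compute.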
Demonstration-Based RAG (DRAG) Method: adds demonstrations to the prompt as k in-context examples.
Prompt: documents, input query, final answer.
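Parameters: number of documents, number of in-context examples (the iteration bound is effectively 1, since DRAG produces the answer in a single generation step; it becomes a free parameter only in IterDRAG).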
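A minimal sketch of how a DRAG-style prompt could be assembled from the pieces listed above: each in-context example carries its own documents, query, and answer, and the test query follows with its answer left open. The `Example` class, the function names, and the "Document:/Question:/Answer:" labels are assumptions for illustration, not the paper's actual template.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Example:
    """One in-context demonstration (hypothetical container)."""
    documents: List[str]
    query: str
    answer: str

def format_block(documents: List[str], query: str, answer: Optional[str] = None) -> str:
    """Render documents, then the query, then the answer (or an open slot)."""
    lines = [f"Document: {doc}" for doc in documents]
    lines.append(f"Question: {query}")
    lines.append("Answer:" if answer is None else f"Answer: {answer}")
    return "\n".join(lines)

def build_drag_prompt(demos: List[Example], test_documents: List[str], test_query: str) -> str:
    """Concatenate the demonstrations, then the test block with no answer."""
    parts = [format_block(d.documents, d.query, d.answer) for d in demos]
    parts.append(format_block(test_documents, test_query))
    return "\n\n".join(parts)

# Usage: one demonstration, two retrieved documents for the test query.
demo = Example(["Paris is the capital of France."], "What is the capital of France?", "Paris")
print(build_drag_prompt([demo], ["Doc A", "Doc B"], "Test question?"))
```

The two knobs from the parameter list map onto `len(test_documents)` (and the documents per demonstration) and `len(demos)`.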
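Iterative Demonstration-Based RAG (IterDRAG) Method: DRAG as above, except that the model can also generate a new sub-query, for which additional documents are retrieved. At each iteration the model decides whether to issue another sub-query or to give the final answer.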
Parameters: number of documents, number of in-context examples, upper bound on the number of iterations.
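The iterative loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: `retrieve` and `generate` are hypothetical callables supplied by the caller, `generate` is assumed to return a `(kind, text)` pair with `kind` equal to `"sub_query"` or `"final"`, and the intermediate answers to sub-queries are omitted for brevity.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures (assumptions for this sketch):
#   retrieve(query, num_documents) -> list of document strings
#   generate(query, documents, demos, steps, force_final) -> (kind, text)
Retrieve = Callable[[str, int], List[str]]
Generate = Callable[[str, List[str], List[str], List[str], bool], Tuple[str, str]]

def iter_drag(query: str, demos: List[str], retrieve: Retrieve, generate: Generate,
              num_documents: int, max_iterations: int) -> str:
    """Interleave retrieval and generation until a final answer or the bound."""
    documents = list(retrieve(query, num_documents))
    steps: List[str] = []  # sub-queries issued so far
    for _ in range(max_iterations):
        kind, text = generate(query, documents, demos, steps, False)
        if kind == "final":
            return text
        # The model asked a sub-query: retrieve for it and extend the context.
        steps.append(text)
        documents.extend(retrieve(text, num_documents))
    # Iteration bound reached: force the model to commit to a final answer.
    _, text = generate(query, documents, demos, steps, True)
    return text
```

Setting `max_iterations` to 0 reduces the loop to a single forced answer over the initially retrieved documents, which corresponds to the single-step DRAG case in this sketch.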