Architecture
2025-12-08
How to handle long context windows vs. retrieval strategies?
With 1M+ token windows, is RAG dead? Understanding the 'Lost in the Middle' phenomenon.
Is RAG Dead?
Not yet. Context windows are growing (Gemini 1.5 Pro, for example, accepts 1M+ tokens), but stuffing everything into the context has downsides.
Trade-offs
- Cost: Long contexts are expensive to process every time.
- Latency: Time-to-first-token increases with context length.
- Accuracy: Models can struggle to find details buried in the middle of massive contexts ("Lost in the Middle").
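To make the trade-offs above concrete, here is a minimal Python sketch. It assumes a made-up input price (`PRICE_PER_MILLION_INPUT_TOKENS` is a placeholder, not any provider's real rate) and two hypothetical helpers: `input_cost` compares resending a full 1M-token context with sending a few retrieved chunks, and `order_for_edges` shows one commonly used mitigation for "Lost in the Middle", which is to place the most relevant chunks at the start and end of the prompt. Neither is prescribed by this article; treat both as illustrations.

```python
from typing import List

# Hypothetical price for illustration only; check your provider's real rates.
PRICE_PER_MILLION_INPUT_TOKENS = 1.00


def input_cost(tokens_per_query: int, queries: int) -> float:
    """Input-token cost of sending `tokens_per_query` tokens on each of `queries` calls."""
    return tokens_per_query * queries * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


def order_for_edges(chunks_by_relevance: List[str]) -> List[str]:
    """Reorder chunks so the most relevant sit at the start and end of the prompt.

    `chunks_by_relevance` must already be sorted from most to least relevant.
    The least relevant chunks end up in the middle, where details are most
    likely to be overlooked.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: ranks 1, 3, 5, ... go to the front; ranks 2, 4, ... to the back.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-most relevant chunk is the very last one.
    return front + back[::-1]


if __name__ == "__main__":
    queries = 1_000
    full_context = input_cost(1_000_000, queries)  # whole corpus in every prompt
    retrieved = input_cost(4_000, queries)         # a few retrieved chunks per prompt
    print(f"full context: ${full_context:,.2f}  retrieval: ${retrieved:,.2f}")
    print(order_for_edges(["rank1", "rank2", "rank3", "rank4", "rank5"]))
```

Under these assumed numbers the gap in input cost is a factor of 250, which is why retrieval stays attractive even when everything would fit in the window. The reordering step is optional and cheap to apply after any retriever that returns ranked chunks.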
Related Topics
Context Window, RAG, Architecture