RAG is the most poorly explained AI architecture of the last three years. Reality: fetch relevant info, paste it into the prompt, LLM answers based on it. Everything else is implementation detail.
Pipeline
- Ingest and chunk documents.
- Embed chunks into vectors.
- Store in a vector database.
- At query time, embed the question and fetch top-K.
- Send question + chunks to the LLM.
Common pitfalls
- Bad chunking: splitting mid-sentence or stripping the context a chunk needs to stand alone.
- Miscalibrated top-K: too small and the answer never reaches the prompt; too large and it drowns in noise.
- No reranking: the retriever's raw order goes straight into the prompt.
- No eval set: no way to tell whether retrieval actually returns the right chunks.
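The chunking pitfall is mostly about boundaries. A common mitigation is fixed-size chunks with overlap, so text split at a boundary still appears whole in at least one chunk. A minimal sketch (sizes and overlap are illustrative, not recommendations):

```python
def chunk(text, size=200, overlap=50):
    # Fixed-size character windows; each window starts `size - overlap`
    # characters after the previous one, so adjacent chunks share
    # `overlap` characters and no boundary content is lost.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Real splitters usually break on sentence or paragraph boundaries instead of raw character offsets, but overlap serves the same purpose there.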