Retrieval-augmented generation (RAG) is the most poorly explained AI architecture of the last three years. The reality: fetch relevant text, paste it into the prompt, and the LLM answers based on it. Everything else is implementation detail.
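That claim fits in a dozen lines. A minimal sketch, where `retrieve()` and `llm()` are stand-ins for whatever retriever and model client you actually use:

```python
def answer(question: str, retrieve, llm, k: int = 4) -> str:
    """RAG in one function: fetch relevant text, paste it into the prompt, let the model answer."""
    chunks = retrieve(question, k=k)            # fetch relevant info
    context = "\n\n".join(chunks)               # paste it into the prompt
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)                          # LLM answers based on it
```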

Pipeline

  1. Ingest and chunk documents.
  2. Embed chunks into vectors.
  3. Store in a vector database.
  4. At query time, embed the question and fetch the top-K most similar chunks.
  5. Send the question plus the retrieved chunks to the LLM (a sketch of the full loop follows below).
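
A compact sketch of steps 1 through 4, using numpy and an in-memory list as the "vector database". It assumes sentence-transformers for embeddings (the model name is just an example; any embedding model works), and the fixed-size chunker is deliberately naive:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap; real systems usually split on document structure."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(texts: list[str]) -> np.ndarray:
    """Embed texts and L2-normalize so dot product equals cosine similarity."""
    vecs = model.encode(texts)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

class VectorStore:
    """In-memory stand-in for a vector database: store vectors, return top-K by cosine similarity."""
    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: np.ndarray | None = None

    def add(self, chunks: list[str]) -> None:
        vecs = embed(chunks)
        self.chunks.extend(chunks)
        self.vectors = vecs if self.vectors is None else np.vstack([self.vectors, vecs])

    def search(self, question: str, k: int = 4) -> list[str]:
        q = embed([question])[0]
        scores = self.vectors @ q                 # cosine similarity against every stored chunk
        top = np.argsort(scores)[::-1][:k]        # indices of the top-K most similar chunks
        return [self.chunks[i] for i in top]

# Ingest once, then query:
# store = VectorStore()
# store.add(chunk(open("docs.txt").read()))
# context = store.search("How do refunds work?", k=4)
# -> send question + context to the LLM as in the first snippet
```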

Common pitfalls

Bad chunking (splits that cut sentences or tables in half), a top-K picked by guesswork, no reranking stage to filter what raw embedding similarity lets through, and no eval set to tell whether any of it is working.
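
The last pitfall is the cheapest to fix: even a few dozen hand-written question / expected-passage pairs will tell you whether a chunking or top-K change helped or hurt. A minimal sketch, assuming a `search(question, k)` function like the one above:

```python
def hit_rate_at_k(eval_set: list[tuple[str, str]], search, k: int = 4) -> float:
    """Fraction of questions whose expected passage shows up in the top-K retrieved chunks.
    eval_set holds (question, substring expected in a relevant chunk) pairs."""
    hits = 0
    for question, expected in eval_set:
        retrieved = search(question, k=k)
        if any(expected in chunk for chunk in retrieved):
            hits += 1
    return hits / len(eval_set)

# eval_set = [("How do refunds work?", "refund within 30 days"), ...]
# print(hit_rate_at_k(eval_set, store.search, k=4))
```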