What is RAG (Retrieval Augmented Generation)?
RAG lets AI look things up instead of making stuff up. Here's how it actually works and why you should care.
You know that feeling when you ask ChatGPT about something specific, and it just... makes things up? I watched it invent a JavaScript library last month that sounded perfect for my project. Spent twenty minutes trying to install it before I realized it didn't exist.
That's not a bug. That's AI doing what AI does – generating text based on patterns, even when it has no clue what the real answer is.
So here's what RAG does: it stops the AI from winging it. Before answering your question, it actually looks things up in documents you give it. Wild concept, right?
Here's the thing about GPT-4 and friends – they're trained on huge amounts of data, but that data's got an expiration date. It doesn't know about your company's internal docs. It can't see last week's product update. It definitely hasn't read your API documentation.
So what happens when you ask about these things? The model guesses. And sometimes those guesses sound really, really convincing.
I've seen customer support bots confidently cite refund policies that don't exist. Documentation assistants that hallucinate API endpoints. Research tools that reference papers that were never written.
When you're building something that needs to be right – not just sound right – you've got a problem.
Think about customer support bots quoting refund policies, doc assistants handing out API endpoints, research tools citing studies. "Close enough" doesn't cut it here. You need facts, not confident-sounding fiction.
Remember taking tests in school? There were two kinds: closed book (memorize everything) and open book (bring your notes).
Traditional LLMs are closed book tests. They answer from memory, which means they're guessing if they don't know.
RAG is the open book version. When someone asks a question, the AI flips through your documents, finds the relevant parts, and answers based on what it just read.
Here's what happens behind the scenes:
First, you prep your documents. Before anyone asks anything, you process your docs – PDFs, web pages, Notion docs, whatever. You turn them into "embeddings," which is just a fancy way of saying "math that captures meaning." Each chunk of text becomes a list of numbers.
When someone asks a question, their question gets turned into the same kind of math representation.
Then you search. You find the documents whose embeddings are closest to the question's embedding. You're matching meaning, not just keywords – "How do I reset my password?" matches your password documentation even if the exact words are different. (There's a quick demo of this right after these steps.)
Build the prompt. You grab those relevant docs and hand them to the AI along with the question.
Get an answer. The AI reads the docs you just showed it and generates a response based on what it sees – not what it vaguely remembers from training.
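That "matching meaning, not keywords" part is the step that surprises people, so here's the promised demo. It's a minimal sketch assuming the sentence-transformers library; the model name is an arbitrary choice, and any embedding model behaves the same way.

from sentence_transformers import SentenceTransformer

# Hypothetical model choice – any sentence-embedding model works the same way
model = SentenceTransformer("all-MiniLM-L6-v2")

question = "How do I reset my password?"
chunks = [
    "To change your login credentials, open Settings > Security.",
    "Our refund policy covers digital products for 30 days.",
]

# normalize_embeddings=True makes a plain dot product equal cosine similarity
q_vec, *chunk_vecs = model.encode([question] + chunks, normalize_embeddings=True)
for chunk, vec in zip(chunks, chunk_vecs):
    print(f"{q_vec @ vec:.2f}  {chunk}")
# Expect the credentials chunk to score noticeably higher, even though it
# shares no keywords with the question.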
Here's the simplest version in code:
def answer_with_rag(question, knowledge_base):
    # Turn the question into searchable math
    question_embedding = create_embedding(question)

    # Find the 3 most relevant chunks
    relevant_docs = search_documents(question_embedding, knowledge_base, top_k=3)

    # Give the AI both the question AND the source material
    prompt = f"""
    Answer this question using ONLY the information below.
    If the answer isn't here, say "I don't know."

    Information:
    {relevant_docs}

    Question: {question}
    """
    return llm.generate(prompt)

That's it. The AI can't make stuff up because it's literally looking at the source material while it answers.
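The create_embedding, search_documents, and llm pieces above are stand-ins. Here's one way the first two might look – a minimal sketch, again assuming sentence-transformers, with the knowledge base as a plain list of (chunk, embedding) pairs you build once up front:

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # same hypothetical model as before

def create_embedding(text):
    # Normalized vectors mean cosine similarity is just a dot product
    return embedder.encode(text, normalize_embeddings=True)

def build_knowledge_base(chunks):
    # Run once, ahead of time, over every chunk of your documents
    return [(chunk, create_embedding(chunk)) for chunk in chunks]

def search_documents(question_embedding, knowledge_base, top_k=3):
    # Score every chunk against the question, keep the best top_k
    ranked = sorted(
        knowledge_base,
        key=lambda pair: float(np.dot(question_embedding, pair[1])),
        reverse=True,
    )
    return "\n\n".join(chunk for chunk, _ in ranked[:top_k])

At real scale you'd swap the plain list for a vector database, but the shape stays the same: nearest vectors in, text chunks out.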
Support at Scale
One company I know dumped 500+ support articles into a RAG system. Now when customers ask "What's your refund policy for digital products?", they get the exact policy – word for word from the actual document. No hallucinations. No outdated info. Just facts.
Documentation That Doesn't Suck
Instead of ctrl+F-ing through endless API docs, developers just ask: "How do I authenticate with OAuth2 in your Python SDK?" The system pulls the exact code examples and explanations from the real docs. Same info, way faster to find.
Research Without the Slog
I talked to a researcher who uploaded 200 papers about a specific drug. She could ask "What side effects showed up in people over 65?" and get a synthesis from all relevant studies. Saved her weeks of manual reading.
Internal Knowledge That Actually Works
A friend's company indexed all their Notion docs, Confluence pages, and internal wikis. Now employees ask questions about HR policies or engineering standards and get answers from actual company documents. No generic advice, no guessing.
RAG makes sense when:
- The answers live in documents the model has never seen – internal docs, last week's product update, your own API reference
- Being wrong actually costs you something, like a support bot quoting a refund policy
- The information changes too often for the model's training data to keep up
Don't bother with RAG when:
- A plain LLM already answers well enough
- Your questions are general knowledge the model was trained on
- Hallucination isn't actually the problem you're hitting
Start without RAG. Seriously.
Use a regular LLM first. See what breaks. Maybe hallucination isn't actually your problem. Maybe you don't need this complexity.
But when you hit that moment – when your support bot makes up a policy, when your doc assistant hallucinates an API endpoint, when accuracy actually matters – that's when you add RAG.
You'll know exactly what problem you're solving instead of building infrastructure because it sounds cool.
Here's the thing nobody tells you: RAG isn't perfect. Sometimes the retrieval misses relevant context. Chunking documents can break important info across boundaries. Performance gets slow with massive document sets.
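The chunk-boundary problem, at least, has a cheap and common mitigation: overlapping chunks. Here's a minimal sketch, with the sizes picked arbitrarily:

def chunk_text(text, chunk_size=500, overlap=100):
    # Slide a window so each chunk repeats the tail of the previous one;
    # a sentence that straddles a boundary survives intact in at least
    # one chunk.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]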
But when you need answers grounded in real documents you control? When "sounds right" isn't good enough and you need "is right"? RAG's the best tool we've got.
Pick one use case. One document set. One problem where hallucination is actually costing you something.
Build the simplest version you can. Three files:
- One that chunks and embeds your documents
- One that searches them
- One that stuffs the results into a prompt and calls the model
Watch it work (or not work). Fix what breaks. Then decide if you need more.
RAG's not magic. It's just giving AI the ability to look things up instead of guessing. Sometimes that's exactly what you need. Sometimes it's overkill.
You'll figure out which one pretty fast.