
How AI Agents Actually Work

AI agents are not magic. They're five concepts stacked on top of each other, and once you get them you can build any agent from scratch.

These are the five layers. Each one builds on the last.

A stateless text generator

Text in, text out. No memory between calls. Each request is completely independent. The model has no idea what you asked five seconds ago.

Prompt → LLM → Response

LLMs are stateless

Start here, because everything else depends on this.

An LLM takes text in and generates text out. That's it. It has no memory. It doesn't “know” what you asked three messages ago unless you paste the entire conversation history back into every single call.

prompt("What is the capital of France?")
→ "Paris"

prompt("What did I just ask?")
→ "I don't have context about previous questions."

This seems limiting. It is. But it also makes the architecture simple: every agent capability is something you build on top of this stateless core. No hidden state to fight with.
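Statelessness means the caller owns the conversation. A minimal sketch of what every chat wrapper does under the hood — `ask` and the `callModel` stub here are hypothetical stand-ins for any chat API:

```javascript
// The model only sees what you send. To simulate memory,
// keep a history array and resend the whole thing every call.
const history = []

function ask(userText, callModel) {
  history.push({ role: "user", content: userText })
  const reply = callModel(history) // full history goes in every time
  history.push({ role: "assistant", content: reply })
  return reply
}

// Stub "model" that just proves the context grows with each call.
const stub = (msgs) => `I can see ${msgs.length} message(s).`

ask("What is the capital of France?", stub) // → "I can see 1 message(s)."
ask("What did I just ask?", stub)           // → "I can see 3 message(s)."
```

The second call "remembers" only because the first exchange was pasted back in. That's the whole trick behind every chat interface.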

System prompts turn a general model into a specialist

A system prompt sits before every user message. It shapes how the model responds and what role it plays. Same model, completely different behavior.

System prompt

You are a Python tutor.
Never give direct answers.
Ask guiding questions instead.

User

How do I reverse a list in Python?

Model

Good question! Before I answer, what do you already know about list methods? Have you tried looking at what happens when you call .sort() on a list?

Same model, same question, different system prompt

General model + system prompt = specialist agent. A translation agent, a code reviewer, a math tutor: same model, different text in front. Your system prompt is the single biggest lever you have. Vague instructions produce vague results.
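In practice the system prompt is just another field in the request. A sketch of two specialist agents built from the same model — the request shape loosely follows the Anthropic Messages API, and the model name is a placeholder:

```javascript
// Same model, same user message — only the system prompt differs.
function buildRequest(system, userText) {
  return {
    model: "claude-sonnet-4", // placeholder model name
    system,                   // the lever: role, rules, constraints
    messages: [{ role: "user", content: userText }],
  }
}

const tutor = buildRequest(
  "You are a Python tutor. Never give direct answers.",
  "How do I reverse a list in Python?"
)
const reviewer = buildRequest(
  "You are a strict code reviewer. Point out bugs only.",
  "How do I reverse a list in Python?"
)
// tutor and reviewer differ only in the `system` field.
```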

Tool calling

Without tools, an LLM can only talk. It can describe how to multiply numbers. It can explain APIs. But it can't actually do anything.

Function calling fixes that.

User → LLM → Tool Call → Execute → Answer

You send the prompt + tools to the API

curl api.anthropic.com/v1/messages -d '{
  "messages": [{ "role": "user", "content": "What is 15 × 8?" }],
  "tools": [{ "name": "calculator", "input_schema": { ... } }]
}'

The LLM never executes code. It says which tool it wants to call and with what arguments; your code does the actual work.
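Your side of that handshake looks roughly like this — the response shape and the `tools` registry are simplified stand-ins, not any particular SDK:

```javascript
// The model names a tool and its arguments; your code runs it
// and sends the result back as a new message.
const tools = {
  calculator: ({ a, b, op }) => (op === "multiply" ? a * b : NaN),
}

function handleResponse(response) {
  if (response.type === "tool_call") {
    // The model never runs this — we do.
    const result = tools[response.name](response.input)
    return { role: "tool", name: response.name, content: String(result) }
  }
  return { role: "assistant", content: response.text }
}

// Simulated model output asking for 15 × 8:
const msg = handleResponse({
  type: "tool_call",
  name: "calculator",
  input: { a: 15, b: 8, op: "multiply" },
})
// msg.content === "120"
```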

The agent loop

One tool call is useful but limited. Real problems need multiple steps. So you put it in a loop.

async function agentLoop(prompt, tools, maxSteps = 10) {
  let step = 0
  while (step < maxSteps) {
    step++
    const response = await llm.generate(prompt, { tools })
    if (response.type === "tool_call") {
      // Run the requested tool and feed the result back into the context
      const result = await execute(response.tool, response.params)
      prompt += `\nResult: ${result}`
    } else {
      // No more tool calls — the model has enough to answer
      return response.text
    }
  }
  throw new Error(`No answer after ${maxSteps} steps`)
}

This is called the ReAct pattern (Reasoning + Acting). The LLM thinks about what it needs, calls a tool, looks at the result, then decides what to do next. It keeps going until it has enough to answer.

Step through a real example. Watch how the agent breaks a word problem into individual tool calls:

ReAct Agent Trace
Problem

A coffee shop sells 3 drinks. A latte costs $5, a cappuccino costs $4, and a drip coffee costs $3. Yesterday they sold 40 lattes, 25 cappuccinos, and 60 drip coffees. What was the total revenue?
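The trace boils down to four calculator calls: one multiply per drink, then a final add. The arithmetic the agent delegates to its tool is just:

```javascript
// Each multiply is one calculator call in the trace;
// the final add combines them.
const lattes = 40 * 5      // 200
const cappuccinos = 25 * 4 // 100
const drip = 60 * 3        // 180
const total = lattes + cappuccinos + drip // 480
```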

Most production agents work this way. A while loop with an LLM inside calling functions.

Where it gets complicated is in the details: how many iterations to allow, how to handle tool errors, when to bail out. But the core pattern fits in ten lines of code.
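One of those details, handling tool errors, can be sketched as a wrapper that feeds failures back to the model as text instead of crashing the loop — `execute` here is a hypothetical tool runner, stubbed with an always-failing tool for illustration:

```javascript
// Tool errors become context, so the model can retry with
// different arguments, pick another tool, or explain the failure.
async function safeExecute(execute, tool, params) {
  try {
    return await execute(tool, params)
  } catch (err) {
    // Never throw into the loop — report instead.
    return `Error from ${tool}: ${err.message}. Try different arguments.`
  }
}

// Stub tool that always fails, to show the error path:
const failing = async () => { throw new Error("division by zero") }

safeExecute(failing, "calculator", {}).then((result) => {
  // result: "Error from calculator: division by zero. Try different arguments."
})
```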

Memory

Everything so far happens inside a single conversation. The agent has no idea who you are between sessions. Memory fixes that, but it's less obvious than it sounds.

The simplest version is what you saw earlier: stuff facts into the system prompt. But that's a static hack. Real memory is a read-write system. The agent doesn't just read stored context, it decides what to write, what to update, and what to delete.

const systemPrompt = `You are a helpful assistant.

=== MEMORY ===
- User's name is Alex
- Prefers Python over JavaScript
- Currently learning about databases`

The hard part is not storing memories. It's deciding what deserves to be stored, when to update stale facts, and when to forget. Storing everything leads to noise, and noise degrades retrieval quality over time. Every token spent on memory is a token not available for reasoning.
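A minimal read-write memory can be sketched as a key-value store rendered into the system prompt on every call — a toy, assuming real systems add relevance scoring and retrieval on top:

```javascript
// Memory as a read-write store: the agent decides what to
// write, what to overwrite, and what to forget.
const memory = new Map()

const remember = (key, fact) => memory.set(key, fact) // write or update
const forget = (key) => memory.delete(key)            // drop stale facts

// Render memory into the system prompt before each call.
function memoryBlock() {
  const lines = [...memory.values()].map((f) => `- ${f}`)
  return `=== MEMORY ===\n${lines.join("\n")}`
}

remember("name", "User's name is Alex")
remember("lang", "Prefers Python over JavaScript")
remember("lang", "Prefers TypeScript now") // update replaces the stale fact
forget("name")                             // forgetting keeps the prompt lean

// memoryBlock() → "=== MEMORY ===\n- Prefers TypeScript now"
```

Keying facts lets an update overwrite its predecessor instead of piling contradictions into the prompt — one small answer to the noise problem above.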

There's a lot more to memory than fits here. Different memory types (episodic, semantic, procedural), storage architectures, retrieval strategies, the forgetting problem. It deserves its own deep dive.

Memory in AI Agents

How different memory types work, how they complement each other, and how to build your own.

Read article →

That's the foundation. Production agents layer more on top, but underneath it's the same thing. Text in, text out, a loop with some function calls, and a place to write things down between sessions.