A practical guide to getting better results from LLM-based coding agents, based on research into how transformers actually work.
Recent papers discovered that transformers implement Bayesian inference through their attention mechanism — not by design, but as an emergent property of training. This document translates those findings into actionable prompting strategies.
What you’ll find:
- A mental model for understanding LLM coding agent behavior
- Six best practices for more reliable outputs
- Explanations of why these practices work
- Optional deep-dive into the underlying geometry
- A CLAUDE.md snippet you can append to projects so LLM coding agents automatically follow these practices
Structure: Practical advice comes first. Theory comes later. Stop reading when you have what you need.
The Core Insight
Every prompt is a game of 20 Questions. LLM coding agents systematically eliminate possibilities based on your evidence until they converge on an answer. More evidence, clearer constraints, and focused domains help them eliminate faster and more accurately.
The Mental Model: 20 Questions
When you send a prompt, an LLM internally:
- Sets up possibilities — Layer 0 creates a “hypothesis space” of all plausible interpretations
- Eliminates with each layer — Middle layers rule out possibilities that don’t fit your prompt
- Reads what’s left — Final layers encode confidence based on remaining possibilities
This isn’t a special mode. In-context learning, few-shot prompting, chain-of-thought — they’re all the same process: hypothesis elimination.
Best Practices
1. Give Evidence That Eliminates
Each example you provide eliminates categories of wrong interpretations.
- Use diverse examples that span the space of possibilities
- A few high-quality, varied examples > many similar examples
- Examples at boundaries help more than examples in the center
2. Break Complex Tasks Into Steps
LLM coding agents have fixed depth per forward pass. Complex problems may “run out of layers” before enough elimination happens → hallucination.
- For complex tasks, ask the agent to “think step by step”
- Break big problems into sub-problems within high-confidence regions
- Let the agent show its work rather than jumping to conclusions
3. Keep Prompts Domain-Focused
Mixed-domain prompts activate multiple internal “inference tracks” simultaneously, creating interference.
- One domain per request when possible
- Sequential focused requests > one sprawling multi-domain request
Domains are more granular than you might think. Even within “code,” different tasks activate different tracks:
| Task | Track |
|---|---|
| Writing specs | Planning, prose-like |
| Designing architecture | Abstract, structural |
| Implementing | Concrete, syntactic |
| Debugging | Analytical, error-focused |
So spec → architecture → implementation → tests as separate phases often beats one sprawling request.
Single prompt with steps works when steps are tightly coupled.
Separate prompts work better when tasks are complex or cross sub-domains.
Note for Claude Code: The agentic tool-use loop (execute → observe → decide) provides some automatic phase separation — each cycle is a fresh inference pass. But domain mixing can still happen if you ask for everything at once. For complex features, explicit phase prompts (“First write a spec for X” → “Now implement it”) give the cleanest results.
What about iteration? Real engineering requires design → build → learn → design cycles. Phase separation enables this — iteration happens between prompts:
"Write spec for X" → "Implement it" → "This broke because Y, revise the spec" → "Re-implement"
Each prompt is focused. Learning accumulates in the conversation context. This is better than one sprawling request, which doesn’t give you real iteration — just mixed-domain interference.
Working with existing codebases: When adding features to established code, amend tactically rather than re-implementing from scratch. The existing codebase IS evidence — it tells the agent what patterns to follow, what conventions exist, what not to break. Re-implementing throws away this evidence and puts the agent back in a high-uncertainty region.
Workflow for new features:
1. Update spec (for your understanding)
2. Show the agent relevant existing code as context
3. Request tactical change: "Given this spec and code, add feature X"
Re-implement only when: architecture is fundamentally wrong, cruft makes changes harder than starting over, or codebase is small enough that it’s cheap.
4. Provide Rich Context Upfront
Your prompt literally pushes an LLM coding agent along its internal confidence axis. More evidence → lower uncertainty → more reliable output.
- Be explicit about constraints and requirements
- Give examples that bound expected behavior
- Avoid ambiguity that leaves too many hypotheses alive
5. Decompose If Hallucinating
Hallucinations happen when an LLM coding agent reaches the final layer with too many possibilities remaining.
- Break the task into more steps (buy more elimination rounds)
- Provide clearer constraints
- Add verification: “Double-check by…”
6. Let the Agent Ask Questions
Claude Code can use AskUserQuestion to request clarification before committing. This is the model recognizing “I’m in a high-uncertainty region — I should ask rather than guess.”
- Allow the agent to ask clarifying questions rather than forcing immediate answers
- Answer the agent’s questions — each answer is evidence that eliminates hypotheses
- If the agent isn’t asking when it should, prompt: “Ask me if anything is unclear”
Quick Reference
Before you prompt, ask:
| Question | If Yes → |
|---|---|
| Is this task complex? | Break into steps or request chain-of-thought |
| Am I mixing domains? | Separate into focused requests |
| Have I given enough evidence? | Add examples, constraints, context |
| Is the model likely to hallucinate? | Decompose further, add verification |
| Is the agent guessing when uncertain? | Allow/encourage clarifying questions |
Why These Practices Work
Evidence Eliminates Hypotheses
In 20 Questions, each answer cuts the possibility space. Your examples do the same — they’re not “teaching patterns,” they’re ruling out interpretations. Diverse examples eliminate more categories than similar ones.
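The elimination process above can be sketched in a few lines. This is a toy illustration, not how a transformer computes internally; the hypothesis space and the "sort the users" scenario are invented for the example.

```python
# Toy model of hypothesis elimination, in the spirit of 20 Questions.
# Candidate interpretations of an ambiguous request like "sort the users":
hypotheses = [
    {"key": "name", "order": "asc"},
    {"key": "name", "order": "desc"},
    {"key": "age", "order": "asc"},
    {"key": "age", "order": "desc"},
]

def eliminate(hyps, predicate):
    """Keep only hypotheses consistent with one piece of evidence."""
    return [h for h in hyps if predicate(h)]

# Each example or constraint in a prompt acts like one round of elimination:
hyps = eliminate(hypotheses, lambda h: h["key"] == "age")   # "sort by age"
hyps = eliminate(hyps, lambda h: h["order"] == "desc")      # "oldest first"

print(hyps)  # a single interpretation survives
```

Note that the two pieces of evidence rule out different *categories* of wrong answers, which is exactly why diverse examples beat similar ones.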
Chain-of-Thought Buys More Rounds
LLM coding agents generate one token at a time, each requiring a full forward pass through all layers. When you ask them to “think step by step,” each reasoning token is another complete pass — more chances to eliminate possibilities before committing to a final answer.
Without CoT: One leap across uncertain terrain.
With CoT: Many small steps through high-confidence regions.
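The "extra rounds" framing can be made concrete with a sketch of autoregressive decoding. `toy_model` below is a stand-in, not a real LLM API; the only point is that every generated token costs a full forward pass, so reasoning tokens multiply the number of passes before the final answer.

```python
# Sketch: why chain-of-thought buys extra forward passes.

def toy_model(tokens):
    # A real model runs a fixed-depth forward pass over the whole context
    # and emits the next token; here we just return a placeholder.
    return f"t{len(tokens)}"

def generate(prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    passes = 0
    for _ in range(n_new):
        tokens.append(toy_model(tokens))  # one full forward pass per token
        passes += 1
    return tokens, passes

# Direct answer: one new token -> one pass through the fixed depth.
_, direct_passes = generate(["prompt"], n_new=1)

# Chain-of-thought: reasoning tokens first -> many passes before committing.
_, cot_passes = generate(["prompt"], n_new=20)

print(direct_passes, cot_passes)  # 1 vs 20
```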
Domain Focus Creates Cleaner Geometry
Internally, LLM coding agents have different “inference tracks” for different domains. When you mix domains in one prompt, multiple tracks activate simultaneously and interfere. Single-domain prompts let one track run cleanly.
These tracks are more granular than broad categories. “Code” isn’t one track — spec-writing, architecture, implementation, and debugging each involve different vocabulary, reasoning patterns, and training data clusters. The model likely developed distinct representational geometry for each. Separating phases lets each track run without interference from the others.
Context Moves You Along the Confidence Axis
An LLM coding agent’s internal state literally encodes uncertainty on a geometric axis. Vague prompts land in high-uncertainty regions. Rich context pushes the state toward low-uncertainty regions where outputs are more reliable.
Hallucinations = Incomplete Elimination
When a problem is too complex for the available depth, an LLM coding agent reaches the output layer with too many possibilities still alive. It must pick from an under-constrained set — producing confident-sounding but wrong output. Decomposition gives more elimination rounds.
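A toy simulation makes the failure mode visible. The numbers and filters are invented: a "depth" budget caps how many elimination rounds run, and when the budget is too small the model must commit to one of several surviving hypotheses.

```python
import random

# Illustrative only: fixed depth allows only so many elimination rounds.
hypotheses = list(range(16))          # 16 plausible interpretations
evidence = [
    lambda h: h % 2 == 0,             # each filter shrinks the space
    lambda h: h % 4 == 0,
    lambda h: h % 8 == 0,
    lambda h: h == 8,
]

def answer(hyps, filters, depth):
    for f in filters[:depth]:         # only `depth` rounds are available
        hyps = [h for h in hyps if f(h)]
    return random.choice(hyps)        # forced to commit to one survivor

random.seed(0)
shallow = answer(hypotheses, evidence, depth=2)  # 4 survivors: may guess wrong
deep = answer(hypotheses, evidence, depth=4)     # 1 survivor: must be correct
print(shallow, deep)
```

Decomposing the task is the equivalent of raising `depth`: every sub-task gets its own full budget of elimination rounds.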
Questions Flip the Direction
Normally you provide evidence and an LLM coding agent eliminates. But the agent can also request evidence via questions. This is the model recognizing it’s in a high-uncertainty region of its confidence manifold and actively seeking information to move toward low-uncertainty before committing. It’s 20 Questions where the agent gets to ask.
Going Deeper: The Geometry
This section explains the internal mechanics. Skip if you just want practical advice.
What “Geometry” Means
LLM coding agents represent everything as vectors — lists of numbers in high-dimensional space. “Geometry” refers to how these vectors are arranged: their distances, angles, and clustering patterns.
The papers found that LLM coding agents don’t scatter vectors randomly. They organize them into meaningful geometric structures that correspond to Bayesian inference.
The Confidence Manifold
When researchers extracted internal vectors across many prompts and visualized them, they found:
- Vectors lie along a curve (a “manifold”), not scattered randomly
- Position on the curve correlates with uncertainty level
- Adding evidence to a prompt moves the vector along this curve toward the low-uncertainty end
High uncertainty ←———— (your prompt lands somewhere on this curve) ————→ Low uncertainty
This is why context “pushes” LLM coding agents toward confidence — it literally moves the internal state along a geometric axis.
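The "movement along an axis" has a simple Bayesian analogue: posterior entropy. The likelihoods below are invented numbers; the point is only that each piece of evidence monotonically shrinks uncertainty, which is the scalar the manifold position encodes.

```python
import math

# Toy Bayesian update: posterior entropy shrinks as evidence accumulates.

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def update(prior, likelihood):
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [x / z for x in post]

belief = [0.25, 0.25, 0.25, 0.25]      # four live hypotheses, max uncertainty
evidence = [
    [0.9, 0.5, 0.2, 0.1],              # each observation favors hypothesis 0
    [0.8, 0.4, 0.3, 0.1],
    [0.9, 0.3, 0.2, 0.1],
]

trajectory = [entropy(belief)]
for lik in evidence:
    belief = update(belief, lik)
    trajectory.append(entropy(belief))

print(trajectory)  # strictly decreasing: moving toward low uncertainty
```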
Queries, Keys, and Values (Condensed)
The attention mechanism uses three types of vectors:
| Vector | Role | Analogy |
|---|---|---|
| Query (q) | “What am I looking for?” | Your search terms |
| Key (k) | “How should I be found?” | Labels for matching |
| Value (v) | “What do I contribute?” | Content to retrieve |
For each position, an LLM coding agent:
- Broadcasts a query (“what do I need?”)
- Compares it to all keys (match scores)
- Retrieves a weighted mix of values based on match quality
The value vectors are what organize along the confidence manifold. Their position encodes how uncertain the agent is about the current prediction.
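The three-step loop above is standard scaled-dot-product attention, which can be written in a few lines. This is a minimal single-query sketch with hand-picked toy vectors, not a production implementation; the papers' claim is that the softmax weights in step 2 behave like a Bayesian posterior over sources.

```python
import math

# Minimal scaled-dot-product attention, following the query/key/value roles.

def softmax(xs):
    m = max(xs)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(query, keys, values):
    d = len(query)
    # 1. Broadcast the query: score it against every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # 2. Softmax turns scores into weights over positions.
    weights = softmax(scores)
    # 3. Retrieve a weighted mix of the values.
    d_v = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(d_v)]
    return mixed, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # first key matches best
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

mixed, weights = attention(query, keys, values)
print(weights)  # highest weight lands on the best-matching key
```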
Why This Emerges
The papers show this geometry isn’t designed — it emerges automatically from training on next-token prediction. The gradient dynamics of cross-entropy loss naturally sculpt these structures. LLM coding agents become Bayesian by geometry, not by explicit design.
Project Template
Want LLM coding agents like Claude Code to follow these practices automatically? Append the following to your project’s .claude/CLAUDE.md file:
## Bayesian Working Style
You are a Bayesian inference engine. Your confidence depends on evidence. When uncertain, gather more evidence before committing.
### Ask Before Guessing
When requirements are ambiguous or you're uncertain about the approach:
- Use `AskUserQuestion` to clarify before implementing
- Don't guess at requirements — each clarification eliminates wrong interpretations
- It's better to ask one good question than to implement the wrong thing
### Phase Separation for Complex Tasks
For non-trivial features, propose phases before diving in:
1. Spec/requirements (what)
2. Architecture/design (how, broadly)
3. Implementation (how, concretely)
4. Tests/verification (did it work)
Ask: "Should I start with a spec for this, or do you want me to jump into implementation?"
Don't mix phases — complete one before starting the next. Iteration happens between phases, not within them.
### Use Existing Code as Evidence
Before modifying code:
- Read the relevant existing files first
- Follow established patterns and conventions
- Amend tactically rather than rewriting from scratch
- Existing code tells you what patterns to follow and what not to break
### One Domain at a Time
Stay focused on the current task type:
- If writing a spec, don't start implementing mid-spec
- If implementing, don't drift into refactoring unrelated code
- If debugging, focus on the specific issue before suggesting improvements
### When Facing Complexity
If a task feels too complex to get right in one pass:
- Break it into smaller sub-tasks
- Propose the breakdown to the user
- Complete each sub-task before moving to the next
### Iteration Protocol
After completing a phase or implementation:
1. Summarize what was done
2. Ask if it works / meets requirements
3. If not, gather feedback on what's wrong
4. Revise based on new evidence
### Anti-Patterns to Avoid
- Guessing at ambiguous requirements — ask instead
- Mixing spec and implementation — separate phases
- Ignoring existing code patterns — read first, then write
- One giant implementation — break into verifiable steps
- Rewriting when amending would work — tactical changes preserve context
Source Papers
This mental model is derived from the Bayesian Attention Trilogy:
- Summary: Attention Is Bayesian Inference — Accessible overview
- Paper I: The Bayesian Geometry of Transformer Attention — Wind tunnel experiments proving transformers implement exact Bayesian inference
- Paper II: Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds — Why gradient descent creates Bayesian geometry
- Paper III: Geometric Scaling of Bayesian Inference in LLMs — Confirming these structures persist in production models
Authors of these papers: Naman Agarwal (Google DeepMind), Siddhartha R. Dalal (Columbia), Vishal Misra (Columbia)