A practical guide to getting better results from LLM-based coding agents, based on research into how transformers actually work.
Recent papers discovered that transformers implement Bayesian inference through their attention mechanism — not by design, but as an emergent property of training. This document translates those findings into actionable prompting strategies.
What you’ll find:
- A mental model for understanding LLM coding agent behavior
- Six best practices for more reliable outputs
- Explanations of why these practices work
- Optional deep-dive into the underlying geometry
- A CLAUDE.md snippet you can append to projects so LLM coding agents automatically follow these practices
Structure: Practical advice comes first. Theory comes later. Stop reading when you have what you need.
The Core Insight
Every prompt is a game of 20 Questions. LLM coding agents systematically eliminate possibilities based on your evidence until they converge on an answer. More evidence, clearer constraints, and focused domains help them eliminate faster and more accurately.
The Mental Model: 20 Questions
When you send a prompt, an LLM internally:
- Sets up possibilities — Layer 0 creates a “hypothesis space” of all plausible interpretations
- Eliminates with each layer — Middle layers rule out possibilities that don’t fit your prompt
- Reads what’s left — Final layers encode confidence based on remaining possibilities
This isn’t a special mode. In-context learning, few-shot prompting, chain-of-thought — they’re all the same process: hypothesis elimination.
Best Practices
1. Give Evidence That Eliminates
Each example you provide eliminates categories of wrong interpretations.
- Use diverse examples that span the space of possibilities
- A few high-quality, varied examples > many similar examples
- Examples at boundaries help more than examples in the center
2. Break Complex Tasks Into Steps
LLM coding agents have fixed depth per forward pass. Complex problems may “run out of layers” before enough elimination happens → hallucination.
- For complex tasks, ask the agent to “think step by step”
- Break big problems into sub-problems within high-confidence regions
- Let the agent show its work rather than jumping to conclusions
3. Keep Prompts Domain-Focused
Mixed-domain prompts activate multiple internal “inference tracks” simultaneously, creating interference.
- One domain per request when possible
- Sequential focused requests > one sprawling multi-domain request
Domains are more granular than you might think. Even within “code,” different tasks activate different tracks:
| Task | Track |
|---|---|
| Writing specs | Planning, prose-like |
| Designing architecture | Abstract, structural |
| Implementing | Concrete, syntactic |
| Debugging | Analytical, error-focused |
So spec → architecture → implementation → tests as separate phases often beats one sprawling request.
Single prompt with steps works when steps are tightly coupled.
Separate prompts work better when tasks are complex or cross sub-domains.
Note for Claude Code: The agentic tool-use loop (execute → observe → decide) provides some automatic phase separation — each cycle is a fresh inference pass. But domain mixing can still happen if you ask for everything at once. For complex features, explicit phase prompts (“First write a spec for X” → “Now implement it”) give the cleanest results.
What about iteration? Real engineering requires design → build → learn → design cycles. Phase separation enables this — iteration happens between prompts:
"Write spec for X" → "Implement it" → "This broke because Y, revise the spec" → "Re-implement"
Each prompt is focused. Learning accumulates in the conversation context. This is better than one sprawling request, which doesn’t give you real iteration — just mixed-domain interference.
Working with existing codebases: When adding features to established code, amend tactically rather than re-implementing from scratch. The existing codebase IS evidence — it tells the agent what patterns to follow, what conventions exist, what not to break. Re-implementing throws away this evidence and puts the agent back in a high-uncertainty region.
Workflow for new features:
1. Update spec (for your understanding)
2. Show the agent relevant existing code as context
3. Request tactical change: "Given this spec and code, add feature X"
Re-implement only when: architecture is fundamentally wrong, cruft makes changes harder than starting over, or codebase is small enough that it’s cheap.
4. Provide Rich Context Upfront
Your prompt literally pushes an LLM coding agent along its internal confidence axis. More evidence → lower uncertainty → more reliable output.
- Be explicit about constraints and requirements
- Give examples that bound expected behavior
- Avoid ambiguity that leaves too many hypotheses alive
5. Decompose If Hallucinating
Hallucinations happen when an LLM coding agent reaches the final layer with too many possibilities remaining.
- Break the task into more steps (buy more elimination rounds)
- Provide clearer constraints
- Add verification: “Double-check by…”
6. Let the Agent Ask Questions
Claude Code can use AskUserQuestion to request clarification before committing. This is the model recognizing “I’m in a high-uncertainty region — I should ask rather than guess.”
- Allow the agent to ask clarifying questions rather than forcing immediate answers
- Answer the agent’s questions — each answer is evidence that eliminates hypotheses
- If the agent isn’t asking when it should, prompt: “Ask me if anything is unclear”
Quick Reference
Before you prompt, ask:
| Question | If Yes → |
|---|---|
| Is this task complex? | Break into steps or request chain-of-thought |
| Am I mixing domains? | Separate into focused requests |
| Have I given enough evidence? | Add examples, constraints, context |
| Is the model likely to hallucinate? | Decompose further, add verification |
| Is the agent guessing when uncertain? | Allow/encourage clarifying questions |
Why These Practices Work
Evidence Eliminates Hypotheses
In 20 Questions, each answer cuts the possibility space. Your examples do the same — they’re not “teaching patterns,” they’re ruling out interpretations. Diverse examples eliminate more categories than similar ones.
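The elimination process above can be sketched in a few lines. This is a toy illustration, not how a transformer computes internally; the hypothesis space and the "sort the users" scenario are invented for the example.

```python
# Toy model of hypothesis elimination, in the spirit of 20 Questions.
# Candidate interpretations of an ambiguous request like "sort the users":
hypotheses = [
    {"key": "name", "order": "asc"},
    {"key": "name", "order": "desc"},
    {"key": "age", "order": "asc"},
    {"key": "age", "order": "desc"},
]

def eliminate(hyps, predicate):
    """Keep only hypotheses consistent with one piece of evidence."""
    return [h for h in hyps if predicate(h)]

# Each example or constraint in a prompt acts like one round of elimination:
hyps = eliminate(hypotheses, lambda h: h["key"] == "age")   # "sort by age"
hyps = eliminate(hyps, lambda h: h["order"] == "desc")      # "oldest first"

print(hyps)  # a single interpretation survives
```

Note that the two pieces of evidence rule out different *categories* of wrong answers, which is exactly why diverse examples beat similar ones.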
Chain-of-Thought Buys More Rounds
LLM coding agents generate one token at a time, each requiring a full forward pass through all layers. When you ask them to “think step by step,” each reasoning token is another complete pass — more chances to eliminate possibilities before committing to a final answer.
Without CoT: One leap across uncertain terrain.
With CoT: Many small steps through high-confidence regions.
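The "extra rounds" framing can be made concrete with a sketch of autoregressive decoding. `toy_model` below is a stand-in, not a real LLM API; the only point is that every generated token costs a full forward pass, so reasoning tokens multiply the number of passes before the final answer.

```python
# Sketch: why chain-of-thought buys extra forward passes.

def toy_model(tokens):
    # A real model runs a fixed-depth forward pass over the whole context
    # and emits the next token; here we just return a placeholder.
    return f"t{len(tokens)}"

def generate(prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    passes = 0
    for _ in range(n_new):
        tokens.append(toy_model(tokens))  # one full forward pass per token
        passes += 1
    return tokens, passes

# Direct answer: one new token -> one pass through the fixed depth.
_, direct_passes = generate(["prompt"], n_new=1)

# Chain-of-thought: reasoning tokens first -> many passes before committing.
_, cot_passes = generate(["prompt"], n_new=20)

print(direct_passes, cot_passes)  # 1 vs 20
```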
Domain Focus Creates Cleaner Geometry
Internally, LLM coding agents have different “inference tracks” for different domains. When you mix domains in one prompt, multiple tracks activate simultaneously and interfere. Single-domain prompts let one track run cleanly.
These tracks are more granular than broad categories. “Code” isn’t one track — spec-writing, architecture, implementation, and debugging each involve different vocabulary, reasoning patterns, and training data clusters. The model likely developed distinct representational geometry for each. Separating phases lets each track run without interference from the others.
Context Moves You Along the Confidence Axis
An LLM coding agent’s internal state literally encodes uncertainty on a geometric axis. Vague prompts land in high-uncertainty regions. Rich context pushes the state toward low-uncertainty regions where outputs are more reliable.
Hallucinations = Incomplete Elimination
When a problem is too complex for the available depth, an LLM coding agent reaches the output layer with too many possibilities still alive. It must pick from an under-constrained set — producing confident-sounding but wrong output. Decomposition gives more elimination rounds.
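A toy simulation makes the failure mode visible. The numbers and filters are invented: a "depth" budget caps how many elimination rounds run, and when the budget is too small the model must commit to one of several surviving hypotheses.

```python
import random

# Illustrative only: fixed depth allows only so many elimination rounds.
hypotheses = list(range(16))          # 16 plausible interpretations
evidence = [
    lambda h: h % 2 == 0,             # each filter shrinks the space
    lambda h: h % 4 == 0,
    lambda h: h % 8 == 0,
    lambda h: h == 8,
]

def answer(hyps, filters, depth):
    for f in filters[:depth]:         # only `depth` rounds are available
        hyps = [h for h in hyps if f(h)]
    return random.choice(hyps)        # forced to commit to one survivor

random.seed(0)
shallow = answer(hypotheses, evidence, depth=2)  # 4 survivors: may guess wrong
deep = answer(hypotheses, evidence, depth=4)     # 1 survivor: must be correct
print(shallow, deep)
```

Decomposing the task is the equivalent of raising `depth`: every sub-task gets its own full budget of elimination rounds.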
Questions Flip the Direction
Normally you provide evidence and an LLM coding agent eliminates. But the agent can also request evidence via questions. This is the model recognizing it’s in a high-uncertainty region of its confidence manifold and actively seeking information to move toward low-uncertainty before committing. It’s 20 Questions where the agent gets to ask.
Going Deeper: The Geometry
This section explains the internal mechanics. Skip if you just want practical advice.
What “Geometry” Means
LLM coding agents represent everything as vectors — lists of numbers in high-dimensional space. “Geometry” refers to how these vectors are arranged: their distances, angles, and clustering patterns.
The papers found that LLM coding agents don’t scatter vectors randomly. They organize them into meaningful geometric structures that correspond to Bayesian inference.
The Confidence Manifold
When researchers extracted internal vectors across many prompts and visualized them, they found:
- Vectors lie along a curve (a “manifold”), not scattered randomly
- Position on the curve correlates with uncertainty level
- Adding evidence to a prompt moves the vector along this curve toward the low-uncertainty end
High uncertainty ←———— (your prompt lands somewhere on this curve) ————→ Low uncertainty
This is why context “pushes” LLM coding agents toward confidence — it literally moves the internal state along a geometric axis.
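The "movement along an axis" has a simple Bayesian analogue: posterior entropy. The likelihoods below are invented numbers; the point is only that each piece of evidence monotonically shrinks uncertainty, which is the scalar the manifold position encodes.

```python
import math

# Toy Bayesian update: posterior entropy shrinks as evidence accumulates.

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def update(prior, likelihood):
    post = [p * l for p, l in zip(prior, likelihood)]
    z = sum(post)
    return [x / z for x in post]

belief = [0.25, 0.25, 0.25, 0.25]      # four live hypotheses, max uncertainty
evidence = [
    [0.9, 0.5, 0.2, 0.1],              # each observation favors hypothesis 0
    [0.8, 0.4, 0.3, 0.1],
    [0.9, 0.3, 0.2, 0.1],
]

trajectory = [entropy(belief)]
for lik in evidence:
    belief = update(belief, lik)
    trajectory.append(entropy(belief))

print(trajectory)  # strictly decreasing: moving toward low uncertainty
```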
Queries, Keys, and Values (Condensed)
The attention mechanism uses three types of vectors:
| Vector | Role | Analogy |
|---|---|---|
| Query (q) | “What am I looking for?” | Your search terms |
| Key (k) | “How should I be found?” | Labels for matching |
| Value (v) | “What do I contribute?” | Content to retrieve |
For each position, an LLM coding agent:
- Broadcasts a query (“what do I need?”)
- Compares it to all keys (match scores)
- Retrieves a weighted mix of values based on match quality
The value vectors are what organize along the confidence manifold. Their position encodes how uncertain the agent is about the current prediction.
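The three-step loop above is standard scaled-dot-product attention, which can be written in a few lines. This is a minimal single-query sketch with hand-picked toy vectors, not a production implementation; the papers' claim is that the softmax weights in step 2 behave like a Bayesian posterior over sources.

```python
import math

# Minimal scaled-dot-product attention, following the query/key/value roles.

def softmax(xs):
    m = max(xs)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention(query, keys, values):
    d = len(query)
    # 1. Broadcast the query: score it against every key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # 2. Softmax turns scores into weights over positions.
    weights = softmax(scores)
    # 3. Retrieve a weighted mix of the values.
    d_v = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(d_v)]
    return mixed, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # first key matches best
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

mixed, weights = attention(query, keys, values)
print(weights)  # highest weight lands on the best-matching key
```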
Why This Emerges
The papers show this geometry isn’t designed — it emerges automatically from training on next-token prediction. The gradient dynamics of cross-entropy loss naturally sculpt these structures. LLM coding agents become Bayesian by geometry, not by explicit design.
Project Template
Want LLM coding agents like Claude Code to follow these practices automatically? Append the following to your project’s .claude/CLAUDE.md file:
## Bayesian Working Style
You are a Bayesian inference engine. Your confidence depends on evidence. When uncertain, gather more evidence before committing.
### Ask Before Guessing
When requirements are ambiguous or you're uncertain about the approach:
- Use `AskUserQuestion` to clarify before implementing
- Don't guess at requirements — each clarification eliminates wrong interpretations
- It's better to ask one good question than to implement the wrong thing
### Phase Separation for Complex Tasks
For non-trivial features, propose phases before diving in:
1. Spec/requirements (what)
2. Architecture/design (how, broadly)
3. Implementation (how, concretely)
4. Tests/verification (did it work)
Ask: "Should I start with a spec for this, or do you want me to jump into implementation?"
Don't mix phases — complete one before starting the next. Iteration happens between phases, not within them.
### Use Existing Code as Evidence
Before modifying code:
- Read the relevant existing files first
- Follow established patterns and conventions
- Amend tactically rather than rewriting from scratch
- Existing code tells you what patterns to follow and what not to break
### One Domain at a Time
Stay focused on the current task type:
- If writing a spec, don't start implementing mid-spec
- If implementing, don't drift into refactoring unrelated code
- If debugging, focus on the specific issue before suggesting improvements
### When Facing Complexity
If a task feels too complex to get right in one pass:
- Break it into smaller sub-tasks
- Propose the breakdown to the user
- Complete each sub-task before moving to the next
### Iteration Protocol
After completing a phase or implementation:
1. Summarize what was done
2. Ask if it works / meets requirements
3. If not, gather feedback on what's wrong
4. Revise based on new evidence
### Anti-Patterns to Avoid
- Guessing at ambiguous requirements — ask instead
- Mixing spec and implementation — separate phases
- Ignoring existing code patterns — read first, then write
- One giant implementation — break into verifiable steps
- Rewriting when amending would work — tactical changes preserve context
Source Papers
This mental model is derived from the Bayesian Attention Trilogy:
- Summary: Attention Is Bayesian Inference — Accessible overview
- Paper I: The Bayesian Geometry of Transformer Attention — Wind tunnel experiments proving transformers implement exact Bayesian inference
- Paper II: Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds — Why gradient descent creates Bayesian geometry
- Paper III: Geometric Scaling of Bayesian Inference in LLMs — Confirming these structures persist in production models
Authors of these papers: Naman Agarwal (Google DeepMind), Siddhartha R. Dalal (Columbia), Vishal Misra (Columbia)