Temperature, Top-P, and Why Neither Knob Is the One You Want Most of the Time


Temperature, Top-P, and Why Neither Knob Is the One You Want Most of the Time

You have a Claude API call that keeps producing outputs you don't like — too repetitive, too wild, too flat. Someone tells you to "try adjusting the temperature." That advice is sometimes right and often a distraction.

The jargon

Token — the smallest chunk Claude processes. Roughly a word or part of a word.

Sampling — how Claude picks the next token. It doesn't pick the single most likely word every time; it draws from a probability distribution, which is what makes outputs feel natural rather than robotic.

Temperature — a multiplier that flattens or sharpens that probability distribution. Higher values make unlikely tokens more competitive. Lower values make the frontrunner dominate.

Top-p (nucleus sampling) — a filter that cuts the token pool down to the smallest set of candidates whose combined probability exceeds p. At top_p: 0.9, Claude only considers tokens until it has 90% of the probability mass covered, then samples from that pool.

The lesson

Temperature and top-p both affect randomness, but randomness is rarely the actual problem. When outputs feel wrong, the cause is almost always the prompt — missing context, ambiguous instructions, or no example of what "good" looks like. Turning the temperature knob when the prompt is the problem is like adjusting the oven temperature when you've used the wrong recipe.

How it works

Both parameters live in the API request body.

{
  "model": "claude-opus-4-5",
  "max_tokens": 1024,
  "temperature": 0.7,
  "top_p": 0.9,
  "messages": [...]
}

Temperature runs from 0 to 1 (Anthropic's range). At 0, Claude is nearly deterministic — it will pick the highest-probability token almost every time. At 1, the distribution is wide open. The default is 1.

Top-p runs from 0 to 1. At 1.0, nothing is filtered out. At 0.1, Claude only considers the very top of the probability mass — a tiny pool of tokens. The Anthropic API default is also 1.

Anthropic's own guidance says to alter one or the other, not both simultaneously. They interact in ways that are hard to reason about, so changing both at once means you can't isolate what's actually helping.

When to reach for it / when not to

Reach for temperature when the task genuinely benefits from variation — brainstorming, generative writing, producing multiple draft options you'll curate. Drop it toward 0 when you need deterministic outputs: structured data extraction, classification, code generation with a precise spec.

Reach for top-p rarely. It's a fine-grained control that mostly matters when you're doing research on model behaviour. In production use, temperature is sufficient.

Don't reach for either when the output is wrong because the instructions are wrong. A clearer task description, a worked example in the prompt, or an explicit output format will fix more problems than any parameter combination.

Try it

Find a Claude API call you use regularly. Run it three times at temperature: 0 and three times at temperature: 1. Compare the six outputs. You'll quickly see what the knob actually controls — and whether the variation you're seeing is a sampling problem or a prompt problem. Usually it's the prompt.


Don't miss what's next. Subscribe to My Claude Daily Learning: