Probability Distributions Over Possible Images

Pippa's one-line summary: why getting different images from the same prompt run twice is a feature, not a bug. A prompt only *focuses* the distribution; it doesn't pin down a single point.

An analogy helps here: imagine a vast art gallery with infinitely many rooms. Each room contains a slightly different painting, and together the rooms hold every possible image that could ever exist. Some rooms have beautiful portraits, some have random noise, some have impossible surreal scenes. This infinite gallery represents the space of all possible images.

Now imagine that for a given prompt like "sunset over mountains," only a tiny fraction of those rooms contain images that match. Your generative model doesn't search the gallery and pick one. Instead, it has learned a probability map of the gallery — it knows which rooms are "likely" given a particular prompt and navigates toward high-probability areas.

Space of all possible images:

  . . . . . . . . . . . . . . .    ← Low probability (random noise, etc.)
  . . . ★ ★ . . . . ★ ★ . . .    ← Medium probability
  . . ★ ★ ★ ★ . . ★ ★ ★ ★ . .    ← Higher probability regions
  . . . ★ ★ . . . . ★ ★ . . .
  . . . . . . . . . . . . . . .

  ★ = Images matching "sunset over mountains"
  Each ★ is a slightly different valid interpretation

Why There Are Many Valid Outputs

"Sunset over mountains" could mean:

A photorealistic landscape with the Rockies and pink clouds
A watercolor painting of gentle hills with an orange sky
A drone shot of the Alps with dramatic shadows
A stylized illustration of a single peak silhouette

All of these are valid samples from the probability distribution that corresponds to your prompt. The model doesn't pick "the right one" because there isn't one. It samples from the space of plausible outputs, and each generation is an independent random draw from that space.
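To make the sampling idea concrete, here is a minimal toy sketch. It is not how an image model works internally: the four interpretations from the list above stand in for whole regions of image space, and the weights are made-up numbers (an assumption for illustration only). The function name `sample_interpretation` is hypothetical.

```python
import random
from collections import Counter

# The four interpretations listed above. The weights are invented for this
# sketch; a real model's learned probabilities are not known here.
INTERPRETATIONS = {
    "photorealistic Rockies with pink clouds": 0.4,
    "watercolor hills with an orange sky": 0.2,
    "drone shot of the Alps with dramatic shadows": 0.3,
    "stylized single-peak silhouette": 0.1,
}


def sample_interpretation(rng):
    """Draw one interpretation; every call is an independent sample."""
    labels = list(INTERPRETATIONS)
    weights = list(INTERPRETATIONS.values())
    return rng.choices(labels, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so the counts differ from run to run
counts = Counter(sample_interpretation(rng) for _ in range(1000))
for label, n in counts.most_common():
    print(f"{n:4d}  {label}")
```

Running it several times shows the point of this section: no single answer comes back every time, but the more likely interpretations show up more often.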

Prompt as a Lens, Not a Remote Control

Your prompt doesn't select a specific image — it focuses the probability distribution. A vague prompt like "a cat" leaves a huge probability space (millions of valid cat images). A detailed prompt like "a ginger tabby cat sleeping on a navy velvet armchair, afternoon light from a window on the left, 35mm photography, shallow depth of field" narrows that space dramatically — but there are still many valid outputs.

This is why:

The same prompt gives different results each time: each run is a different random sample from the same distribution
More specific prompts give more consistent results: a smaller probability region means less variation
Seeds fix the randomness: they pin the "dice roll", so the same model, prompt, and settings reproduce the same sample
A prompt alone can never force exactly one specific output: the distribution always has some width
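The points above can be sketched with a toy stand-in, under loud assumptions: a single number plays the role of an image, and `PROMPT_SPREAD` is an invented table saying how wide a distribution each prompt leaves open. The names `generate`, `VAGUE`, and `DETAILED` are hypothetical and do not belong to any real image-generation API.

```python
import random
import statistics

VAGUE = "a cat"
DETAILED = "a ginger tabby cat sleeping on a navy velvet armchair"

# Assumption: made-up spreads. A vague prompt leaves a wide distribution,
# a detailed prompt a narrow one.
PROMPT_SPREAD = {VAGUE: 10.0, DETAILED: 1.0}


def generate(prompt, seed=None):
    """Draw one toy 'image' (a float). A seed pins the dice roll."""
    rng = random.Random(seed)  # seed=None uses fresh randomness each call
    return rng.gauss(0.0, PROMPT_SPREAD[prompt])


# Same prompt, no seed: a different sample on each call.
print(generate(VAGUE), generate(VAGUE))

# More specific prompt: the samples vary less.
vague_runs = [generate(VAGUE) for _ in range(1000)]
detailed_runs = [generate(DETAILED) for _ in range(1000)]
print(statistics.stdev(vague_runs), statistics.stdev(detailed_runs))

# Same prompt, same seed: the same sample.
print(generate(VAGUE, seed=42) == generate(VAGUE, seed=42))  # True
```

In real tools, reproducing an image from a seed typically also requires the same model version and generation settings, and results can still differ across hardware or library versions.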

Key Takeaways

The model has learned a probability distribution over images — it samples plausible outputs, not "the" correct one.
Your prompt focuses the distribution but doesn't select a single image.
Multiple valid outputs exist for any prompt — variation is a feature, not a bug.
Professional workflows embrace this: generate many, curate the best.
