Lesson 07 of 10 · published

Guidance: Prompt Adherence vs. Natural Outputs

~17 min · diffusion, latent-space, l7

Pippa's one-line summary: Guidance scale (CFG) is a slider between prompt fidelity and naturalness. 5-8 is the sweet spot; 15+ is the burnt zone.

There's a critical slider in most image generation tools called guidance scale (or "CFG scale" — Classifier-Free Guidance). It controls how strongly the model follows your text prompt vs. how much creative freedom it has. Understanding this single parameter will save you hours of frustration.

Think of it like giving directions to a painter: low guidance = "Paint me something with a sunset feel" (loose, interpretive, creative). High guidance = "Paint exactly this: a sunset with three clouds, orange and pink, over a lake, with two boats" (rigid, literal, constrained). Both can produce great art, but they fail differently.

How Classifier-Free Guidance Works

Under the hood, CFG makes two predictions at every denoising step:

  1. Unconditional prediction: "What does a less noisy version look like?" (no text input)
  2. Conditional prediction: "What does a less noisy version look like, given this text?"

The final output is calculated as:

Output = Unconditional + Guidance_Scale × (Conditional − Unconditional)

When scale = 1:  Output = Conditional  (just follow the text, no boost)
When scale = 7:  Output = Unconditional + 7 × (difference)  (strongly amplify text influence)
When scale = 20: Output = Unconditional + 20 × (difference) (VERY aggressively follow text)

The guidance scale amplifies the difference between what the model would produce with and without your text. Higher values push the output more aggressively toward results that match the text.
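
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular tool's implementation: `cfg_combine` is a hypothetical helper name, and the two arrays are made-up stand-ins for the unconditional and conditional predictions a real model would return.

```python
import numpy as np

def cfg_combine(uncond, cond, scale):
    """Output = Unconditional + Guidance_Scale * (Conditional - Unconditional)."""
    uncond = np.asarray(uncond, dtype=np.float64)
    cond = np.asarray(cond, dtype=np.float64)
    return uncond + scale * (cond - uncond)

# Toy stand-ins for the two predictions (hypothetical values, not real model output).
uncond = np.array([0.10, -0.20, 0.30])
cond = np.array([0.30, -0.10, 0.20])

for scale in (0.0, 1.0, 7.0, 20.0):
    # scale=0 returns the unconditional prediction, scale=1 the conditional one;
    # anything above 1 extrapolates past the conditional prediction.
    print(scale, cfg_combine(uncond, cond, scale))
```

Note that any scale above 1 is an extrapolation: the result lies beyond the conditional prediction, on the line that runs from the unconditional one through it.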

The Guidance Scale Spectrum

Scale:  1        3        5        7        10       15       20       30
        ┃        ┃        ┃        ┃        ┃        ┃        ┃        ┃
        🎨       🎨       🖼️       🖼️       📐       📐       💀       💀
        Very     Creative Good     Sweet    Tight    Over-    Burnt    Broken
        loose             range    spot              done

       ← More creative/diverse        More literal/rigid →
       ← More natural                 More saturated/contrasty →
       ← May ignore parts of prompt   May look artificial →

  • 1-3: Very loose. Model follows the general vibe but takes many creative liberties. Good for artistic exploration when you want surprises.
  • 5-8: The sweet spot for most use cases. Good prompt adherence while maintaining natural-looking outputs. Most models default somewhere in this range.
  • 10-15: Strong adherence but images start looking "pushed" — oversaturated colors, exaggerated contrast, loss of subtlety.
  • 15+: Danger zone. Images become brittle — harsh edges, burned colors, artifact-prone. The model is being forced too hard in one direction.
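
One way to build intuition for why high scales look "pushed" is a toy simulation. The sketch below is an assumption-laden illustration, not a real diffusion model: the predictions are random numbers, the valid range of ±1 is an arbitrary stand-in, and `guided` and `clipped_fraction` are hypothetical helper names. It shows only that the guided output moves away from the unconditional prediction in direct proportion to the scale, so more values land outside a fixed range as the scale grows.

```python
import numpy as np

def guided(uncond, cond, scale):
    """Same CFG formula as in the lesson."""
    return uncond + scale * (cond - uncond)

def clipped_fraction(values, limit=1.0):
    """Share of entries whose magnitude exceeds `limit` (an assumed valid range)."""
    return float(np.mean(np.abs(values) > limit))

# Random toy data: the conditional prediction is a small nudge away
# from the unconditional one. These are not real model outputs.
rng = np.random.default_rng(0)
uncond = rng.uniform(-0.5, 0.5, size=10_000)
cond = uncond + rng.normal(0.0, 0.1, size=10_000)

scales = [1, 3, 5, 7, 10, 15, 20, 30]
report = []
for s in scales:
    out = guided(uncond, cond, s)
    # How far the output moved, relative to the plain conditional nudge.
    push = np.linalg.norm(out - uncond) / np.linalg.norm(cond - uncond)
    report.append((s, push, clipped_fraction(out)))

for s, push, frac in report:
    print(f"scale={s:>2}  push={push:5.1f}x  out_of_range={frac:.1%}")
```

In this toy setup the push factor equals the scale, and the out-of-range share never decreases as the scale rises. Real models behave in more complicated ways, but the same extrapolation is at work.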

Key Takeaways

  • Guidance scale controls how strongly the model follows your prompt vs. how naturally it generates.
  • The sweet spot is usually 5-8. Below that is loose/creative; above 10-12 risks artifacts.
  • "Crank up guidance" is not the fix for bad prompt adherence — it usually makes things worse.
  • High guidance amplifies conflicts in your prompt, making complex prompts more fragile.

Exercise

Generate 'a portrait, photorealistic' at CFG = 1, 5, 8, 12, and 20. Record the points where the output shifts from natural to oversaturated to broken. Those are the cutoff numbers for your model.
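
A small harness can keep the exercise honest by changing only the guidance scale between runs. This is a sketch under assumptions: `run_sweep` and `transition_points` are hypothetical helpers, `fake_generate` is a placeholder you would swap for a call into your own image tool, and the labels in `my_labels` are invented examples, not measured results.

```python
from typing import Callable, Dict, List, Tuple

PROMPT = "a portrait, photorealistic"
CFG_VALUES = [1, 5, 8, 12, 20]

def run_sweep(generate: Callable[[str, float, int], object],
              prompt: str = PROMPT,
              scales=CFG_VALUES,
              seed: int = 1234) -> Dict[float, object]:
    """Call generate(prompt, scale, seed) once per scale.

    The seed is reused so that guidance is the only thing that changes.
    """
    return {scale: generate(prompt, scale, seed) for scale in scales}

def transition_points(labels: Dict[float, str]) -> List[Tuple[float, str, str]]:
    """List each scale at which your own label differs from the previous one."""
    ordered = sorted(labels.items())
    return [(scale, prev_label, label)
            for (_, prev_label), (scale, label) in zip(ordered, ordered[1:])
            if prev_label != label]

def fake_generate(prompt, scale, seed):
    # Placeholder: replace with a real call into your image generation tool.
    return f"{prompt} @ cfg={scale}, seed={seed}"

images = run_sweep(fake_generate)

# Example labels only; fill these in after looking at your own images.
my_labels = {1: "natural", 5: "natural", 8: "natural",
             12: "oversaturated", 20: "broken"}
print(transition_points(my_labels))
```

Fixing the seed matters because two images at different seeds can differ more than two images at different guidance scales, which would hide the transition you are trying to find.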
