피파's one-line summary: Classify each failure into one of four diagnostic categories (prompt / model / control / task), then apply the fix defined for that category. Systematic diagnosis instead of random iteration.
Mental model: When your car won't start, a good mechanic doesn't randomly replace parts. She runs through a diagnostic framework: Is it the battery? The starter motor? The fuel system? The ignition? Each question narrows the problem space. Similarly, when an AI image generation fails, the failure isn't random — it falls into one of a few predictable categories, and identifying which one tells you exactly what to fix.
The Four Failure Categories
Every AI image generation failure can be diagnosed as one of these four types:
Image doesn't look right. Why?
│
├─ 1. PROMPT ISSUE
│ → Your instructions are vague, conflicting, or missing key details
│ → Fix: Rewrite the prompt
│
├─ 2. MODEL LIMITATION
│ → The model can't do what you're asking (text, counting, hands, etc.)
│ → Fix: Use a different model or post-processing
│
├─ 3. CONTROL ISSUE
│ → You need more precision than text alone can provide
│ → Fix: Use references, ControlNet, inpainting, or manual editing
│
└─ 4. TASK MISMATCH
  → You're using an image generator for a task that needs a different tool
  → Fix: Switch to the right tool (vector editor, 3D tool, code, etc.)
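The tree above can be sketched as a small lookup plus a decision function. This is a minimal illustration, not a prescribed implementation: the names `FailureCategory`, `FIXES`, and `diagnose` are hypothetical, and the order in which the questions are asked (task first, prompt as the fallback) is an assumption of this sketch.

```python
from enum import Enum


class FailureCategory(Enum):
    PROMPT = "prompt issue"
    MODEL = "model limitation"
    CONTROL = "control issue"
    TASK = "task mismatch"


# Fix text copied from the decision tree.
FIXES = {
    FailureCategory.PROMPT: "Rewrite the prompt",
    FailureCategory.MODEL: "Use a different model or post-processing",
    FailureCategory.CONTROL: "Use references, ControlNet, inpainting, or manual editing",
    FailureCategory.TASK: "Switch to the right tool (vector editor, 3D tool, code, etc.)",
}


def diagnose(needs_other_tool: bool,
             model_cannot_do_it: bool,
             needs_more_precision: bool) -> FailureCategory:
    """Return the first category whose question is answered 'yes'.

    The question order is an assumption of this sketch. If none of the
    three questions applies, the prompt is treated as the likely culprit.
    """
    if needs_other_tool:
        return FailureCategory.TASK
    if model_cannot_do_it:
        return FailureCategory.MODEL
    if needs_more_precision:
        return FailureCategory.CONTROL
    return FailureCategory.PROMPT


category = diagnose(needs_other_tool=False,
                    model_cannot_do_it=True,
                    needs_more_precision=False)
print(f"{category.value}: {FIXES[category]}")
```

The point of the sketch is the shape of the process: answer a few yes/no questions, land on exactly one category, and read the fix off a table rather than iterating at random.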
Category 1: Prompt Issues
This is the most common and most fixable category. Signs that your prompt is the problem:
- The image looks fine but doesn't match your intent → underspecification (prompt is too vague)
- The image looks muddy or confused → conflict (contradictory style/content directions)
- The image captures some elements but misses others → attention overload (too many concepts for the model to track)
The fix is always prompt revision: add specificity, remove conflicts, simplify, or restructure.
Category 2: Model Limitations
Sometimes the prompt is perfect but the model simply can't execute it. This is the lesson of this entire track:
- Text is garbled → text rendering limitation (Lesson 1)
- Object count is wrong → counting limitation (Lesson 2)
- Hands are mangled → articulated structure limitation (Lesson 3)
- Spatial arrangement is wrong → spatial relationship limitation (Lesson 4)
- Character looks different → consistency limitation (Lesson 5)
The fix is to switch models (some handle text better), use post-processing, or restructure the workflow.
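The symptom list above maps naturally onto a lookup table. The following sketch assumes exact, case-insensitive symptom strings; `MODEL_LIMITATIONS` and `lookup_limitation` are hypothetical names used only for illustration.

```python
# Symptom -> (limitation name, lesson number), copied from the list above.
MODEL_LIMITATIONS = {
    "text is garbled": ("text rendering", 1),
    "object count is wrong": ("counting", 2),
    "hands are mangled": ("articulated structure", 3),
    "spatial arrangement is wrong": ("spatial relationship", 4),
    "character looks different": ("consistency", 5),
}


def lookup_limitation(symptom: str) -> str:
    """Return a short diagnosis for a listed symptom, or a hint to look elsewhere."""
    key = symptom.strip().lower()
    if key not in MODEL_LIMITATIONS:
        return "not a listed model limitation; check the other categories"
    name, lesson = MODEL_LIMITATIONS[key]
    return f"{name} limitation (Lesson {lesson})"


print(lookup_limitation("Hands are mangled"))
# prints: articulated structure limitation (Lesson 3)
```

A symptom that is not in the table is not evidence that the model is fine; it only means the diagnosis should continue with the other three categories.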
Category 3: Control Issues
The prompt is good, the model is capable, but text alone doesn't give enough precision:
- You need a specific pose but can't describe it in words → use a pose reference
- You need a specific composition but verbal directions are too vague → use a layout sketch
- You need to fix one part of an otherwise good image → use inpainting
The fix is to add visual control (Track 5 covers this in depth).
Category 4: Task Mismatch
This is the most important category to recognize early, because catching it saves the most time:
- You need a logo → use a vector design tool
- You need a data chart → use a charting library
- You need a UI mockup → use Figma
- You need pixel-perfect consistency → use 3D rendering
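One way to catch a task mismatch early is to check the deliverable before generating anything. The sketch below is a hedged illustration: `BETTER_TOOL` and `better_tool_for` are hypothetical names, and the table contains only the four examples listed above.

```python
from typing import Optional

# Deliverable -> better-suited tool, copied from the list above.
BETTER_TOOL = {
    "logo": "vector design tool",
    "data chart": "charting library",
    "ui mockup": "Figma",
    "pixel-perfect consistency": "3D rendering",
}


def better_tool_for(deliverable: str) -> Optional[str]:
    """Return the suggested tool for a known mismatch, or None if none is listed."""
    return BETTER_TOOL.get(deliverable.strip().lower())


for need in ("Logo", "concept art"):
    tool = better_tool_for(need)
    if tool:
        print(f"{need}: task mismatch, use a {tool}")
    else:
        print(f"{need}: no listed mismatch, continue diagnosing")
```

Running the check first reflects the idea that a mismatch is cheapest to fix before any prompt iteration has been spent on it.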
Failures Reveal Model Structure
Here's the deeper insight that separates beginners from skilled practitioners: every failure tells you something about how the model works. Text garbling reveals the token-to-pixel gap. Hand errors reveal the absence of anatomical knowledge. Counting errors reveal soft statistical encoding of numbers. Spatial failures reveal the limitations of cross-attention. Style leakage reveals training data correlations.
When you stop treating failures as random annoyances and start reading them as diagnostic information, you gain a mental model of the system itself. That mental model is what lets you predict, avoid, and work around failures — which is the real skill in generative media.
Key Takeaways
- Four failure categories: prompt issue, model limitation, control issue, task mismatch.
- Identify the category first, then apply the right fix — don't randomly iterate.
- Task mismatch is the most costly mistake: recognize it early and switch tools.
- Every failure reveals something about how the model works; learning to read failures is a superpower.
- This diagnostic skill transfers across models and outlasts any specific prompt technique.