Prediction, Not Understanding

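One-line summary from 피파: if you keep one line from Track 1, make it this one: **the model *predicts*; it doesn't *understand***. Forget this while writing prompts and you'll misdiagnose failures as "the model doesn't understand what I mean."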

Here's the most important mental shift in this entire course: these models predict plausible outputs — they don't understand the world.

Think of a weather forecaster who has memorized every weather pattern for the last century. When she says "tomorrow will be sunny," she's not controlling the weather or understanding atmospheric physics at a molecular level. She's recognizing that today's conditions closely match historical patterns that were followed by sunny days. She's making a sophisticated prediction based on learned correlations.
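The forecaster's method can be sketched as a k-nearest-neighbour vote: find the remembered days most similar to today and repeat what usually followed. This is a minimal illustration that assumes invented toy data and a hypothetical `forecast` function. Real forecasting and real image models are far more elaborate, but notice that nothing in this sketch models physics. It only compares and votes.

```python
import math
from collections import Counter

# Hypothetical toy history: (pressure_hPa, humidity_pct) -> what followed next day.
# These numbers are invented for illustration; they are not real weather data.
HISTORY = [
    ((1025, 40), "sunny"),
    ((1022, 45), "sunny"),
    ((1020, 50), "sunny"),
    ((1008, 85), "rain"),
    ((1005, 90), "rain"),
    ((1012, 70), "cloudy"),
    ((1015, 65), "cloudy"),
]

def forecast(today, k=3):
    """Predict tomorrow by majority vote over the k most similar past days.

    No atmospheric physics is involved: the function only measures how close
    `today` is to each remembered day (features are not normalized, a toy
    simplification) and repeats the most common outcome among the nearest k.
    """
    by_distance = sorted(HISTORY, key=lambda rec: math.dist(rec[0], today))
    votes = Counter(outcome for _, outcome in by_distance[:k])
    return votes.most_common(1)[0][0]

print(forecast((1023, 42)))  # nearest days were all followed by sun
print(forecast((1007, 88)))  # nearest days were mostly followed by rain
```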

Image models work in much the same way. When you type "a golden retriever playing fetch on a beach at sunset," the model doesn't reason: "Okay, a golden retriever is a dog breed with this bone structure, fur is affected by wind and moisture, the sun at this angle creates these shadows..." Instead, it effectively asks: "Given everything I've learned about images paired with similar text, what would a plausible image look like?"

Correlation vs. Causation in Action

This distinction has real consequences:

What you think happens:          What actually happens:

"Draw 3 apples"                  "Draw 3 apples"
    ↓                                ↓
Model counts: 1, 2, 3           Model predicts: "images with
    ↓                            'three apples' text usually
Draws exactly 3 apples          have this many round objects"
    ↓                                ↓
✅ Always works                  Sometimes 2, sometimes 4 🤷

The model has learned the correlation between the text "three apples" and images containing roughly three apple-like objects. But it hasn't learned the concept of counting. This is why you sometimes get two apples or four. It's not stupid — it's doing exactly what it was designed to do: predicting a plausible visual pattern. Counting is just not what that pattern prediction reliably captures.
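A toy simulation makes the apple example concrete. Assume, purely for illustration, that the model has absorbed a spread of apple counts from images captioned "three apples". The `LEARNED_COUNTS` numbers and the `generate_apples` function are hypothetical, and a real model stores nothing this explicit. Sampling from that spread, rather than counting, is what produces the occasional two or four:

```python
import random
from collections import Counter

# Hypothetical numbers: suppose that, among training images captioned
# "three apples", the visible apple count was distributed like this.
# The "model" never learns "3 means count to three"; it only absorbs this spread.
LEARNED_COUNTS = {2: 0.2, 3: 0.6, 4: 0.2}

def generate_apples(rng):
    """Return an apple count by sampling the learned spread, not by counting."""
    counts = list(LEARNED_COUNTS)
    weights = list(LEARNED_COUNTS.values())
    return rng.choices(counts, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the demo is repeatable
results = Counter(generate_apples(rng) for _ in range(1000))
print(sorted(results.items()))
```

Most draws land on 3, but a steady minority do not, mirroring the "sometimes 2, sometimes 4" column in the diagram.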

Why This Matters for You

Once you stop expecting "understanding" and start expecting "prediction," everything becomes clearer:

Prompt failures make sense: The model isn't ignoring you — your words didn't reliably activate the patterns you wanted.
Inconsistency is expected: Each generation samples one plausible output from many possibilities, starting from random noise, so the same prompt gives different results each time unless your tool lets you reuse the same seed and settings.
Strengths make sense: The model is great at things where visual patterns are consistent (faces, landscapes, common compositions) and weak where patterns are sparse or irregular (precise text, counting, novel combinations).
Control strategies change: You stop trying to "explain" things to the model and start learning which words and patterns reliably trigger which visual outputs.
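The "inconsistency" and "control strategies" points above can be illustrated with a toy generator: a fixed seed reproduces the same output, while running many seeds tells you how reliably a prompt triggers the pattern you want. `toy_generate`, `hit_rate`, and the `ODDS` values are hypothetical stand-ins. With a real tool you would swap in its own generation call and seed setting, then compare several wordings of a prompt by their hit rates.

```python
import random

# Hypothetical stand-in for an image generator that returns an apple count.
# The odds are invented; the point here is the role of the seed.
ODDS = {2: 0.2, 3: 0.6, 4: 0.2}

def toy_generate(seed):
    """One 'generation' for the prompt "three apples" under a given seed."""
    rng = random.Random(seed)  # the seed fixes the starting randomness
    return rng.choices(list(ODDS), weights=list(ODDS.values()), k=1)[0]

def hit_rate(target, seeds):
    """Fraction of seeds whose output matched the target count."""
    seeds = list(seeds)
    return sum(toy_generate(s) == target for s in seeds) / len(seeds)

# Same seed, same result: the variation comes from the randomness.
assert toy_generate(seed=7) == toy_generate(seed=7)

# Across many seeds, measure how reliably the prompt produces what you wanted.
print(f"three apples -> exactly 3 in {hit_rate(3, range(500)):.0%} of seeds")
```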

Key Takeaways

Generative models predict plausible outputs from learned correlations — they don't understand concepts.
Failures like wrong finger counts or misspelled text reveal the limits of pattern prediction.
Stop expecting "understanding" and start thinking in terms of "which patterns does my input activate?"
This reframe transforms how you prompt, diagnose failures, and build workflows.
