What "Generative AI" Means for Images and Video

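Pippa's one-line summary: If you think image AI "searches an image database," you're wrong from the start. In reality, it *synthesizes new pixels* from learned statistical patterns, and you need to grasp that for the next 99 lessons to make sense.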

Imagine a jazz musician who has listened to every recording ever made. She doesn't replay songs note-for-note — she improvises new melodies that feel like music because she's absorbed millions of patterns about rhythm, harmony, and phrasing. Generative media AI works the same way: it has "listened to" (trained on) billions of images and videos, and now it synthesizes brand-new pixels, frames, and motion from learned patterns — not by copying, but by composing.

The word generative is doing a lot of work here, so let's unpack it. There are three things generative media AI is not:

Not a search engine. It doesn't find and return photos from a database. It constructs new ones.
Not a collage tool. It doesn't copy-paste pieces from training images. It has learned statistical relationships between concepts and visual patterns.
Not a human artist. It doesn't "understand" the scene it's creating. It predicts what a plausible output looks like based on patterns it has learned.
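To make the search-versus-synthesis distinction concrete, here is a deliberately tiny sketch. It is not how real image models work (those are large neural networks); it only assumes a hypothetical "model" that learns one probability per pixel from three 3x3 black-and-white images. The names TRAINING_SET, retrieve, learn_pixel_probabilities, and generate are all invented for this illustration.

```python
import random

# Hypothetical toy "training set": three 3x3 binary images, flattened row by row.
TRAINING_SET = [
    (1, 1, 1,
     0, 1, 0,
     0, 1, 0),  # a "T" shape
    (1, 1, 1,
     0, 1, 0,
     1, 1, 1),  # an "I" shape with serifs
    (0, 1, 0,
     0, 1, 0,
     0, 1, 0),  # a plain vertical bar
]

def retrieve(index):
    """Search-engine behaviour: return a stored image unchanged."""
    return TRAINING_SET[index]

def learn_pixel_probabilities(images):
    """'Training': for each pixel position, the fraction of images where it is on."""
    count = len(images)
    return [sum(img[i] for img in images) / count for i in range(len(images[0]))]

def generate(probabilities, rng):
    """Synthesis: sample every pixel from the learned statistics."""
    return tuple(1 if rng.random() < p else 0 for p in probabilities)

probabilities = learn_pixel_probabilities(TRAINING_SET)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [generate(probabilities, rng) for _ in range(5)]

# Pixels whose probability is strictly between 0 and 1 can come out either way,
# so the model can produce more distinct images than it was ever shown.
uncertain_pixels = sum(1 for p in probabilities if 0 < p < 1)
reachable_images = 2 ** uncertain_pixels

print("stored images:   ", len(TRAINING_SET))
print("reachable images:", reachable_images)
for sample in samples:
    print(sample, "(copy of a training image)" if sample in TRAINING_SET else "(new)")
```

The retrieve function can only ever return one of the three stored images. The generate function keeps what the training images agree on (the vertical stem is always on) and varies what they disagree on (the corners), so some of its outputs may match no training image at all.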

What It Actually Does

At its core, an image generation model takes an input — usually text, but sometimes another image or a combination — and produces a grid of pixels that represents a plausible visual output. A video generation model does the same thing, but across time: it produces a sequence of frames that look coherent when played together.
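As a rough sketch of what "a grid of pixels" and "a sequence of frames" mean as data, the snippet below builds both with plain Python lists. The dimensions and the helper names blank_image and blank_video are assumptions made up for this example; real models output far larger grids and usually use array libraries rather than nested lists.

```python
# Hypothetical toy dimensions, chosen only for illustration.
HEIGHT, WIDTH, CHANNELS = 4, 6, 3  # rows, columns, and red/green/blue
NUM_FRAMES = 8

def blank_image(height, width, channels=CHANNELS):
    """Return an image as nested lists: image[row][col][channel], values 0-255."""
    return [[[0] * channels for _ in range(width)] for _ in range(height)]

def blank_video(num_frames, height, width):
    """Return a video as a list of frames, where each frame is its own image."""
    return [blank_image(height, width) for _ in range(num_frames)]

image = blank_image(HEIGHT, WIDTH)
video = blank_video(NUM_FRAMES, HEIGHT, WIDTH)

# An image model has to choose a value for every cell in the grid.
image[0][0] = [255, 0, 0]  # set the top-left pixel to pure red

values_per_image = HEIGHT * WIDTH * CHANNELS      # 4 * 6 * 3 = 72
values_per_video = NUM_FRAMES * values_per_image  # 8 * 72 = 576

print("numbers in one image:", values_per_image)
print("numbers in one video:", values_per_video)
```

Even at this toy size, the video needs eight times as many values as the image, and those values also have to agree from frame to frame for the result to look coherent when played.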

Text Prompt ──▶ [ Generative Model ] ──▶ New Image (pixels)
   "a red fox       🧠 Learned              🖼️ Never existed
    in snow"         patterns                   before

Think of the model as a pattern completion engine. You give it a starting signal (your prompt), and it completes the pattern by generating a visual output that is plausible given that signal. "Plausible" here means "consistent with the billions of image-text pairs the model saw during training." Because generation typically involves random sampling, the same prompt can yield different plausible images on different runs.

Why This Matters

Understanding that generation is synthesis from patterns (not retrieval, not collage, not understanding) is the foundation for everything else in this course. Once you internalize this, you'll stop being surprised when the model gives you a "wrong" result. From the model's side it isn't wrong: it's producing a statistically plausible output given your input and its training. Your job is to learn how to steer that process.

Key Takeaways

Generative AI creates new images/videos from learned patterns — it does not retrieve, collage, or "understand."
The model is a pattern completion engine: you provide a signal, it produces a plausible visual output.
"Plausible" is defined by what the model learned during training on billions of image-text pairs.
This mental model — synthesis, not search — is the key to everything that follows.
