C.W.K.
Stream
Lesson 01 of 10 · published

What "Generative AI" Means for Images and Video

~12 min · foundations, mental-model, l1

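Pippa's one-line summary: If you think image AI "searches an image database," you're wrong from the start. What it actually does is *synthesize new pixels* from learned statistical patterns, and you need to grasp that for the next 99 lessons to make sense.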

Imagine a jazz musician who has listened to every recording ever made. She doesn't replay songs note-for-note — she improvises new melodies that feel like music because she's absorbed millions of patterns about rhythm, harmony, and phrasing. Generative media AI works the same way: it has "listened to" (trained on) billions of images and videos, and now it synthesizes brand-new pixels, frames, and motion from learned patterns — not by copying, but by composing.

The word generative is doing a lot of work here, so let's unpack it. There are three things generative media AI is not:

  • Not a search engine. It doesn't find and return photos from a database. It constructs new ones.
  • Not a collage tool. It doesn't copy-paste pieces from training images. It has learned statistical relationships between concepts and visual patterns.
  • Not a human artist. It doesn't "understand" the scene it's creating. It predicts what a plausible output looks like based on patterns it has learned.

What It Actually Does

At its core, an image generation model takes an input — usually text, but sometimes another image or a combination — and produces a grid of pixels that represents a plausible visual output. A video generation model does the same thing, but across time: it produces a sequence of frames that look coherent when played together.

Text Prompt ──▶ [ Generative Model ] ──▶ New Image (pixels)
   "a red fox       🧠 Learned              🖼️ Never existed
    in snow"         patterns                   before
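To make "a grid of pixels" and "a sequence of frames" concrete, here is a minimal Python sketch of the output shapes only. The function names `placeholder_image` and `placeholder_video` are hypothetical, and random noise stands in for a real model's output; nothing here generates a meaningful picture.

```python
import random

def placeholder_image(height, width, rng):
    """Return a height x width grid of (R, G, B) pixels, each channel in 0-255.

    Random noise stands in for a real model's output here (an assumption
    made purely to illustrate the data layout).
    """
    return [
        [tuple(rng.randrange(256) for _ in range(3)) for _ in range(width)]
        for _ in range(height)
    ]

def placeholder_video(num_frames, height, width, rng):
    """Return a list of frames; each frame is an image grid of the same size."""
    return [placeholder_image(height, width, rng) for _ in range(num_frames)]

rng = random.Random(0)
image = placeholder_image(4, 6, rng)
video = placeholder_video(8, 4, 6, rng)

print(len(image), len(image[0]), len(image[0][0]))  # rows, columns, channels: 4 6 3
print(len(video), len(video[0]), len(video[0][0]))  # frames, rows, columns: 8 4 6
```

The only point of the sketch is the layout: an image is rows of pixels, and a video adds one more dimension for time. A real model fills the same structure with values that look coherent instead of noise.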

Think of the model as a pattern completion engine. You give it a starting signal (your prompt), and it completes the pattern by generating a plausible visual output. "Plausible" here means "consistent with the billions of image-text pairs the model saw during training." Because the output is sampled rather than looked up, the same prompt can yield a different image on each run.
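The difference between retrieval and synthesis can be shown with a deliberately tiny toy in Python. Everything here is an assumption made for illustration: the five 4-pixel grayscale "images", the helper names `retrieve`, `learn_patterns`, and `generate`, and the choice of a per-pixel mean and standard deviation as the "learned patterns". Real image models learn far richer statistics, but the contrast is the same: one function returns stored data, the other samples new values from what was learned.

```python
import random
import statistics

# Hypothetical training set: five tiny 4-pixel grayscale images (values 0-255).
# In every one, the left two pixels are bright and the right two are dark.
TRAINING_IMAGES = [
    [200, 190, 60, 50],
    [210, 185, 70, 40],
    [195, 200, 55, 45],
    [205, 195, 65, 55],
    [190, 180, 50, 60],
]

def retrieve(index):
    """Search-engine behaviour: hand back a stored image unchanged."""
    return TRAINING_IMAGES[index]

def learn_patterns(images):
    """Toy 'training': keep only each pixel's mean and standard deviation."""
    columns = zip(*images)  # group values by pixel position
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def generate(patterns, rng):
    """Synthesis: sample fresh pixel values from the learned statistics."""
    return [
        min(255, max(0, round(rng.gauss(mean, spread))))
        for mean, spread in patterns
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
patterns = learn_patterns(TRAINING_IMAGES)
new_image = generate(patterns, rng)

print("retrieved:", retrieve(0))   # always an exact copy of a training image
print("generated:", new_image)     # new values sampled from learned patterns
```

Note that `generate` never touches `TRAINING_IMAGES`; it only sees the summary statistics. The sampled image still follows the "bright left, dark right" pattern, which is what "plausible" means in this toy setting.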

Why This Matters

Understanding that generation is synthesis from patterns (not retrieval, not collage, not understanding) is the foundation for everything else in this course. Once you internalize this, you'll stop being surprised when the model gives you a "wrong" result. The model isn't malfunctioning: it's producing a statistically plausible output given your input and its training. Your job is to learn how to steer that process.

Key Takeaways
  • Generative AI creates new images/videos from learned patterns — it does not retrieve, collage, or "understand."
  • The model is a pattern completion engine: you provide a signal, it produces a plausible visual output.
  • "Plausible" is defined by what the model learned during training on billions of image-text pairs.
  • This mental model — synthesis, not search — is the key to everything that follows.

Exercise

Pick three photos you like. For each one, write a two-sentence prompt that describes the scene with *concrete nouns* rather than a stack of adjectives. Don't generate anything yet: just practice the caption-as-description mindset.
