Pippa's one-line summary: Keyword stacking belongs to the SD 1.5 era. Using a keyword stack with FLUX *wastes* T5-XXL's language understanding. Switch your prompt style to match the model.
Over the past few years, three distinct prompting styles have emerged, and which one works best depends heavily on which model you're using. This is one of the most practical lessons in the entire track — it can save you hours of frustration.
The Three Styles
1. Keyword Stacking (Tag-Based)
Comma-separated descriptors with no grammar. Born from the Stable Diffusion 1.5 / Danbooru era where the CLIP text encoder processed prompts more like search queries than sentences.
"portrait, woman, red hair, freckles, green eyes, soft lighting, bokeh, 85mm, professional photography, 8k, masterpiece, best quality"
Best for: SD 1.5 models and fine-tunes trained on tag-based captions. These models were literally trained on comma-separated tags.
2. Descriptive Prompting
Short, structured sentences that describe the scene with moderate detail. A middle ground between keywords and natural language.
"Portrait of a woman with red hair and freckles, soft natural lighting, shallow depth of field, professional photography"
Best for: most modern models. This style is clear and efficient, and it doesn't waste tokens on grammar that the model might not need.
3. Natural Language (Conversational)
Full sentences that read like a scene description or photography brief. Takes advantage of models with advanced language understanding.
"A young woman with vibrant red hair and light freckles looks directly at the camera with a slight, knowing smile. She's lit by soft window light from the left side, creating gentle shadows. The background is softly blurred. Shot on an 85mm lens at f/1.8."
Best for: FLUX, FLUX.2, and other models with powerful language encoders (T5-XXL, Mistral). These models understand syntax and relationships between concepts.
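To make the difference between the three styles concrete, here is a minimal sketch that renders one scene description in each style. The `Scene` class and the three function names are hypothetical helpers invented for this illustration; they are not part of any image-generation library, and the sentence templates are just one plausible phrasing.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical container for the parts of a portrait prompt."""
    subject: str        # e.g. "woman"
    traits: list[str]   # e.g. ["red hair", "freckles"]
    lighting: str       # e.g. "soft window light"
    lens: str           # e.g. "85mm"


def join_traits(traits: list[str]) -> str:
    """Join traits as English prose: 'a', 'a and b', 'a, b and c'."""
    if len(traits) <= 1:
        return "".join(traits)
    return ", ".join(traits[:-1]) + " and " + traits[-1]


def keyword_stack(scene: Scene) -> str:
    """Style 1: comma-separated tags with no grammar."""
    tags = ["portrait", scene.subject, *scene.traits, scene.lighting, scene.lens]
    return ", ".join(tags)


def descriptive(scene: Scene) -> str:
    """Style 2: one short, structured phrase."""
    return (
        f"Portrait of a {scene.subject} with {join_traits(scene.traits)}, "
        f"{scene.lighting}, {scene.lens} lens"
    )


def natural_language(scene: Scene) -> str:
    """Style 3: full sentences that read like a photography brief."""
    return (
        f"A {scene.subject} with {join_traits(scene.traits)} looks directly "
        f"at the camera. The scene is lit by {scene.lighting}. "
        f"Photographed at {scene.lens}."
    )


if __name__ == "__main__":
    scene = Scene("woman", ["red hair", "freckles"], "soft window light", "85mm")
    for render in (keyword_stack, descriptive, natural_language):
        print(f"{render.__name__}: {render(scene)}")
```

The same facts go into all three prompts; only the amount of grammar changes, and that is the part a language-capable encoder can exploit.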
The Critical Insight: Model Architecture Determines Style
| Model | Text Encoder | Best Prompt Style |
|---|---|---|
| SD 1.5 / fine-tunes | CLIP (77 tokens) | Keyword stacking |
| SDXL | CLIP + OpenCLIP | Descriptive |
| SD 3.5 | CLIP + T5-XXL | Descriptive or natural language |
| FLUX / FLUX.2 | T5-XXL / Mistral | Natural language |
| Midjourney | Proprietary | Short descriptive + parameters |
| DALL-E 3 | GPT-based | Natural language (auto-enhanced) |
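The table can be encoded as a simple lookup so that a prompt-building script picks a style from the model name. This is only a sketch: the keys, the `recommended_style` function, and the choice of "descriptive" as the fallback for unknown models are assumptions made for illustration, and the values simply restate the table above.

```python
# Style recommendations copied from the table above.
# Keys are normalized model names (lowercase, no spaces, dots, or hyphens).
STYLE_BY_MODEL = {
    "sd15": "keyword",
    "sdxl": "descriptive",
    "sd35": "descriptive",      # the table also allows natural language
    "flux": "natural",
    "flux2": "natural",
    "midjourney": "descriptive",  # short descriptive + parameters
    "dalle3": "natural",
}


def normalize(model_name: str) -> str:
    """Turn 'SD 1.5' or 'DALL-E 3' into a lookup key like 'sd15' or 'dalle3'."""
    key = model_name.lower()
    for ch in (" ", ".", "-"):
        key = key.replace(ch, "")
    return key


def recommended_style(model_name: str, default: str = "descriptive") -> str:
    """Return the suggested prompt style, falling back to `default`.

    The 'descriptive' fallback is an assumption, based on the lesson's
    remark that the descriptive style suits most modern models.
    """
    return STYLE_BY_MODEL.get(normalize(model_name), default)


if __name__ == "__main__":
    for name in ("SD 1.5", "SDXL", "FLUX.2", "DALL-E 3", "SomeNewModel"):
        print(f"{name}: {recommended_style(name)}")
```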
Prompt Anchoring
Regardless of style, concrete nouns and specific scene descriptions almost always matter more than abstract adjectives. This is called prompt anchoring.
Abstract: "beautiful, stunning, magnificent, breathtaking, incredible, gorgeous landscape"
Concrete: "A glacial lake reflecting snow-capped peaks, morning mist hovering over turquoise water, wildflowers in the foreground, Patagonia"
The abstract version gives the model almost nothing specific to work with — "beautiful" is too vague. The concrete version gives it actual visual targets: glacial lake, snow-capped peaks, turquoise water, wildflowers, Patagonia. Each noun anchors the image to specific learned patterns.
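A rough way to spot an under-anchored prompt is to measure how much of it consists of vague praise words. The sketch below is a crude heuristic, not an established metric: the `VAGUE_WORDS` list is a small hand-picked assumption taken from the abstract example, and the `vague_ratio` function name is hypothetical.

```python
import re

# Hand-picked vague adjectives (an assumption; extend for your own prompts).
VAGUE_WORDS = {
    "beautiful", "stunning", "magnificent",
    "breathtaking", "incredible", "gorgeous",
}


def vague_ratio(prompt: str) -> float:
    """Return the fraction of words in `prompt` that are in VAGUE_WORDS."""
    # Hyphenated words such as "snow-capped" count as a single word.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", prompt.lower())
    if not words:
        return 0.0
    return sum(word in VAGUE_WORDS for word in words) / len(words)


if __name__ == "__main__":
    abstract = ("beautiful, stunning, magnificent, breathtaking, "
                "incredible, gorgeous landscape")
    concrete = ("A glacial lake reflecting snow-capped peaks, morning mist "
                "hovering over turquoise water, wildflowers in the "
                "foreground, Patagonia")
    print(f"abstract: {vague_ratio(abstract):.2f}")
    print(f"concrete: {vague_ratio(concrete):.2f}")
```

A high ratio suggests replacing adjectives with nouns the model can actually draw; a low ratio does not by itself guarantee a good prompt.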
Key Takeaways
- Three prompt styles exist: keyword stacking, descriptive, and natural language.
- The right style depends on the model's text encoder. FLUX wants natural language; SD 1.5 wants keywords.
- Concrete nouns anchor the image far more effectively than abstract adjectives.
- Match your prompting style to your model — it's one of the highest-impact, lowest-effort improvements you can make.