Descriptive Prompts vs. Keyword Stacks vs. Natural Language

Pippa's one-line summary: Keyword stacks belong to the SD 1.5 era. Feeding a keyword stack to FLUX *wastes* T5-XXL's language understanding. Switch your prompt style to match the model.

Over the past few years, three distinct prompting styles have emerged, and which one works best depends heavily on which model you're using. This is one of the most practical lessons in the entire track — it can save you hours of frustration.

The Three Styles

1. Keyword Stacking (Tag-Based)

Comma-separated descriptors with no grammar. This style comes from the Stable Diffusion 1.5 / Danbooru era, when the CLIP text encoder treated prompts more like lists of search terms than sentences.

Keyword Stack Style

"portrait, woman, red hair, freckles, green eyes, soft lighting, bokeh, 85mm, professional photography, 8k, masterpiece, best quality"

When This Works

SD 1.5 models, and especially fine-tunes trained on tag-based captions such as Danbooru tags. Those fine-tunes were literally trained on comma-separated tags, so tags are the format they respond to best.

2. Descriptive Prompting

Short, structured sentences that describe the scene with moderate detail. A middle ground between keywords and natural language.

Descriptive Style

"Portrait of a woman with red hair and freckles, soft natural lighting, shallow depth of field, professional photography"

When This Works

Most modern models. Clear, efficient, and doesn't waste tokens on grammar that the model might not need.

3. Natural Language (Conversational)

Full sentences that read like a scene description or photography brief. Takes advantage of models with advanced language understanding.

Natural Language Style

"A young woman with vibrant red hair and light freckles looks directly at the camera with a slight, knowing smile. She's lit by soft window light from the left side, creating gentle shadows. The background is softly blurred. Shot on an 85mm lens at f/1.8."

When This Works

FLUX, FLUX.2, and other models with powerful language encoders (T5-XXL, Mistral). These models understand syntax and relationships between concepts.
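
To make the difference between the three styles concrete, here is a minimal sketch that renders the same scene in each style. The `Scene` class and the three function names are hypothetical helpers invented for this illustration, not part of any image-generation library, and the sentence templates are only one possible wording.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical container for the parts of a portrait prompt."""
    subject: str
    details: list[str]
    lighting: str
    lens: str
    quality_tags: list[str] = field(default_factory=list)


def keyword_stack(scene: Scene) -> str:
    # Tag style: comma-separated fragments with no grammar.
    parts = [scene.subject, *scene.details, scene.lighting, scene.lens, *scene.quality_tags]
    return ", ".join(parts)


def descriptive(scene: Scene) -> str:
    # Descriptive style: one structured phrase followed by short qualifiers.
    details = " and ".join(scene.details)
    return f"Portrait of a {scene.subject} with {details}, {scene.lighting}, {scene.lens} lens"


def natural_language(scene: Scene) -> str:
    # Natural language style: full sentences, like a photography brief.
    details = " and ".join(scene.details)
    return (
        f"A {scene.subject} with {details} looks directly at the camera. "
        f"The scene is lit by {scene.lighting}. "
        f"The lens is {scene.lens}."
    )


# Example scene, loosely based on the portrait prompts above.
example = Scene(
    subject="woman",
    details=["red hair", "freckles"],
    lighting="soft lighting",
    lens="85mm",
    quality_tags=["professional photography"],
)

for render in (keyword_stack, descriptive, natural_language):
    print(f"{render.__name__}: {render(example)}")
```

Keeping the scene as structured data and choosing the renderer per model means you describe the image once and only swap the output style.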

The Critical Insight: Model Architecture Determines Style

Model	Text Encoder	Best Prompt Style
SD 1.5 / fine-tunes	CLIP (77 tokens)	Keyword stacking
SDXL	CLIP + OpenCLIP	Descriptive
SD 3.5	CLIP + OpenCLIP + T5-XXL	Descriptive or natural language
FLUX / FLUX.2	CLIP + T5-XXL (FLUX) / Mistral (FLUX.2)	Natural language
Midjourney	Proprietary	Short descriptive + parameters
DALL-E 3	Not fully disclosed (prompts are rewritten by GPT-4)	Natural language (auto-enhanced)
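
The table above can be sketched as a simple lookup, for example inside a script that prepares prompts for several models. The dictionary keys, the `recommend_style` function, and the "descriptive" fallback for unknown models are assumptions made for this example. The fallback follows the earlier note that descriptive prompts work on most modern models.

```python
# Hypothetical lookup that mirrors the "Best Prompt Style" column of the table.
STYLE_BY_MODEL = {
    "sd 1.5": "keyword stacking",
    "sdxl": "descriptive",
    "sd 3.5": "descriptive or natural language",
    "flux": "natural language",
    "flux.2": "natural language",
    "midjourney": "short descriptive + parameters",
    "dall-e 3": "natural language (auto-enhanced)",
}


def recommend_style(model_name: str, default: str = "descriptive") -> str:
    """Return the suggested prompt style for a model name, or a default."""
    name = model_name.strip().lower()
    # Check longer keys first so "flux.2" is matched before "flux".
    for key in sorted(STYLE_BY_MODEL, key=len, reverse=True):
        if key in name:
            return STYLE_BY_MODEL[key]
    # Assumption: fall back to the descriptive style for unknown models.
    return default


for model in ("SD 1.5 anime fine-tune", "FLUX.2 dev", "SomeNewModel"):
    print(f"{model}: {recommend_style(model)}")
```

Substring matching is a deliberately loose choice here so that names such as "SD 1.5 anime fine-tune" still resolve to their base model.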

Prompt Anchoring

Regardless of style, concrete nouns and specific scene descriptions almost always matter more than abstract adjectives. This is called prompt anchoring.

❌ Abstract Adjective Soup

"beautiful, stunning, magnificent, breathtaking, incredible, gorgeous landscape"

✅ Concrete Anchoring

"A glacial lake reflecting snow-capped peaks, morning mist hovering over turquoise water, wildflowers in the foreground, Patagonia"

The abstract version gives the model almost nothing specific to work with — "beautiful" is too vague. The concrete version gives it actual visual targets: glacial lake, snow-capped peaks, turquoise water, wildflowers, Patagonia. Each noun anchors the image to specific learned patterns.
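
As a rough self-check for anchoring, you could measure how much of a prompt consists of vague adjectives. The sketch below is only a heuristic: the `VAGUE_WORDS` list is hand-picked for this example and `vague_ratio` is a hypothetical helper, so a low score does not guarantee a well-anchored prompt.

```python
import re

# Illustrative, hand-picked list of vague adjectives (an assumption, not a standard vocabulary).
VAGUE_WORDS = {
    "beautiful", "stunning", "magnificent", "breathtaking",
    "incredible", "gorgeous", "amazing",
}


def vague_ratio(prompt: str) -> float:
    """Return the fraction of words in the prompt that appear in VAGUE_WORDS."""
    # Treat hyphenated terms such as "snow-capped" as a single word.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", prompt.lower())
    if not words:
        return 0.0
    vague = sum(1 for word in words if word in VAGUE_WORDS)
    return vague / len(words)


abstract = "beautiful, stunning, magnificent, breathtaking, incredible, gorgeous landscape"
concrete = (
    "A glacial lake reflecting snow-capped peaks, morning mist hovering over "
    "turquoise water, wildflowers in the foreground, Patagonia"
)

print(f"abstract: {vague_ratio(abstract):.2f}")
print(f"concrete: {vague_ratio(concrete):.2f}")
```

A high ratio is a hint to replace adjectives with concrete nouns: name the place, the objects, and the lighting.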

Key Takeaways

Three prompt styles exist: keyword stacking, descriptive, and natural language.
The right style depends on the model's text encoder. FLUX wants natural language; SD 1.5 wants keywords.
Concrete nouns anchor the image far more effectively than abstract adjectives.
Match your prompting style to your model — it's one of the highest-impact, lowest-effort improvements you can make.
