Input Format — string vs message list

Responses accepts two input shapes: a plain string, or a message list with typed content parts. Use a string for one-shot calls and a list for multimodal or multi-turn input.

When a string is enough

For a single user prompt with no images and no history, input="What is 2+2?" is enough. It has the least ceremony and is the quickest to read.

List shape — typed content parts

Use a message list when you need images, files, multiple turns, or finer-grained roles. Content is spelled out with the typed parts input_text, input_image, and input_file, so the multimodal shape is explicit at the wire level.
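The multi-turn case can be sketched as a list that grows by one message per turn. The helper name add_turn is hypothetical, and the sketch assumes plain-string content is accepted for both user and assistant messages. It only builds the list and does not call the API.

```python
def add_turn(history, role, text):
    """Return a new history list with one more plain-text message appended."""
    return history + [{"role": role, "content": text}]


# Build up a short conversation turn by turn.
history = []
history = add_turn(history, "user", "Knock knock.")
history = add_turn(history, "assistant", "Who's there?")
history = add_turn(history, "user", "Orange.")

# The whole list would then be passed as input=history to
# client.responses.create(...); the call is omitted so the sketch
# runs without an API key.
```

Returning a new list instead of mutating the old one keeps earlier snapshots of the conversation intact, which can help when retrying a turn.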

Put system-style instructions in instructions=

Putting an old-style 'role: system' message into the input list works, but it is the wrong shape. Responses lifts system-style steering up to the top-level instructions= parameter. New code should use instructions.
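A minimal sketch of migrating a legacy message list to that shape. The helper split_instructions is a hypothetical name, not part of any SDK. It only reshapes data, so the actual API call is shown as a comment.

```python
def split_instructions(messages):
    """Separate 'system' messages from the rest of a message list.

    Returns (instructions, remaining). instructions is None when the list
    contains no system message with string content.
    """
    steering = []
    remaining = []
    for message in messages:
        is_system = message.get("role") == "system"
        if is_system and isinstance(message.get("content"), str):
            steering.append(message["content"])
        else:
            remaining.append(message)
    instructions = "\n".join(steering) if steering else None
    return instructions, remaining


legacy = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
instructions, messages = split_instructions(legacy)

# With a client already created, the call would then look like:
# response = client.responses.create(
#     model="gpt-5.4",
#     instructions=instructions,
#     input=messages,
# )
```

Joining multiple system messages with newlines is a choice made for this sketch. A real migration might prefer to keep only the first one or to fail loudly.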

Why keep both string and list?

Forcing list ceremony on a single-prompt task is an ergonomic failure. Forcing a string on multimodal input is an expressiveness failure. Keeping both shapes is the right call: pick whichever fits the task.
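One way to pick the shape that fits the task is a small helper that returns a string when text is all there is and a message list otherwise. The name build_input and its image_url parameter are hypothetical, introduced only for illustration.

```python
def build_input(prompt, image_url=None):
    """Return the simplest input shape that can express the request."""
    if image_url is None:
        # Text only: the plain string is enough.
        return prompt
    # An image is attached: switch to a message list with typed parts.
    return [{
        "role": "user",
        "content": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_url": image_url},
        ],
    }]


text_only = build_input("What is 2+2?")
with_image = build_input("Describe this image:", "https://example.com/img.jpg")
```

Either return value can be passed as input= to client.responses.create(...), so the caller does not need to care which shape was chosen.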

Code

Plain string input·python

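from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One-shot call: a plain string is the whole input.
response = client.responses.create(
    model="gpt-5.4",
    input="Explain quantum entanglement in simple terms.",
)
print(response.output_text)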

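Message-list input with plain-string content·python

# Assumes client = OpenAI() has already been created.
# Each message carries a role and a plain-string content.
response = client.responses.create(
    model="gpt-5.4",
    input=[
        # Steering like this could instead go in the top-level instructions= parameter.
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
)
print(response.output_text)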

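Message-list input with input_text, input_image, and input_file·python

# Assumes client = OpenAI() has already been created.
# One user message whose content is a list of typed parts.
response = client.responses.create(
    model="gpt-4.1",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image:"},
            {"type": "input_image", "image_url": "https://example.com/img.jpg", "detail": "high"},
            # file-abc123 is a placeholder ID for a previously uploaded file.
            {"type": "input_file", "file_id": "file-abc123"},
        ],
    }],
)
print(response.output_text)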
