Files API and Batch API: async-first workflows

~16 min · files-api, batch-api, async-workflows


Files API: upload once, reference many times

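The Files API lets you upload a PDF, image, or other document once and get back a file_id. You then reference that file_id in messages.create() instead of re-uploading the bytes on every call. This is useful for long-lived analysis sessions over the same set of documents.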

Batch API: 50% discount, 24-hour completion window

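The Batch API runs a list of independent requests asynchronously. The trade: you accept a completion window of up to 24 hours (most batches finish much sooner) in exchange for roughly 50% off per-token pricing. It is a good fit for evals, bulk classification, content backfills, and anything else that does not need a real-time response.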
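To make the trade concrete, here is a minimal cost sketch. The per-million-token prices are hypothetical placeholders, not real list prices, and the 50% discount is taken from the description above; substitute the current numbers for your model.

```python
def compare_costs(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok,
                  batch_discount=0.5):
    """Return (realtime_cost, batch_cost) in dollars for one workload."""
    realtime = (input_tokens / 1_000_000) * input_price_per_mtok \
             + (output_tokens / 1_000_000) * output_price_per_mtok
    batch = realtime * (1 - batch_discount)
    return realtime, batch

# Hypothetical workload and prices: 10M input tokens at $1.00/MTok,
# 1M output tokens at $5.00/MTok.
realtime_cost, batch_cost = compare_costs(10_000_000, 1_000_000, 1.00, 5.00)
print(f"real-time: ${realtime_cost:.2f}, batch: ${batch_cost:.2f}")
# prints: real-time: $15.00, batch: $7.50
```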

When neither fits

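If an interactive user needs a real-time streamed response, neither API helps. If you are running 100 evals and the answers can arrive tomorrow, batch wins on cost. If the same 50 MB PDF feeds 100 chat turns this week, Files cuts the upload latency to zero on every turn after the first call.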
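The three cases above can be read as a small decision rule. The sketch below is only an illustration of that reasoning; the function name and its two flags are hypothetical, not part of any SDK.

```python
def pick_workflow(needs_realtime: bool, reuses_same_document: bool) -> list[str]:
    """Return which async-friendly APIs are worth considering for a workload."""
    choices = []
    if reuses_same_document:
        choices.append("files")   # upload once, reference by file_id afterwards
    if not needs_realtime:
        choices.append("batch")   # answers can wait, so take the discount
    return choices or ["neither"]

print(pick_workflow(needs_realtime=True, reuses_same_document=False))   # ['neither']
print(pick_workflow(needs_realtime=False, reuses_same_document=False))  # ['batch']
print(pick_workflow(needs_realtime=True, reuses_same_document=True))    # ['files']
```

The two APIs are not mutually exclusive: a workload that can wait and also reuses one document gets both answers.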

The principle: real time has a cost. Async workflows are where you win it back.

Code

Upload a PDF once and reference it on every turn · python

from anthropic import Anthropic

client = Anthropic()

# 1. Upload (once); the with-block closes the file handle afterwards
with open("/path/to/contract.pdf", "rb") as f:
    uploaded = client.beta.files.upload(
        file=("contract.pdf", f, "application/pdf"),
    )
print("file id:", uploaded.id)

# 2. Reference it by file_id in later calls (no re-upload)
resp = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "document", "source": {"type": "file", "file_id": uploaded.id}},
                {"type": "text", "text": "Summarize section 5."},
            ],
        }
    ],
    betas=["files-api-2025-04-14"],
)
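Because the payoff comes from reusing the same file_id across many turns, it can help to build the user message in one place. The helper below is a sketch: `document_turn` is a hypothetical name, and "file_abc123" is a placeholder id rather than a real upload. The content block shape mirrors the call above.

```python
def document_turn(file_id: str, question: str) -> dict:
    """Build a user message that points at an already-uploaded file."""
    return {
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file_id}},
            {"type": "text", "text": question},
        ],
    }

# Placeholder id; in practice this would be uploaded.id from the upload step.
questions = ["Summarize section 5.", "List the termination clauses."]
turns = [document_turn("file_abc123", q) for q in questions]
print(len(turns), turns[0]["content"][0]["source"]["file_id"])
```

Each element of `turns` could be passed as the `messages` entry of a separate call, so the PDF bytes travel over the network only once.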

Batch 100 classifications at half price · python

# Submit (reuses `client` from the example above; `corpus` is your list of input strings)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"row-{i}",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 32,
                "system": "Reply with one word: positive, negative, neutral.",
                "messages": [{"role": "user", "content": text}],
            },
        }
        for i, text in enumerate(corpus)
    ]
)
print("batch id:", batch.id, "status:", batch.processing_status)

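# Poll (in production, use a scheduled check, or a webhook if your setup provides one)
import time
while True:
    b = client.messages.batches.retrieve(batch.id)
    if b.processing_status == "ended":
        break
    time.sleep(30)

# Stream the results, one entry per request; order is not guaranteed, so match on custom_id
for item in client.messages.batches.results(batch.id):
    if item.result.type == "succeeded":
        print(item.custom_id, item.result.message.content[0].text)
    else:
        # errored, canceled, or expired requests carry no message
        print(item.custom_id, "did not succeed:", item.result.type)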
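Not every request in a batch necessarily succeeds, so it is worth separating usable labels from ids that need another pass. The following is a sketch only: `split_results` and `fake_item` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins shaped like the result items used above so the example runs without a network call.

```python
from types import SimpleNamespace

def split_results(items):
    """Return ({custom_id: label}, [custom_ids to resubmit])."""
    labels, retry = {}, []
    for item in items:
        if item.result.type == "succeeded":
            text = item.result.message.content[0].text
            labels[item.custom_id] = text.strip().lower()
        else:
            retry.append(item.custom_id)
    return labels, retry

def fake_item(custom_id, result_type, text=None):
    """Stand-in for one batch result entry (illustration only)."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)]) if text else None
    result = SimpleNamespace(type=result_type, message=message)
    return SimpleNamespace(custom_id=custom_id, result=result)

items = [fake_item("row-0", "succeeded", "Positive"), fake_item("row-1", "expired")]
labels, retry = split_results(items)
print(labels, retry)
```

With a real batch, the iterable returned by `client.messages.batches.results(batch.id)` would take the place of `items`.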


Exercise

Move one bulk job (an eval run, a daily classification, a dataset annotation pass) to the Batch API. Measure the cost difference and the actual completion time against the 24-hour window.

Hint

Most batches finish within an hour, but plan for the full 24 hours so an optimistic estimate never blocks the business.
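One way to apply the hint is to work backwards from the time the results are needed. This is a small sketch with a hypothetical helper name and an example deadline; the 24-hour figure is the worst case described above.

```python
from datetime import datetime, timedelta

def latest_submit_time(deadline: datetime,
                       window: timedelta = timedelta(hours=24)) -> datetime:
    """Latest moment to submit a batch if it takes the full window."""
    return deadline - window

# Example deadline: results needed by 09:00 on 10 January 2025.
deadline = datetime(2025, 1, 10, 9, 0)
print(latest_submit_time(deadline))  # 2025-01-09 09:00:00
```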
