Caching 전략 — 뭐가 cache 가치

~12 min · production, caching

Level 0수련생

0 XP0/100 lessons0/14 achievements

0/120 XP to next level120 XP to go0% complete

3개 cache, 3개 use

Prompt cache (provider-side) — system prompt, 큰 stable doc, tool definition. Input cost dramatically 줄임.
Response cache (너의 side) — identical input에 deterministic prompt에 (chat에 rare, extraction / classification에 common).
Semantic cache — 들어오는 query를 최근 비슷한 query에 match; similarity 높으면 cached response serve. FAQ-shaped traffic에 가장 유용.

Tradeoff

Prompt cache: cheap, 거의 free, quality risk 없음. Stable prefix에 default on.
Response cache: determinism (temperature 0)과 stable input hashing 필요.
Semantic cache: powerful한데 risky — wrong match가 confidently wrong 답 return. High similarity threshold set, fallback 가져.

Invalidation

Cache가 invalidation story 필요. Prompt cache TTL이 provider-managed (분~시간). Response cache가 데이터 update에 invalidate. Semantic cache가 policy 변경에 invalidate. 변경 require하는 같은 diff에 invalidation put.

Code

Similarity threshold 박힌 semantic cache·python

def semantic_cache_lookup(query: str, threshold: float = 0.95):
    emb = embed(query)
    hit = vector_db.search(emb, k=1)
    if hit and hit[0].score >= threshold:
        return hit[0].cached_response
    return None

result = semantic_cache_lookup(query)
if result is None:
    result = call_model(query)
    vector_db.upsert(embed(query), cached_response=result)

External links

Exercise

한 production endpoint에 prompt cache 추가. 1,000 call에 cost 변화 측정. Semantic caching이 위에 pilot 가치 있는지 결정.

Progress

Progress is local-only — sign in to sync across devices.

← PreviousPrompt versioning — string 아니라 git Next →Prompt와 response logging — leak 안 시키고

이 페이지에서 버그를 발견하셨거나 피드백이 있으세요?문제 신고

🔔 답글 알림 (로그인 필요)

로그인 — 댓글을 남기려면 로그인해 주세요.

아직 댓글이 없어요. 첫 댓글을 남겨보세요.