Defense in depth: layers, not patches

~16 min · security, defense-in-depth

A single layer is not enough

Defenses that work are stacked. Each layer catches some attacks; none catches all of them. Together they raise the attacker's cost and shrink the attack surface.

Layer stack

Trust boundary in the prompt: explicit tags around untrusted content, and refusal to follow imperatives that appear inside a data section.
Privilege scoping: the model has only the tools it needs; nothing else is reachable.
Input filtering: strip or detect known malicious patterns; classify high-risk input and route it differently.
Output filtering: scan model output for sensitive-data leaks, embedded URLs, and attack indicators.
Verifier loop: verify any structured action (such as "send email") against business rules before executing it.
Audit trail: log every input, output, and tool call together with the prompt version that produced it.
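
The privilege-scoping and verifier-loop layers above can be sketched together as a single check that runs before any action executes. This is a minimal sketch, not a definitive implementation: the `Action` type, the `ALLOWED_TOOLS` set, and the recipient-domain rule are all hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical allowlist for one endpoint (privilege scoping):
# any tool not listed here is treated as unreachable.
ALLOWED_TOOLS = {"send_email"}

# Hypothetical business rule: mail may only go to these domains.
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}


@dataclass
class Action:
    """A structured action proposed by the model."""
    tool: str
    args: dict


def verify(action: Action) -> tuple[bool, str]:
    """Return (ok, reason); the caller executes the action only if ok is True."""
    if action.tool not in ALLOWED_TOOLS:
        return False, f"tool not allowed: {action.tool}"
    if action.tool == "send_email":
        recipient = str(action.args.get("to", ""))
        # Everything after the last "@" is treated as the domain.
        domain = recipient.rpartition("@")[2].lower()
        if domain not in ALLOWED_RECIPIENT_DOMAINS:
            return False, f"recipient domain not allowed: {domain or '(none)'}"
    return True, "ok"
```

The check lives outside the model, so an injected instruction that convinces the model to propose a bad action still has to pass rules the attacker cannot rewrite.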

Layer cost

Each layer costs latency, complexity, and false positives. Do not add every layer everywhere; match the layers to the blast radius. A summarization endpoint needs less defense than one that issues refunds.
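
One way to make "match the layers to the blast radius" concrete is a small lookup that enables a layer only once an endpoint's blast radius reaches a threshold. The sketch below is illustrative: the `BlastRadius` levels and every threshold in `LAYER_THRESHOLDS` are assumptions, not values prescribed by this lesson.

```python
from enum import IntEnum


class BlastRadius(IntEnum):
    """Hypothetical three-level scale for how much damage an endpoint can do."""
    LOW = 1      # e.g. summarization: output is only shown to the user
    MEDIUM = 2
    HIGH = 3     # e.g. issuing refunds: output triggers real-world actions


# Minimum blast radius at which each layer is switched on (assumed values).
LAYER_THRESHOLDS = {
    "trust_boundary": BlastRadius.LOW,
    "privilege_scoping": BlastRadius.LOW,
    "audit_trail": BlastRadius.LOW,
    "input_filtering": BlastRadius.MEDIUM,
    "output_filtering": BlastRadius.MEDIUM,
    "verifier_loop": BlastRadius.HIGH,
}


def layers_for(radius: BlastRadius) -> list[str]:
    """Return the layers to enable for an endpoint with the given blast radius."""
    return [name for name, minimum in LAYER_THRESHOLDS.items() if radius >= minimum]
```

With these assumed thresholds, `layers_for(BlastRadius.LOW)` leaves out the filters and the verifier loop, while `layers_for(BlastRadius.HIGH)` enables all six layers.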

Code

Output-filter layer (sketch)·python

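import re

# Patterns for common exfiltration channels in model output (illustrative, not exhaustive).
# Each pattern matches the whole construct so that no part of the URL survives redaction.
DANGEROUS_PATTERNS = [
    # <img> tag that loads a remote URL: exfiltration via image fetch
    re.compile(r"<img\b[^>]*\bsrc\s*=\s*['\"]?https?:[^>]*>", re.IGNORECASE),
    # Markdown image whose remote URL carries a query string: exfiltration via markdown
    re.compile(r"!\[[^\]]*\]\(\s*https?://[^)\s]*\?[^)\s]*=[^)]*\)", re.IGNORECASE),
    # Raw email address
    re.compile(r"\b[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}\b"),
]


def sanitize(output: str) -> str:
    """Replace every match of a dangerous pattern with a placeholder."""
    for pattern in DANGEROUS_PATTERNS:
        output = pattern.sub("[redacted]", output)
    return output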
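
The audit-trail layer from the layer stack can be sketched in the same spirit. This is a minimal sketch under stated assumptions: the function name `audit_record` and the field names are hypothetical, and a real system would send the line to append-only storage rather than just return it.

```python
import json
import time


def audit_record(prompt_version: str, user_input: str, output: str, tool_calls: list) -> str:
    """Build one JSON log line tying an exchange to the prompt version that produced it."""
    record = {
        "ts": time.time(),                 # when the exchange happened
        "prompt_version": prompt_version,  # which prompt produced this output
        "input": user_input,
        "output": output,
        "tool_calls": tool_calls,          # e.g. [{"tool": "send_email", "args": {...}}]
    }
    # sort_keys keeps field order stable so log lines are easy to diff.
    return json.dumps(record, sort_keys=True, ensure_ascii=False)
```

Logging the prompt version alongside each exchange makes it possible to tell, after an incident, which prompt revision was live when the attack got through.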

Exercise

List the defense layers for one endpoint. Identify one missing layer that its blast radius warrants. Add it.
