Tool call sandboxing

The model calls tools; the tools can do real work

When a tool has side effects (sending email, running code, fetching a URL), the runtime around it needs sandboxing: a hard limit on what each call can do, regardless of how confidently the model made the call.

Sandboxing patterns

Resource limit: the URL fetch tool runs behind a network policy that blocks internal IP ranges, file:// URLs, and metadata endpoints.
Process isolation: code-execution tools run in a container, VM, or WASM sandbox, with no access to the host filesystem or environment.
Per-tool rate limit: a cap on how many times the agent can call a sensitive tool in one session.
Allow-listing: sensitive tools (charge_card, send_email_external) require a confirmed allow-list entry; otherwise the call is refused.
Dry-run mode: the agent's first call is a 'plan', not an 'execute'. A human or a downstream check reviews it before anything is committed.
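
The rate-limit and allow-list patterns in the list above could be enforced in the runtime's dispatch layer. The following is a minimal sketch, not a definitive implementation: `ToolGate`, `ToolDenied`, the per-session cap, and the example recipient address are all hypothetical names and values chosen for illustration.

```python
from collections import Counter


class ToolDenied(Exception):
    """Raised when the runtime refuses a tool call (hypothetical name)."""


class ToolGate:
    """Per-session gate the runtime checks before a sensitive tool runs."""

    def __init__(self, max_calls: int, allowed_recipients: set[str]):
        self.max_calls = max_calls                    # assumed per-session cap
        self.allowed_recipients = allowed_recipients  # confirmed up front
        self.calls = Counter()                        # tool name -> calls so far

    def check(self, tool: str, recipient: str) -> None:
        # Allow-listing: refuse any target that was not confirmed in advance.
        if recipient not in self.allowed_recipients:
            raise ToolDenied(f"{recipient} is not on the allow-list")
        # Per-tool rate limit: refuse once the session cap is reached.
        if self.calls[tool] >= self.max_calls:
            raise ToolDenied(f"{tool} exceeded {self.max_calls} calls this session")
        self.calls[tool] += 1


gate = ToolGate(max_calls=3, allowed_recipients={"ops@example.com"})
gate.check("send_email_external", "ops@example.com")  # allowed; counted as one call
```

The gate is deliberately unaware of the model: it sees only the tool name and the arguments, so nothing the model says can widen the limit.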

The runtime owns the limits, not the prompt

Telling the model in the prompt "don't call charge_card for amounts over $1000" is a guideline. Having the tool implementation reject calls over $1000 is a guarantee.
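
To make the distinction concrete, here is a hedged sketch of the limit living inside the tool implementation. The `charge_card` signature, the `MAX_CHARGE_USD` constant, and the returned dict are assumptions for illustration, and the real payment call is left as a placeholder. Allowing exactly $1000 is also a choice made here, since the text only rules out amounts over $1000.

```python
from decimal import Decimal

MAX_CHARGE_USD = Decimal("1000")  # assumed hard cap, enforced in code


def charge_card(customer_id: str, amount_usd: Decimal) -> dict:
    """Hypothetical tool entry point; the model's arguments arrive here."""
    if amount_usd <= 0:
        raise ValueError("amount must be positive")
    if amount_usd > MAX_CHARGE_USD:
        # Rejected no matter what the prompt said or how the model argued.
        raise PermissionError(f"amount {amount_usd} exceeds cap {MAX_CHARGE_USD}")
    # Placeholder: a real implementation would call the payment provider here.
    return {
        "customer_id": customer_id,
        "amount_usd": str(amount_usd),
        "status": "accepted",
    }
```

Because the check runs on every call, it holds even when the prompt is overridden, truncated, or ignored.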

Code

URL fetcher with SSRF defenses · python

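import ipaddress
import socket
from urllib.parse import urlparse

import httpx

# IPv4 ranges the fetcher must never reach.
DENY_NETS = [
    ipaddress.ip_network(n) for n in [
        "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "169.254.0.0/16",  # link-local + cloud metadata
    ]
]

def fetch(url: str) -> str:
    # The scheme check rejects file:// and every other non-HTTP URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http(s)")
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("missing host")
    # gethostbyname resolves IPv4 only; IPv6 targets are not covered here.
    addr = ipaddress.ip_address(socket.gethostbyname(host))
    if any(addr in net for net in DENY_NETS):
        raise ValueError("blocked target")
    # httpx does not follow redirects by default, so a redirect cannot bounce
    # the request to an internal address. httpx resolves the host again on its
    # own, so DNS rebinding between the check and the fetch remains possible.
    return httpx.get(url, timeout=10).text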
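
The deny list above covers IPv4 ranges only. As a complementary sketch, and a different technique from the explicit list, the standard library's `ipaddress` properties can classify both IPv4 and IPv6 addresses. `is_blocked_ip` is a hypothetical helper name, and the set of properties checked is an assumption about what a fetcher should refuse.

```python
import ipaddress


def is_blocked_ip(text: str) -> bool:
    """Return True if the address should never be fetched (hypothetical helper)."""
    ip = ipaddress.ip_address(text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local      # includes 169.254.0.0/16 metadata addresses
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

A fetcher using this would run the check on every address the hostname resolves to, not just the first one.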
