Inspect AI: Agent and Safety Evaluation

UK AISI's framework for agent and safety evals

Inspect AI is an open-source framework from the UK AI Safety Institute, designed for evaluating LLM agents and safety properties. It is the framework safety-focused organizations adopt most often, because it was built around tasks that involve tool use, multi-step reasoning, and adversarial scenarios.

What it does well

Agent evaluation: supports tool use, sandboxed environments, and stateful task completion.
Safety evaluation: built-in patterns for capability evaluation, dangerous-capability evals, and red-teaming.
Reproducibility: every eval run is fully scripted and re-runnable.
Multi-modal: supports text, images, and tool output.
Provider-agnostic: works with OpenAI, Anthropic, Google, and local models.

Where it shines

If you are evaluating agents that use tools, browse, execute code, and plan across multiple steps, Inspect AI was built for that workload. Other frameworks bolt agent support on top; Inspect AI assumes it from the start.

Where it doesn't fit

For simple prompt-and-response evals, the framework is overkill. promptfoo or DeepEval is lighter for that use case.

Rule of thumb: if your eval involves tool use, code execution, or multi-step planning, reach for Inspect AI. If it is plain prompt-and-response, reach for something lighter.
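The rule of thumb can be written down as a small decision helper. This is only an illustrative sketch: `EvalTraits` and `pick_framework` are hypothetical names invented here, not part of Inspect AI, promptfoo, or DeepEval.

```python
from dataclasses import dataclass


@dataclass
class EvalTraits:
    """Hypothetical description of an eval; not part of any library."""
    uses_tools: bool = False
    executes_code: bool = False
    multi_step_planning: bool = False


def pick_framework(traits: EvalTraits) -> str:
    """Apply the rule of thumb: any agentic trait points to Inspect AI."""
    agentic = (
        traits.uses_tools
        or traits.executes_code
        or traits.multi_step_planning
    )
    return "Inspect AI" if agentic else "promptfoo or DeepEval"


print(pick_framework(EvalTraits(uses_tools=True)))  # Inspect AI
print(pick_framework(EvalTraits()))                 # promptfoo or DeepEval
```

A single agentic trait is enough to tip the choice, which mirrors how the rule is phrased: tool use, code execution, or multi-step planning each justify the heavier framework on their own.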

Code

Install and minimal task·python

# pip install inspect-ai
from inspect_ai import Task, task  # the task is run from the CLI, so eval() is not needed here
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

@task
def country_capitals():
    return Task(
        dataset=[
            Sample(input="What is the capital of France?", target="Paris"),
            Sample(input="What is the capital of Japan?", target="Tokyo"),
        ],
        solver=generate(),
        scorer=includes(),
    )

# Run from CLI: inspect eval my_eval.py --model openai/gpt-4o-mini
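To make the scoring step less opaque, here is a rough stand-alone sketch of what an `includes`-style check amounts to: the sample passes when the target string appears somewhere in the model output. The helper name `includes_match` is hypothetical, and the case-insensitive default is an assumption made for this sketch; the library's own scorer may behave differently in the details.

```python
def includes_match(output: str, target: str, ignore_case: bool = True) -> bool:
    """Return True when the target string appears anywhere in the output.

    Hypothetical helper for illustration; not the inspect_ai implementation.
    """
    if ignore_case:
        return target.lower() in output.lower()
    return target in output


print(includes_match("The capital of France is Paris.", "Paris"))  # True
print(includes_match("I believe it is Kyoto.", "Tokyo"))           # False
```

Substring matching is forgiving about verbose answers, which suits short factual targets like the capitals above, but it will also pass an output that mentions the target while asserting something else.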

Agent task with tool use·python

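from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate, use_tools
from inspect_ai.tool import bash, python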

@task
def code_task():
    return Task(
        dataset=[Sample(
            input="Find all .py files modified in the last 7 days, return count.",
            target="42",  # placeholder: the real count depends on the files in the sandbox
        )],
        solver=[
            use_tools([bash(), python()]),
            generate(),
        ],
        scorer=includes(),
        sandbox="docker",  # tools run in a sandbox
    )
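Conceptually, a tool-using task like this one alternates between model turns and tool executions until the model produces a final answer. The toy loop below illustrates that pattern with a scripted stand-in for the model. Everything in it (`TOOLS`, `scripted_model`, `run_agent`) is hypothetical and says nothing about how Inspect AI implements its solvers internally.

```python
from typing import Callable

# Hypothetical tool registry: tool names mapped to plain Python callables.
TOOLS: dict[str, Callable[[str], str]] = {
    "count_words": lambda text: str(len(text.split())),
}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: request one tool call, then answer with its result."""
    tool_results = [h for h in history if h.startswith("tool:")]
    if not tool_results:
        return {"tool": "count_words", "arg": "one two three"}
    return {"answer": tool_results[-1].removeprefix("tool:")}


def run_agent(prompt: str, max_steps: int = 5) -> str:
    """Alternate model turns and tool calls until a final answer appears."""
    history = [f"user:{prompt}"]
    for _ in range(max_steps):
        turn = scripted_model(history)
        if "answer" in turn:
            return turn["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[turn["tool"]](turn["arg"])
        history.append(f"tool:{result}")
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("How many words are in 'one two three'?"))  # 3
```

The `max_steps` cap is the part worth carrying over to real evals: a multi-step agent needs a bound so that a model stuck in a tool-calling loop fails the sample instead of running indefinitely.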
