promptfoo: CLI-First Eval Runner

The fastest way to run evals from the command line

promptfoo (open source, v0.118+ as of mid-2026) is a CLI and TypeScript library for evaluating LLM applications. It supports 60+ providers, 40+ deterministic assertion types, 14 model-graded assertion types, and declarative YAML configs. It is the best fit when you want fast iteration without building Python infrastructure.

What you get out of the box

Side-by-side comparison of multiple prompts × multiple providers in a single run.
Assertions: contains, equals, is-json, contains-json, regex, similar (embedding cosine similarity), llm-rubric, javascript, python.
Built-in red-teaming module covering 50+ vulnerability categories (prompt injection, jailbreaks, OWASP LLM Top 10).
HTML report viewer (promptfoo view) and CSV export.
Caching, retries, and concurrency control.
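To make the "similar (embedding cosine)" assertion concrete, here is a minimal sketch of the underlying check: compute the cosine similarity between two vectors and compare it to a threshold. The function names and the toy 3-dimensional vectors are hypothetical; a real run would use embeddings from a provider, and promptfoo's internal implementation may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar"
    return dot / (norm_a * norm_b)

def passes_similar(output_vec, expected_vec, threshold):
    """Pass when the similarity meets or exceeds the threshold."""
    return cosine_similarity(output_vec, expected_vec) >= threshold

# Toy vectors standing in for real embeddings (hypothetical values).
expected = [1.0, 0.0, 0.0]
close = [0.8, 0.6, 0.0]      # points mostly in the same direction
unrelated = [0.0, 1.0, 0.0]  # orthogonal to the expected vector

print(passes_similar(close, expected, 0.6))
print(passes_similar(unrelated, expected, 0.6))
```

A lower threshold tolerates looser paraphrases; a higher one demands near-identical meaning.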

Where it shines

promptfoo's strength is its declarative config: a single YAML file can describe 4 prompts × 3 models × 50 test cases × 6 assertions. Coding that matrix by hand takes days; in YAML it takes minutes. It excels at prompt-engineering iteration and quick provider comparisons.
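The size of that matrix is easy to underestimate. The sketch below works through the arithmetic, assuming every test case carries all 6 assertions and ignoring caching; the placeholder names are hypothetical. Model-graded assertions would add grader calls on top of these counts.

```python
from itertools import product

# Placeholder names; only the counts matter here.
prompts = [f"prompt-{i}" for i in range(4)]
providers = [f"provider-{i}" for i in range(3)]
test_cases = [f"case-{i}" for i in range(50)]
assertions_per_case = 6  # assumption: every case runs all six assertions

# One model output per (prompt, provider, test case) combination.
cells = list(product(prompts, providers, test_cases))
llm_calls = len(cells)
assertion_checks = llm_calls * assertions_per_case

print(llm_calls)         # 600
print(assertion_checks)  # 3600
```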

Where it doesn't fit

When your eval lives inside a complex Python pipeline (custom retrievers, multi-agent loops, fine-tuning experiments), the YAML model gets awkward. DeepEval or a hand-rolled harness is a better fit.
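For a sense of what "hand-rolled harness" can mean at its smallest, here is a sketch that scores any Python callable against substring expectations. Everything in it (Case, run_eval, fake_pipeline) is hypothetical and stands in for whatever retriever or agent loop you actually have; it is not how DeepEval or promptfoo work internally.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Case:
    question: str
    must_contain: str

def run_eval(pipeline: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases whose output contains the expected substring."""
    if not cases:
        return 0.0
    passed = sum(
        1 for c in cases
        if c.must_contain.lower() in pipeline(c.question).lower()
    )
    return passed / len(cases)

# Hypothetical stand-in for a real retriever + agent pipeline.
def fake_pipeline(question: str) -> str:
    if "France" in question:
        return "Paris is the capital of France."
    return "I don't know."

cases = [
    Case("What is the capital of France?", "Paris"),
    Case("What is the capital of Peru?", "Lima"),
]
score = run_eval(fake_pipeline, cases)
print(score)  # 0.5
```

Because the pipeline is just a callable, the harness can sit directly inside existing Python code, which is the case where a YAML-first tool tends to get in the way.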

Principle: promptfoo is the winner for prompt-and-provider matrix evals run from the command line. It is the right first tool, and often the last one too.

Code

Install and initialize·bash

# Install globally
npm install -g promptfoo

# Scaffold a project
promptfoo init
# or, without a global install, scaffold via npx
npx promptfoo@latest init

# Run the suite
promptfoo eval

# Open the HTML viewer
promptfoo view

promptfooconfig.yaml — full prompt × provider matrix·yaml

description: 'Translation eval — three prompts, two providers'

prompts:
  - "Translate '{{text}}' to {{language}}."
  - "You are a translator. Convert '{{text}}' into {{language}}."
  - file://prompts/system.txt

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-sonnet-4-5

tests:
  - vars:
      text: 'Hello world'
      language: 'French'
    assert:
      - type: contains
        value: 'Bonjour'
      - type: llm-rubric
        value: 'Translation is natural and fluent'
  - vars:
      text: 'Good morning'
      language: 'German'
    assert:
      - type: similar
        value: 'Guten Morgen'
        threshold: 0.6
      - type: javascript
        value: 'output.length < 200'

Custom Python assertion·yaml

tests:
  - vars: {input: 'extract emails from this text'}
    assert:
      - type: python
        value: |
          import re
          emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.-]+', output)
          return len(emails) > 0
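Once an inline assertion grows past a few lines, it is usually easier to move it into its own file and reference it from the config with something like `value: file://assert_emails.py` (the filename is hypothetical). The sketch below assumes promptfoo's documented convention of calling a function named `get_assert(output, context)` that returns a bool, a float score, or a dict with `pass`, `score`, and `reason`; verify this against the docs for your version.

```python
import re
from typing import Any, Dict

# Same pattern as the inline assertion, compiled once at import time.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pass when the model output contains at least one email-like string."""
    emails = EMAIL_RE.findall(output)
    found = len(emails) > 0
    return {
        "pass": found,
        "score": 1.0 if found else 0.0,
        "reason": f"found {len(emails)} email address(es)",
    }
```

Returning a dict with a reason, rather than a bare boolean, makes failures easier to read in the report viewer.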
