JSONL Logging & Replay-based Testing

가장 흔한 LLM testing 패턴은 모델 mock — mock_openai.return_value = "fake response". 테스트 영원히 통과. Production 은 망가짐 — SDK 가 parameter 바꿈, prompt drift, tool schema 가 required field 추가. Mock 은 fictional model 을 fictional behavior 에 pin, reality 가 아래에서 움직임.

Replay-based 가 답

실제 session 을 JSONL 로 capture. CI 에서 session event 를 코드에 replay (모델 출력은 JSONL 에서 mock, fixture 아님). Observable outcome 에 assert — final text, tool sequence, total cost. SDK 또는 prompt 변하면 정확한 line 에서 fail.

Mock vs replay 의 구체적 차이

Mock: 'When called with X, return Y' (fictional)
Replay: 'Real call 에서 정확히 이 events 받았어 — 코드가 같은 final state 도달하는지?'

cwkPippa JSONL 은 encrypted at rest

Per-conversation JSONL 에 line-level Fernet encryption, office Mac Keychain 에 passphrase. Peer Mac 들은 opaque blob 으로 fleet rsync round-trip, plaintext 절대 X. 2026-04-28 ship.

Code

Append-only JSONL writer with line-level encryption·python

import json, time
from pathlib import Path
from datetime import datetime, timezone

class JSONLLogger:
    """Append-only JSONL logger for agent debugging."""
    def __init__(self, log_path):
        self.log_path = Path(log_path)
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        self._file = open(self.log_path, "a", buffering=1)  # line-buffered

    def log(self, event_type, **fields):
        record = {"ts": datetime.now(timezone.utc).isoformat(),
                  "event": event_type, **fields}
        self._file.write(json.dumps(record) + "\\n")

    def log_request(self, session_id, model, messages):
        start = time.monotonic()
        self.log("request", session_id=session_id, model=model,
                 message_count=len(messages))
        return start

    def log_response(self, session_id, start_time, finish_reason, usage, cost_usd):
        self.log("response", session_id=session_id,
                 latency_ms=round((time.monotonic() - start_time) * 1000),
                 finish_reason=finish_reason, cost_usd=round(cost_usd, 6))

Replay-based pytest fixture·python

class MockTransport(httpx.AsyncBaseTransport):
    """Return predetermined responses for deterministic testing."""
    def __init__(self, responses: dict):
        self._responses = responses  # url_path → response_body

    async def handle_async_request(self, request):
        body = self._responses.get(request.url.path, {"error": "Not found"})
        return httpx.Response(200, json=body, headers={"content-type": "application/json"})

# Use in tests:
mock = MockTransport({"/v1/chat/completions": {
    "choices": [{"message": {"content": "Test"}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
}})
test_client = httpx.AsyncClient(base_url="https://api.openai.com", transport=mock)

JSONL Logging & Replay-based Testing

Replay-based 가 답

Mock vs replay 의 구체적 차이

cwkPippa JSONL 은 encrypted at rest

Code

External links

Exercise

Progress

댓글 0