C.W.K.
Stream
Lesson 06 of 07 · published

JSONL Logging & Replay-based Testing

~22 min · jsonl, logging, replay-testing

Level 0Tokenizer
0 XP0/54 lessons0/10 achievements
0/120 XP to next level120 XP to go0% complete

가장 흔한 LLM testing 패턴은 모델 mock — mock_openai.return_value = "fake response". 테스트 영원히 통과. Production 은 망가짐 — SDK 가 parameter 바꿈, prompt drift, tool schema 가 required field 추가. Mock 은 fictional model 을 fictional behavior 에 pin, reality 가 아래에서 움직임.

Replay-based 가 답

실제 session 을 JSONL 로 capture. CI 에서 session event 를 코드에 replay (모델 출력은 JSONL 에서 mock, fixture 아님). Observable outcome 에 assert — final text, tool sequence, total cost. SDK 또는 prompt 변하면 정확한 line 에서 fail.

Mock vs replay 의 구체적 차이

  • Mock: 'When called with X, return Y' (fictional)
  • Replay: 'Real call 에서 정확히 이 events 받았어 — 코드가 같은 final state 도달하는지?'

cwkPippa JSONL 은 encrypted at rest

Per-conversation JSONL 에 line-level Fernet encryption, office Mac Keychain 에 passphrase. Peer Mac 들은 opaque blob 으로 fleet rsync round-trip, plaintext 절대 X. 2026-04-28 ship.

Code

Append-only JSONL writer with line-level encryption·python
import json, time
from pathlib import Path
from datetime import datetime, timezone

class JSONLLogger:
    """Append-only JSONL logger for agent debugging."""
    def __init__(self, log_path):
        self.log_path = Path(log_path)
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        self._file = open(self.log_path, "a", buffering=1)  # line-buffered

    def log(self, event_type, **fields):
        record = {"ts": datetime.now(timezone.utc).isoformat(),
                  "event": event_type, **fields}
        self._file.write(json.dumps(record) + "\\n")

    def log_request(self, session_id, model, messages):
        start = time.monotonic()
        self.log("request", session_id=session_id, model=model,
                 message_count=len(messages))
        return start

    def log_response(self, session_id, start_time, finish_reason, usage, cost_usd):
        self.log("response", session_id=session_id,
                 latency_ms=round((time.monotonic() - start_time) * 1000),
                 finish_reason=finish_reason, cost_usd=round(cost_usd, 6))
Replay-based pytest fixture·python
class MockTransport(httpx.AsyncBaseTransport):
    """Return predetermined responses for deterministic testing."""
    def __init__(self, responses: dict):
        self._responses = responses  # url_path → response_body

    async def handle_async_request(self, request):
        body = self._responses.get(request.url.path, {"error": "Not found"})
        return httpx.Response(200, json=body, headers={"content-type": "application/json"})

# Use in tests:
mock = MockTransport({"/v1/chat/completions": {
    "choices": [{"message": {"content": "Test"}, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
}})
test_client = httpx.AsyncClient(base_url="https://api.openai.com", transport=mock)

External links

Exercise

실제 session 1 개 JSONL 로 capture. Replay 하는 pytest test build — 모델은 scripted event mock, final assistant text 검증. Prompt 1 개 바꿔서 fail 만들기 — failure 가 정확한 line 가리키는지 verify.

Progress

Progress is local-only — sign in to sync across devices.
이 페이지에서 버그를 발견하셨거나 피드백이 있으세요?문제 신고

댓글 0

🔔 답글 알림 (로그인 필요)
로그인댓글을 남기려면 로그인해 주세요.

아직 댓글이 없어요. 첫 댓글을 남겨보세요.