Storage Format and Maintenance

JSONL is the lingua franca of evals

Every eval framework worth using consumes JSONL: newline-delimited JSON, one record per line. It is greppable, diff-friendly, streamable, and carries zero schema overhead. Use it.
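As a minimal sketch of what "one record per line" and "streamable" mean in practice, the helpers below read and write JSONL with only the standard library. The names read_jsonl and write_jsonl are illustrative, not part of any eval framework.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def write_jsonl(path: str, records: Iterable[dict]) -> None:
    """Write each record as a single line of JSON."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII inputs readable in grep and diffs
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because read_jsonl is a generator, a large dataset can be filtered or scored record by record without ever holding all of it in memory.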

Schema discipline

JSONL is schemaless, but your records should not be. Pin a schema, validate on write, and bump the version whenever the schema changes.

Required: id, input, tags.
Optional: reference, acceptable_alternatives, expected, metadata, source.
Auditing: created_at, created_by, version.
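"Validate on write" can be sketched as a single append function that refuses records missing the required fields and stamps the auditing fields itself. This is a dependency-free illustration under stated assumptions: append_case, REQUIRED, and SCHEMA_VERSION are hypothetical names, and the type checks are a simplification of a real schema.

```python
import json
from datetime import datetime, timezone

SCHEMA_VERSION = 1  # assumed convention: bump whenever the record shape changes
REQUIRED = {"id": str, "input": str, "tags": list}


def append_case(path: str, case: dict, created_by: str) -> dict:
    """Validate a record, stamp the auditing fields, and append it as one line."""
    for field, expected_type in REQUIRED.items():
        if not isinstance(case.get(field), expected_type):
            raise ValueError(
                f"{field!r} is required and must be a {expected_type.__name__}"
            )
    record = {
        **case,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "created_by": created_by,
        "version": SCHEMA_VERSION,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Rejecting a bad record before it is written means nothing invalid ever lands in the file, so readers never have to guess which records to trust.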

Where to store it

Where	When
Repo (datasets/*.jsonl)	Under 5 MB total. Reviewed via PR. The easiest option.
S3 / R2 / GCS	Larger datasets. Versioned with object versioning and lifecycle rules.
Hugging Face Hub	Public datasets you want to share.
Braintrust / DeepEval cloud	UI annotation, versioning, and eval runs in one place.
Argilla / Label Studio	Self-hosted UI for active annotation.
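The 5 MB threshold in the first row is easy to check mechanically. The sketch below sums the size of every .jsonl file in a directory. The names dataset_bytes and fits_in_repo are illustrative, and treating "5 MB" as 5 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "5 MB" is read as 5 MiB; adjust if your team means 5,000,000 bytes.
REPO_LIMIT_BYTES = 5 * 1024 * 1024


def dataset_bytes(root: str = "datasets") -> int:
    """Total size of all .jsonl files directly under root."""
    return sum(p.stat().st_size for p in Path(root).glob("*.jsonl"))


def fits_in_repo(root: str = "datasets") -> bool:
    """True while the datasets are still small enough to live in the repo."""
    return dataset_bytes(root) < REPO_LIMIT_BYTES
```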

Principle: the dataset format that gives you readable diffs is the one you will actually maintain. For eval datasets you edit by hand, JSONL beats Parquet.
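One way to keep diffs readable, offered as a suggestion rather than something the lesson prescribes, is to write the file in a canonical form: records sorted by id and keys sorted within each record. The function name canonicalize is hypothetical.

```python
import json


def canonicalize(records: list[dict]) -> str:
    """Render records as JSONL with a stable record order and key order."""
    ordered = sorted(records, key=lambda r: r["id"])
    lines = [json.dumps(r, sort_keys=True, ensure_ascii=False) for r in ordered]
    return "".join(line + "\n" for line in lines)
```

With a stable ordering, adding one case shows up as one added line in the PR instead of a reshuffled file.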

Maintenance rituals

Quarterly review: sample 50 random cases. Do they still represent production?
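Drawing the review sample can be a few lines of Python. The sketch below assumes the cases are already loaded as dicts and skips anything flagged archived. The function name sample_for_review and the optional seed parameter are illustrative additions.

```python
import random
from typing import Optional


def sample_for_review(
    cases: list[dict], k: int = 50, seed: Optional[int] = None
) -> list[dict]:
    """Pick up to k distinct active cases uniformly at random."""
    active = [c for c in cases if not c.get("archived", False)]
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(active, min(k, len(active)))
```

Recording the seed alongside the review notes lets a second reviewer look at exactly the same 50 cases.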

Stale flag: when production sampling reveals an emerging input cluster that is missing from the dataset, add cases for it the same week. A slow refresh kills an eval suite.
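The lesson does not say how to detect an uncovered cluster. As a deliberately crude stand-in for embedding-based clustering, the sketch below flags production inputs whose best token overlap (Jaccard similarity) with any dataset input falls below a threshold. The function names and the 0.3 threshold are assumptions for illustration.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a real system would use embeddings."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def uncovered_inputs(
    production: list[str], dataset: list[str], threshold: float = 0.3
) -> list[str]:
    """Return production inputs whose closest dataset input is below threshold."""
    dataset_tokens = [tokens(d) for d in dataset]
    missing = []
    for text in production:
        t = tokens(text)
        best = max((jaccard(t, d) for d in dataset_tokens), default=0.0)
        if best < threshold:
            missing.append(text)
    return missing
```

Anything this returns is a candidate for a new eval case, not a verdict; a human still decides whether the inputs form a real cluster.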

Retirement: when a case covers behavior that is no longer relevant, flag it with archived: true instead of deleting it. You want the historical trail.
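Retiring by flag rather than by deletion might look like the sketch below. The helper names retire and active_cases are hypothetical, and the eval runner is assumed to call active_cases before scoring.

```python
def retire(cases: list[dict], case_id: str) -> list[dict]:
    """Return a copy of cases with the matching case flagged archived."""
    if not any(c["id"] == case_id for c in cases):
        raise KeyError(case_id)
    return [{**c, "archived": True} if c["id"] == case_id else c for c in cases]


def active_cases(cases: list[dict]) -> list[dict]:
    """Cases that should still be run; archived ones stay in the file."""
    return [c for c in cases if not c.get("archived", False)]
```

The archived record stays in the JSONL file and in version history, so an old eval run can still be traced back to the exact case it scored.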

Code

JSONL schema validation (Python)

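import sys
from typing import Any, Optional

from pydantic import BaseModel, Field, ValidationError


class EvalCase(BaseModel):
    # Required fields: no defaults, so a missing value fails validation
    id: str
    input: str
    tags: list[str]
    # Optional fields (the types of expected and metadata are assumptions)
    reference: Optional[str] = None
    acceptable_alternatives: list[str] = Field(default_factory=list)
    expected: Optional[Any] = None
    metadata: dict[str, Any] = Field(default_factory=dict)
    source: str = "unknown"
    # Auditing fields
    created_at: Optional[str] = None
    created_by: Optional[str] = None
    version: int = 1


def validate_jsonl(path):
    """Return (line_number, error) pairs for every record that fails validation."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            if not line.strip():
                continue  # tolerate blank lines such as a trailing newline
            try:
                EvalCase.model_validate_json(line)
            except ValidationError as e:
                bad.append((i, str(e)))
    return bad


if __name__ == "__main__":
    # Validate the files passed on the command line (pre-commit passes the
    # changed files); fall back to the default dataset when none are given.
    paths = sys.argv[1:] or ["datasets/regression.jsonl"]
    errors = [(p, i, msg) for p in paths for i, msg in validate_jsonl(p)]
    if errors:
        raise SystemExit(f"{len(errors)} bad records: {errors[:3]}")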

Pre-commit hook: block dataset PRs that fail validation (YAML)

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: validate-eval-datasets
        name: Validate eval JSONL datasets
        entry: python scripts/validate_datasets.py
        language: system
        files: 'datasets/.*\.jsonl$'
        pass_filenames: true
