Reference-Based vs Reference-Free

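What is the output compared against?

Reference-based evaluation gives the judge a known-good answer (or a list of acceptable answers), and the judge's job is to compare the output against it. Reference-free evaluation gives the judge only the question and the output, so the judge must assess quality in absolute terms, often against a written rubric.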

When to use each

Aspect	Reference-based	Reference-free
Typical tasks	Translation, classification, RAG with ground truth	Open-ended generation, summarization, dialogue
Ground truth	A golden dataset can be built	Many valid outputs exist; writing a single golden answer is impossible
Scoring	Tighter, more reproducible	Looser, more rubric-dependent
Grader	Can often be combined with deterministic graders (BLEU, BERTScore)	Almost always needs an LLM judge
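The last row of the table notes that reference-based setups can often lean on deterministic graders. BLEU and BERTScore need external packages, so the sketch below swaps in a simpler deterministic metric instead: SQuAD-style token-overlap F1, scored against every available reference while keeping the best match. The helper names and the 0.8 pass threshold are illustrative assumptions, not values from this lesson.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def token_f1(candidate: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between a candidate and one reference."""
    cand, ref = normalize(candidate), normalize(reference)
    if not cand or not ref:
        # Two empty strings count as a match; one empty string is a miss.
        return float(cand == ref)
    # Multiset intersection: a shared token counts at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def deterministic_grade(candidate: str, references: list[str], threshold: float = 0.8) -> bool:
    """PASS if the best F1 against any reference reaches the (assumed) threshold."""
    if not references:
        raise ValueError("need at least one reference")
    return max(token_f1(candidate, ref) for ref in references) >= threshold


print(deterministic_grade("The capital of France is Paris.", ["Paris is the capital of France."]))  # True
```

Because the score is a pure function of the strings, the same candidate always gets the same grade, which is what makes this kind of grader attractive once a golden dataset exists. It only measures word overlap, though, so a correct paraphrase with different wording can score low; that gap is where an LLM judge comes in.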

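Hybrid: reference-based with multiple references

For tasks where many phrasings are correct (for example, "how should this idea be worded?"), provide a list of acceptable references and have the judge accept the output if it is functionally equivalent to any one of them. This sits between strict reference-based and fully reference-free evaluation.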
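A minimal sketch of the hybrid setup, assuming the judge is prompted in the same style as the single-reference template later in this lesson. `MULTI_REF_PROMPT` and `build_multi_ref_prompt` are hypothetical names; the function numbers the acceptable references so the judge's reasoning can point to the one it matched.

```python
# Hypothetical multi-reference judge template; {{ }} become literal JSON braces.
MULTI_REF_PROMPT = """
Question: {question}
Acceptable reference answers:
{references}
Candidate answer: {candidate}

Is the candidate functionally equivalent to ANY one of the references?
- PASS if it conveys the same meaning as at least one reference (paraphrasing is fine)
- FAIL if it matches none of them

Reply: {{"reasoning": "...", "verdict": "PASS"|"FAIL"}}
"""


def build_multi_ref_prompt(question: str, references: list[str], candidate: str) -> str:
    """Fill the template, numbering each acceptable reference on its own line."""
    if not references:
        raise ValueError("need at least one reference")
    numbered = "\n".join(f"{i}. {ref}" for i, ref in enumerate(references, start=1))
    return MULTI_REF_PROMPT.format(question=question, references=numbered, candidate=candidate)


print(build_multi_ref_prompt(
    "Greet a returning customer.",
    ["Welcome back!", "Good to see you again!"],
    "Hey, nice to have you back!",
))
```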

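Principle: reference-based evaluation is tighter; reference-free evaluation is broader. Choose based on whether the task has a knowable correct answer (use references) or a knowable correct rubric (go reference-free).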

The rubric as a stand-in for the reference

In reference-free evaluation, the judge has nothing to anchor on except the rubric. That makes the discipline of writing a tight rubric (precise criteria, edge cases, examples) even more important than in reference-based mode. A vague rubric in reference-free mode produces a coin-flip judge.
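One way to keep a reference-free rubric tight is to store each criterion as structured data (an ID, a checkable requirement, edge-case rulings, and a failing example) and render the prompt text from it. The `Criterion` class, `render_rubric`, and the sample criterion are hypothetical; the point is that every criterion carries the detail a vague bullet would otherwise leave to the judge's guesswork.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    name: str  # short ID the judge can cite in failed_criteria
    requirement: str  # a precise, checkable statement
    edge_cases: list[str] = field(default_factory=list)  # rulings for borderline outputs
    fail_example: str = ""  # a concrete output that violates the criterion


def render_rubric(criteria: list[Criterion]) -> str:
    """Render structured criteria as the rubric block of a judge prompt."""
    lines = ["## Rubric (all must hold for PASS)"]
    for c in criteria:
        lines.append(f"- [{c.name}] {c.requirement}")
        for case in c.edge_cases:
            lines.append(f"    - Edge case: {case}")
        if c.fail_example:
            lines.append(f"    - FAIL example: {c.fail_example}")
    return "\n".join(lines)


# Hypothetical criterion modeled on the support-response rubric in this lesson.
next_step = Criterion(
    name="next_step",
    requirement="Gives at least one concrete action the customer can take.",
    edge_cases=["'Contact support' alone is too vague; it must say how (link, phone, or form)."],
    fail_example="We're sorry for the trouble.",
)
print(render_rubric([next_step]))
```

Giving each criterion a short ID also lets the judge report exactly which ones failed, instead of a free-text complaint.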

Code

Reference-based judge (Python)

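# Template for a reference-based judge. Fill it with
# REF_PROMPT.format(question=..., reference=..., candidate=...);
# the doubled braces {{ }} come out as literal JSON braces.
REF_PROMPT = """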
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Is the candidate factually equivalent to the reference?
- PASS if it conveys the same key facts (paraphrasing is fine)
- FAIL if it changes, omits, or adds facts

Reply: {{"reasoning": "...", "verdict": "PASS"|"FAIL"}}
"""
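A sketch of how the template above might be filled and its reply checked. The model call itself is omitted: `fake_reply` is a made-up stand-in for whatever your judge returns, and `parse_verdict` is a hypothetical helper. Note that `str.format` turns the doubled braces into the literal JSON braces the judge is asked to copy.

```python
import json

# Copy of the reference-based template above (plain quotes, same text).
REF_PROMPT = """
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Is the candidate factually equivalent to the reference?
- PASS if it conveys the same key facts (paraphrasing is fine)
- FAIL if it changes, omits, or adds facts

Reply: {{"reasoning": "...", "verdict": "PASS"|"FAIL"}}
"""


def parse_verdict(reply: str) -> dict:
    """Pull the JSON object out of a judge reply and validate its verdict."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in judge reply")
    data = json.loads(reply[start : end + 1])
    if data.get("verdict") not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected verdict: {data.get('verdict')!r}")
    return data


prompt = REF_PROMPT.format(
    question="What is the capital of France?",
    reference="Paris",
    candidate="France's capital city is Paris.",
)
# The model call is omitted; this string stands in for a judge response.
fake_reply = 'Here is my assessment: {"reasoning": "Same key fact.", "verdict": "PASS"}'
print(parse_verdict(fake_reply)["verdict"])  # PASS
```

Extracting the outermost `{...}` span tolerates judges that wrap the JSON in a sentence or a code fence, and rejecting any verdict other than PASS or FAIL keeps malformed replies from silently counting as either.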

Reference-free judge with a strict rubric (Python)

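# Template for a reference-free judge: the rubric is the judge's only anchor.
# Fill it with FREE_PROMPT.format(question=..., response=...);
# the doubled braces {{ }} come out as literal JSON braces.
FREE_PROMPT = """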
You are evaluating a customer-support response.

## Rubric (all must hold for PASS)
- Addresses the customer's specific question
- Is factually correct given general knowledge of our product (laptops, 1-year warranty)
- Maintains a professional, empathetic tone
- Provides at least one concrete next step the customer can take
- Does NOT invent policies that don't exist (e.g., we don't offer 5-year warranties, so promising one is a FAIL)

Question: {question}
Response: {response}

Reply: {{"reasoning": "...", "verdict": "PASS"|"FAIL", "failed_criteria": ["..."]}}
"""
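Because this rubric says all criteria must hold for PASS, the verdict and `failed_criteria` should agree: a PASS lists nothing, and a FAIL names at least one criterion. A minimal sketch of that consistency check, assuming the judge returns bare JSON in the requested format; `parse_rubric_verdict` is a hypothetical name, and treating an inconsistent reply as an error (to retry or flag for review) is a design choice, not a rule from this lesson.

```python
import json


def parse_rubric_verdict(reply: str) -> tuple[str, list[str]]:
    """Parse a rubric judge's JSON reply and check the verdict matches failed_criteria.

    Under an "all must hold for PASS" rubric, PASS must list no failed criteria
    and FAIL must name at least one.
    """
    data = json.loads(reply)
    verdict = data.get("verdict")
    failed = data.get("failed_criteria", [])
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if not isinstance(failed, list):
        raise ValueError("failed_criteria must be a list")
    if verdict == "PASS" and failed:
        raise ValueError("PASS reply lists failed criteria")
    if verdict == "FAIL" and not failed:
        raise ValueError("FAIL reply names no failed criterion")
    return verdict, failed


reply = '{"reasoning": "No next step offered.", "verdict": "FAIL", "failed_criteria": ["concrete next step"]}'
print(parse_rubric_verdict(reply))  # ('FAIL', ['concrete next step'])
```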