Generation settings: Sampling, Beam, Constraints

generate() is a state machine, not a function

Under the hood, model.generate() is a token-by-token decoding loop driven by a GenerationConfig object. The config carries the strategy (do_sample, num_beams), the limits (max_new_tokens, min_new_tokens), the shaping of the sampling distribution (temperature, top_p, top_k, repetition_penalty), and the stop conditions (eos_token_id, stop_strings).
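To make the "loop with state" idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: toy_generate, next_token, and count_up are hypothetical names invented for illustration, and the "model" is just a function that returns the next integer.

```python
from typing import Callable, List, Sequence


def toy_generate(
    next_token: Callable[[List[int]], int],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_id: int,
) -> List[int]:
    """Toy decoding loop: pick one token, append it, then check whether to stop."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):   # limit: hard cap on newly generated tokens
        token = next_token(ids)       # strategy: how the next token gets chosen
        ids.append(token)             # state: the sequence grows by one token
        if token == eos_token_id:     # stop condition: end-of-sequence token seen
            break
    return ids


# Hypothetical stand-in for a model: always emits "last token + 1".
def count_up(ids: List[int]) -> int:
    return ids[-1] + 1


# Stops early because token 5 plays the role of EOS.
print(toy_generate(count_up, [1, 2], max_new_tokens=10, eos_token_id=5))
# Never reaches EOS, so the max_new_tokens cap ends the loop instead.
print(toy_generate(count_up, [1, 2], max_new_tokens=2, eos_token_id=5))
```

The second call shows why both a limit and a stop condition matter: without a reachable EOS, only max_new_tokens ends the loop.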

The three strategies you will actually use

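Greedy (do_sample=False, num_beams=1): argmax at every step. Deterministic. The default for tool calling, JSON, and code completion.
Sampling (do_sample=True): nucleus + top-k + temperature. The default for chat. A reasonable starting point is temperature=0.7, top_p=0.9.
Beam search (num_beams=4): explores several candidate sequences in parallel. Useful when there is a "single best" answer, as in translation or summarization. Expensive: roughly num_beams times the work of greedy decoding.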
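How temperature, top-k, and top-p reshape the distribution before a token is drawn can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the library's code: shape_distribution is a hypothetical helper, the logits are made up, and the order of the filters (temperature, then top-k, then top-p) is an assumption of this sketch.

```python
import math
import random
from typing import Dict


def shape_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Return renormalized probabilities after temperature, top-k and top-p."""
    # 1) Temperature: divide logits before softmax (<1 sharpens, >1 flattens).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    # 2) Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(w for _, w in ranked)
    ranked = [(tok, w / total) for tok, w in ranked]

    # 3) Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# Made-up logits for four candidate tokens.
logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}

nucleus = shape_distribution(logits, top_p=0.8)          # keeps "a" and "b"
sharp = shape_distribution(logits, temperature=0.5)      # more mass on "a"

# Sampling is then a weighted draw from whatever survived the filters.
rng = random.Random(0)
choice = rng.choices(list(nucleus), weights=list(nucleus.values()))[0]
print(nucleus, choice)
```

Greedy decoding is the limiting case: always taking the first entry of the ranked list instead of drawing from it.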

Stop conditions are not optional

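An unset eos_token_id is the most common reason a model appears to run in an "infinite loop": nothing tells the loop that the turn is over, so it keeps going until max_new_tokens is exhausted. Modern instruct models often have several EOS tokens (<|eot_id|>, <|end_of_turn|>, and so on); pass them as a list. stop_strings matches arbitrary substrings against the decoded text, which is slower but more flexible; generate() also needs the tokenizer passed in (tokenizer=...) to do that decoding.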
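The difference between the two kinds of stop condition can be sketched without a model. The helper below is a simplified illustration, not the library's stopping-criteria code: should_stop, the toy vocab, and its token ids are all hypothetical.

```python
from typing import Dict, Sequence


def should_stop(
    new_ids: Sequence[int],
    eos_token_ids: Sequence[int],
    stop_strings: Sequence[str],
    vocab: Dict[int, str],
) -> bool:
    """Decide whether generation should end after the latest token."""
    if not new_ids:
        return False
    # Cheap check: compare the last token id against every accepted EOS id.
    if new_ids[-1] in eos_token_ids:
        return True
    # Flexible check: decode the generated ids and search the text, because
    # a stop string may be split across several tokens.
    text = "".join(vocab[i] for i in new_ids)
    return any(s in text for s in stop_strings)


# Toy vocabulary; ids 98 and 99 stand in for two different end markers.
vocab = {0: "Hello", 1: " world", 2: "\n\n", 3: "User", 4: ":", 98: "<eot>", 99: "<eos>"}
eos_ids = [98, 99]

print(should_stop([0, 1], eos_ids, ["User:"], vocab))        # no stop signal yet
print(should_stop([0, 98], eos_ids, ["User:"], vocab))       # end-of-turn id seen
print(should_stop([0, 2, 3, 4], eos_ids, ["User:"], vocab))  # "User" + ":" spans two tokens
```

The last call is the case that id comparison cannot catch: no single token equals "User:", so only the decoded text reveals the match.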

Code

Greedy vs sampling vs beam·python

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Qwen/Qwen2.5-1.5B-Instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Translate to Korean: The Hub is the registry."}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)

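# Greedy: deterministic, argmax at every step
out_g = model.generate(**inputs, max_new_tokens=50, do_sample=False)

# Sampling: output varies from run to run
out_s = model.generate(**inputs, max_new_tokens=50,
                       do_sample=True, temperature=0.7, top_p=0.9)

# Beam: tracks the 4 highest-scoring candidates per step (a heuristic, not an exhaustive search).
# do_sample=False is explicit so a checkpoint default of do_sample=True cannot turn this into beam sampling.
out_b = model.generate(**inputs, max_new_tokens=50, num_beams=4,
                       do_sample=False, early_stopping=True)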

for label, out in [("greedy", out_g), ("sample", out_s), ("beam", out_b)]:
    print(label, ":", tok.decode(out[0], skip_special_tokens=True)[-200:])

Stopping on multiple EOS tokens (Llama-3 style)·python

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Llama-3 has both an eos_token and an eot (end-of-turn) marker
eos_ids = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
print("stop ids:", eos_ids)

# Pass the list to generate(): eos_token_id=eos_ids
