Fine-tuning with the Hugging Face Trainer

Trainer: when you don't want to write the loop

The Hugging Face Trainer wraps the standard PyTorch training loop with batteries included: distributed training, mixed precision, gradient accumulation, evaluation, checkpointing, logging, and a battle-tested set of defaults. For most fine-tuning tasks, using Trainer is faster than writing your own loop.
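
To make one of those wrapped features concrete, here is a minimal sketch of gradient accumulation in plain Python, with no PyTorch involved. It assumes a toy one-parameter model y_hat = w * x with mean squared error; the function names are hypothetical and only illustrate the idea. Trainer exposes the same technique through the gradient_accumulation_steps argument of TrainingArguments.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the gradients of consecutive micro-batches."""
    chunks = [
        (xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        for i in range(0, len(xs), micro_batch_size)
    ]
    # Each micro-batch gradient is scaled by 1 / number_of_chunks before
    # summing, so equal-sized chunks reproduce the large-batch gradient.
    return sum(grad_mse(w, cx, cy) / len(chunks) for cx, cy in chunks)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # toy data generated with w = 2
w = 0.5

full = grad_mse(w, xs, ys)                               # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2
print(full, accum)
```

The two values agree, which is why accumulating over several small batches can stand in for a batch that does not fit in memory.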

Three pieces

model: usually AutoModelFor[Task].from_pretrained(...).
dataset: usually a Hugging Face Dataset, tokenized via .map(tokenize, batched=True), where tokenize is a function that applies the tokenizer to the text column.
training arguments: a TrainingArguments object encodes everything else: epochs, batch size, learning rate, eval strategy, save strategy, FP16/BF16, logging directory.

What you lose compared to raw PyTorch

Trainer is opinionated. If your training loop has an unusual structure (alternating between two losses, custom gradient surgery, multi-stage curricula), you will end up fighting it. For 'fine-tune model X on dataset Y' it is the right tool. For 'train this novel architecture with this novel objective', write your own loop.

Trainer vs Lightning vs raw PyTorch: when to use which

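HF Trainer: the best fit for fine-tuning an HF model on an HF dataset. The integration is tight.
PyTorch Lightning: when you want the loop boilerplate handled but your model is not an HF model, or you need more flexibility than Trainer offers.
raw PyTorch: novel architectures, research, or maximum control. The loop you wrote in Track 4 is sufficient for production.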

Code

Fine-tuning DistilBERT on IMDB: minimal example·python

from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    Trainer, TrainingArguments,
)

model_name = "distilbert-base-uncased"
ds = load_dataset("imdb")
tok = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tok(batch["text"], padding="max_length", truncation=True, max_length=512)

ds_tok = ds.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(
    output_dir="./out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    bf16=True,                                 # bf16 mixed precision; needs bf16-capable hardware (otherwise try fp16=True)
    logging_dir="./logs",
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds_tok["train"],
    eval_dataset=ds_tok["test"],
    tokenizer=tok,                             # saved alongside the model; newer transformers releases name this argument processing_class
)
trainer.train()
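
The numbers in the TrainingArguments above imply a concrete schedule. The sketch below works it out by hand, assuming a single device, no gradient accumulation, the 25,000-review IMDB train split, and Trainer's default linear schedule with warmup. Both helper functions are hypothetical approximations written for illustration, not the library's own code.

```python
import math


def total_optimizer_steps(num_examples, batch_size, epochs, grad_accum=1, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear ramp from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


total = total_optimizer_steps(num_examples=25_000, batch_size=16, epochs=3)
print("total optimizer steps:", total)

# Learning rate at a few points of the run (peak 2e-5, 500 warmup steps).
for step in (0, 250, 500, total // 2, total):
    print(step, linear_warmup_lr(step, 2e-5, 500, total))
```

Under these assumptions the 500 warmup steps cover roughly the first tenth of training, which is a quick sanity check when you change the batch size or epoch count.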

Custom metrics with compute_metrics·python

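# Reuses model, args, ds_tok, and tok from the minimal example above.
from transformers import Trainer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    # pred.predictions holds the logits, pred.label_ids the gold labels
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)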
    p, r, f1, _ = precision_recall_fscore_support(labels, preds, average='macro')
    return {
        'accuracy': accuracy_score(labels, preds),
        'precision': p,
        'recall': r,
        'f1': f1,
    }

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds_tok['train'],
    eval_dataset=ds_tok['test'],
    tokenizer=tok,
    compute_metrics=compute_metrics,
)

# Evaluation now reports these metrics, with keys prefixed by eval_ (e.g. eval_accuracy)
trainer.evaluate()
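
To see what a compute_metrics function receives without running a full training job, the following sketch feeds it hand-made logits. FakeEvalPrediction is a hypothetical stand-in that only mimics the two attributes used above (predictions and label_ids); the logits and labels are made-up values.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the object Trainer passes to compute_metrics.
FakeEvalPrediction = namedtuple("FakeEvalPrediction", ["predictions", "label_ids"])


def accuracy_metrics(pred):
    # argmax over the last axis turns per-class logits into class ids
    preds = pred.predictions.argmax(-1)
    return {"accuracy": float((preds == pred.label_ids).mean())}


logits = np.array([
    [2.0, -1.0],   # predicts class 0
    [0.1, 0.3],    # predicts class 1
    [-0.5, 1.5],   # predicts class 1
    [1.2, 0.4],    # predicts class 0
])
labels = np.array([0, 1, 1, 1])

result = accuracy_metrics(FakeEvalPrediction(logits, labels))
print(result)
```

Testing the metric function this way catches shape or key mistakes before they surface at the end of the first epoch.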

Resuming training from a checkpoint·python

# Assumes `trainer` is the Trainer instance built in the examples above.

# Trainer automatically saves checkpoints to output_dir/checkpoint-N/
# Resume from a specific one:
trainer.train(resume_from_checkpoint="./out/checkpoint-1500")

# Or resume from the latest:
trainer.train(resume_from_checkpoint=True)

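# Push the trained model to the HF Hub (requires `huggingface-cli login`).
# The repo name comes from hub_model_id="my-fine-tuned-distilbert" in TrainingArguments;
# the first argument of push_to_hub is the commit message, not the repo name.
# trainer.push_to_hub()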
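
Passing resume_from_checkpoint=True relies on finding the most recent checkpoint-N directory in output_dir. The standard-library sketch below illustrates that lookup on a temporary directory. latest_checkpoint is a hypothetical helper written for this example, not the library's implementation, and the directory names are made up.

```python
import os
import re
import tempfile

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-N subdirectory with the largest N, or None."""
    candidates = []
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            # Compare step numbers as integers: as strings,
            # "checkpoint-500" would sort after "checkpoint-1500".
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    return os.path.join(output_dir, max(candidates)[1])


demo_dir = tempfile.mkdtemp()
for step in (500, 1500, 1000):
    os.makedirs(os.path.join(demo_dir, f"checkpoint-{step}"))

latest = latest_checkpoint(demo_dir)
print(latest)
```

A helper like this is also handy for picking the checkpoint path to pass explicitly when you want to log which one a run resumed from.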

Exercise

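Pick a small NLP dataset (rotten_tomatoes has ~10K samples and trains quickly). Fine-tune distilbert-base-uncased with Trainer for 1 epoch. Add accuracy via compute_metrics. Verify that the eval numbers are printed at the end of training. Save the final model (for example with trainer.save_model('./out')) and load it with AutoModelForSequenceClassification.from_pretrained('./out').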