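The Trainer pattern
transformers.Trainer is the canonical training loop. Assemble a model, a tokenizer, a tokenized Dataset, an optional metric function, optional callbacks, and a TrainingArguments object. Then trainer.train() runs training, trainer.evaluate() reports metrics, and trainer.save_model() saves the model. With appropriate arguments, the same loop handles fine-tuning, LoRA training, and full pre-training.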
TrainingArguments: the knobs that cover 95% of cases
output_dir: checkpoints and TensorBoard logs.
num_train_epochs or max_steps: training budget.
per_device_train_batch_size and gradient_accumulation_steps: effective batch size (their product, times the number of devices).
learning_rate, weight_decay, warmup_ratio, lr_scheduler_type: optimization.
bf16 / fp16: mixed precision.
eval_strategy, eval_steps, save_strategy, save_steps: evaluation and checkpoint cadence.
logging_steps, report_to=['tensorboard', 'wandb']: observability.
What Trainer hides
Distributed setup (single-node multi-GPU "just works" through accelerate), gradient accumulation, mixed precision, gradient clipping, learning-rate scheduling, and resuming from checkpoints. The cost: Trainer is opinionated. For heterogeneous optimizers or a non-standard loss, drop down to a raw PyTorch loop, or use a Trainer subclass, for example one from trl (often combined with peft).
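Before dropping all the way to a raw loop, a small Trainer subclass is often enough for a non-standard loss. The sketch below overrides compute_loss with a class-weighted cross-entropy. WeightedLossTrainer and weighted_cross_entropy are hypothetical names, the code assumes a classification model whose output exposes .logits, and **kwargs is there only to absorb extra arguments that newer transformers versions pass to compute_loss.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer


def weighted_cross_entropy(logits, labels, class_weights):
    """Cross-entropy with per-class weights (illustrative custom loss)."""
    weights = class_weights.to(device=logits.device, dtype=logits.dtype)
    return F.cross_entropy(logits, labels, weight=weights)


class WeightedLossTrainer(Trainer):
    """Trainer that swaps the model's built-in loss for a weighted one."""

    def __init__(self, *args, class_weights, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = torch.as_tensor(class_weights, dtype=torch.float)

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        inputs = dict(inputs)            # copy so the caller's batch is untouched
        labels = inputs.pop("labels")    # keep the model from computing its own loss
        outputs = model(**inputs)
        loss = weighted_cross_entropy(outputs.logits, labels, self.class_weights)
        return (loss, outputs) if return_outputs else loss
```

It is constructed like a plain Trainer with one extra keyword, for example WeightedLossTrainer(model=model, args=args, train_dataset=ds, class_weights=[1.0, 3.0]). Everything else the loop hides (accumulation, clipping, scheduling, checkpointing) stays in place. For custom optimizers, Trainer also accepts an optimizers=(optimizer, scheduler) tuple.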