LoRA Variants & Comparison

The 2024–2025 LoRA family

Several variants have been published since the original LoRA paper. Each one addresses a specific weakness.

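Variant	Idea	Benefit
DoRA	Decomposes each weight into magnitude and direction; LoRA is applied only to the direction.	Narrows the gap to full fine-tuning. `use_dora=True`.
LoRA+	Uses different learning rates for the A and B matrices.	~2x faster convergence.
rsLoRA	Scales the update by alpha/√r instead of alpha/r.	Works better at high rank (r ≥ 64). `use_rslora=True`.
AdaLoRA	Allocates rank per layer dynamically using an SVD-based parameterization.	Finds a suitable rank for each layer automatically.
VeRA	Shared, frozen random A and B plus trainable scaling vectors.	~10x fewer parameters than LoRA.
LoRA-FA	Freezes A and trains only B.	Cuts activation memory roughly in half.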
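The rsLoRA row is easiest to see with numbers. The sketch below compares the two multipliers applied to the low-rank update, alpha/r for standard LoRA and alpha/√r for rsLoRA. The helper name `lora_scale` is hypothetical and exists only for this illustration.

```python
import math


def lora_scale(alpha: float, r: int, rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rsLoRA uses alpha / sqrt(r).
    (Hypothetical helper for illustration only.)
    """
    return alpha / math.sqrt(r) if rslora else alpha / r


if __name__ == "__main__":
    alpha = 16
    # With alpha fixed, the standard multiplier shrinks quickly as r grows,
    # while the rsLoRA multiplier shrinks only with the square root of r.
    for r in (8, 16, 64, 256):
        print(f"r={r:>3}  lora={lora_scale(alpha, r):.4f}  "
              f"rslora={lora_scale(alpha, r, rslora=True):.4f}")
```

One practical consequence: switching `use_rslora` on without changing `lora_alpha` raises the effective scale at high rank, so the learning rate or alpha may need retuning.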

Method comparison against full fine-tuning

Method	Trainable %	Quality vs. full FT	Memory
Full fine-tuning	100%	Baseline	Very high
LoRA	~0.1–1%	~95–98%	Low
QLoRA	~0.1–1%	~93–97%	Very low
DoRA	~0.1–1%	~97–99%	Low to medium
Prompt tuning	~0.01%	~85–90%	Very low
Prefix tuning	~0.1%	~88–93%	Very low
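The "Trainable %" column follows from simple arithmetic. For one adapted weight matrix of shape d_out x d_in, a rank-r adapter adds r * (d_in + d_out) parameters. The sketch below computes that ratio for a single matrix; the function names and the 4096 x 4096 shape are assumptions chosen for illustration, and the whole-model percentage also depends on which modules are targeted.

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Parameters added by one rank-r adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_fraction(d_out: int, d_in: int, r: int) -> float:
    """Adapter parameters relative to the frozen matrix they adapt."""
    return lora_param_count(d_out, d_in, r) / (d_out * d_in)


if __name__ == "__main__":
    # Illustrative shape only, not taken from a specific model.
    d_out, d_in = 4096, 4096
    for r in (8, 16, 64):
        frac = trainable_fraction(d_out, d_in, r)
        print(f"r={r:>2}: {lora_param_count(d_out, d_in, r):,} params "
              f"({frac:.2%} of the adapted matrix)")
```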

Code

DoRA + rsLoRA configs in PEFT v0.17+·python

from peft import LoraConfig

# DoRA — closes the gap with full FT
dora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    use_dora=True,                # enable DoRA
    task_type="CAUSAL_LM",
)

# rsLoRA — better at high ranks
rslora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules="all-linear",
    use_rslora=True,              # better scaling at high ranks
    task_type="CAUSAL_LM",
)

# Combined: DoRA + rsLoRA at high rank
strongest_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules="all-linear",
    use_dora=True,
    use_rslora=True,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
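
To make concrete what `use_dora=True` changes, here is a minimal pure-Python sketch of the magnitude/direction reparameterization as the DoRA paper describes it: the merged weight is m * (W0 + B·A) / ||W0 + B·A||, with the norm taken per column. All function names are hypothetical, the matrices are toy values, and this is not how PEFT implements it internally.

```python
import math


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def column_norms(W):
    """L2 norm of each column of W."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]


def dora_weight(W0, B, A, m, scale=1.0):
    """Merged DoRA weight: per-column magnitude m times the normalized direction.

    W0: frozen weight (d_out x d_in), B: (d_out x r), A: (r x d_in),
    m: trainable magnitude vector, one entry per column of W0.
    """
    delta = matmul(B, A)
    V = [[w + scale * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W0, delta)]
    norms = column_norms(V)
    return [[m[j] * V[i][j] / norms[j] for j in range(len(m))]
            for i in range(len(V))]


# Toy values for illustration only.
W0 = [[3.0, 0.0, 1.0],
      [4.0, 2.0, 1.0]]
A = [[1.0, 2.0, 3.0]]   # r = 1
B_init = [[0.0], [0.0]]  # B starts at zero, as in LoRA
m = column_norms(W0)     # magnitude initialized from the frozen weight

W_start = dora_weight(W0, B_init, A, m)           # equals W0 at initialization
W_moved = dora_weight(W0, [[0.5], [-0.25]], A, m)  # direction changed, magnitude still m
```

Because the direction is renormalized, the low-rank update only rotates each column; any change in column length has to come from `m`.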

Exercise

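Run three small (200-example) LoRA jobs on the same model and dataset: vanilla LoRA, DoRA, and high-rank DoRA + rsLoRA. Compare the validation loss curves. Which one is actually better? The gap is often smaller than you expect. The point is to verify the lift on your own task before defaulting to the fanciest variant.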
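
One possible way to compare the three runs is to reduce each validation curve to its best and final loss and report the gap to the vanilla LoRA baseline. The sketch below assumes you have already logged per-evaluation losses yourself; `summarize_runs` is a hypothetical helper, and the numbers are placeholders, not measured results.

```python
def summarize_runs(curves, baseline="lora"):
    """Summarize validation curves as best loss, final loss, and gap vs. baseline.

    curves: dict mapping run name -> list of validation losses in eval order.
    A negative gap means the run beat the baseline's best loss.
    """
    base_best = min(curves[baseline])
    summary = {}
    for name, losses in curves.items():
        best = min(losses)
        summary[name] = {
            "best": best,
            "final": losses[-1],
            "gap_vs_baseline": best - base_best,
        }
    return summary


# PLACEHOLDER values to show the shape of the input, NOT real measurements.
curves = {
    "lora": [2.0, 1.5, 1.25],
    "dora": [2.0, 1.5, 1.0],
    "dora_rslora_r64": [2.0, 1.25, 1.5],
}
summary = summarize_runs(curves)

if __name__ == "__main__":
    for name, stats in summary.items():
        print(f"{name:<16} best={stats['best']:.3f} final={stats['final']:.3f} "
              f"gap={stats['gap_vs_baseline']:+.3f}")
```

With only 200 examples, a single run per variant is noisy, so repeating each job with a few seeds before drawing conclusions is a reasonable precaution.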
