The Llama lineage: the open-weight frontier

Meta's Llama series democratized large language model research. Each release pushed further what "open weight" could mean.

Release	Date	Variants	Context	Notable
Llama 1	Feb 2023	7B / 13B / 33B / 65B	2K	First competitive open-weight LLM (research-only license)
Llama 2	Jul 2023	7B / 13B / 70B	4K	Commercial license, chat fine-tunes
Llama 3	Apr 2024	8B / 70B	8K (extended to 128K in 3.1)	128K-token vocabulary, GQA, SwiGLU, RoPE: the modern architecture template
Llama 3.1	Jul 2024	8B / 70B / 405B	128K	405B flagship competitive with GPT-4-class proprietary models
Llama 3.2	Sep 2024	1B / 3B / 11B Vision / 90B Vision	128K	Smaller text models plus the first vision models
Llama 3.3	Dec 2024	70B	128K	Refined post-training, 39.3M H100 GPU-hours
Llama 4 Scout	Apr 2025	109B total / 17B active	10M	MoE, iRoPE, natively multimodal
Llama 4 Maverick	Apr 2025	400B total / 17B active	1M	128 routed experts + 1 shared expert
Llama 4 Behemoth	(announced)	~2T total / 288B active	n/a	Frontier-class teacher model (still in training as of 2025)

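Llama 3.3 70B architecture: 80 layers, d_model = 8192, 64 query heads and 8 KV heads (GQA with group size 8, so each KV head is shared by 8 query heads), SwiGLU activation, RMSNorm, RoPE positional encoding, and a 128K context window. Most modern open-weight teams take this as their starting template.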
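GQA matters most for the KV cache at long context. The numbers in the paragraph above are enough to estimate it.

The sketch below assumes K/V are stored in a 2-byte format (bf16/fp16) with no cache quantization. The "MHA" variant is a hypothetical comparison point with one K/V head per query head. The query-to-KV mapping uses the usual contiguous grouping, where consecutive blocks of query heads share a K/V head:

```python
# Architecture numbers for Llama 3.3 70B, taken from the paragraph above.
N_LAYERS = 80
D_MODEL = 8192
N_Q_HEADS = 64
N_KV_HEADS = 8

HEAD_DIM = D_MODEL // N_Q_HEADS       # 128 dimensions per head
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # query heads sharing one K/V head


def kv_cache_bytes_per_token(n_kv_heads: int, bytes_per_value: int = 2) -> int:
    """K and V for every layer, assuming 2-byte (bf16/fp16) storage."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * bytes_per_value


def kv_head_for_query_head(q_head: int) -> int:
    """Contiguous grouping: query heads 0-7 use KV head 0, 8-15 use KV head 1, ..."""
    return q_head // GROUP_SIZE


gqa = kv_cache_bytes_per_token(N_KV_HEADS)
mha = kv_cache_bytes_per_token(N_Q_HEADS)  # hypothetical full multi-head attention
ctx = 128 * 1024                           # 128K-token context

print(f"head_dim={HEAD_DIM}, group size={GROUP_SIZE}")
print(f"GQA: {gqa / 1024:.0f} KiB/token, {gqa * ctx / 2**30:.0f} GiB at {ctx} tokens")
print(f"MHA: {mha / 1024:.0f} KiB/token ({mha // gqa}x larger)")
```

Under these assumptions, a full 128K-token sequence needs about 40 GiB of KV cache per sequence with 8 KV heads. Full multi-head attention would need 8x that, which is the main reason GQA became part of the template.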

Code

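Inspecting the Llama 3.3 70B config (Python)

from transformers import AutoConfig

# The meta-llama repos are gated: accept the license on the Hugging Face Hub and
# authenticate (e.g. `huggingface-cli login`) first. Only config.json is fetched
# here, not the roughly 140 GB of bf16 weights.
cfg = AutoConfig.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
for k, v in cfg.to_dict().items():
    if not k.startswith("_"):  # skip private bookkeeping fields
        print(f"  {k} = {v}")
# hidden_size = 8192 (d_model), num_hidden_layers = 80,
# num_attention_heads = 64, num_key_value_heads = 8,
# intermediate_size = 28672, vocab_size = 128256, ...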
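The "70B" in the name can be reconstructed from these config fields. The sketch below hard-codes the values so it runs offline without the gated download; they are assumed to match the published config.json. The formula also assumes no biases in the attention or MLP projections and untied input/output embeddings. Both are assumed to hold for this model, but verify them against `attention_bias`, `mlp_bias` and `tie_word_embeddings` in the real config:

```python
# Hard-coded Llama 3.3 70B config fields (assumed to match config.json).
cfg = {
    "hidden_size": 8192,
    "intermediate_size": 28672,
    "num_hidden_layers": 80,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_params(c: dict) -> int:
    d = c["hidden_size"]
    head_dim = d // c["num_attention_heads"]
    kv_dim = c["num_key_value_heads"] * head_dim
    # Attention: Q and O are d x d; K and V are d x kv_dim (smaller under GQA).
    attn = 2 * d * d + 2 * d * kv_dim
    # SwiGLU MLP: gate and up projections (d -> ffn) plus down projection (ffn -> d).
    mlp = 3 * d * c["intermediate_size"]
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * d
    per_layer = attn + mlp + norms
    embed = c["vocab_size"] * d
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * d
    final_norm = d
    return c["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


total = count_params(cfg)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

Most of the total sits in the SwiGLU MLPs: about 705M of the roughly 856M parameters in each layer. The two vocabulary matrices add only about 2.1B.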
