Reproducibility and Seeds

Why exact reproducibility is hard

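Modern GPU code uses non-deterministic kernels for speed: parallel reductions can reorder operations (and floating-point addition is not associative), atomic adds are not bitwise reproducible, and cuDNN can select the fastest convolution algorithm at runtime, which may differ between runs. Even with seeds set, two training runs can diverge in the third decimal place after a few epochs.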
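The reordering effect can be seen without a GPU. The following is a minimal sketch in plain Python (reduce_in_order is a hypothetical helper written for this illustration, not part of any library): it sums the same three values in two different orders, the way two differently scheduled parallel reductions might, and the totals differ in the last bits.

```python
def reduce_in_order(values, order):
    """Sum values by visiting their indices in the given order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
forward = reduce_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = reduce_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

# Same inputs, different order, slightly different floating-point result.
print(forward == backward)
print(abs(forward - backward))
```

The gap is tiny for one sum, but training repeats such reductions millions of times and feeds each result into the next step, which is how runs drift apart.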

What you usually want is statistical reproducibility (the same final accuracy ± 0.1%, the same loss-curve shape), not bitwise reproducibility. Seeds, deterministic data ordering, and a few cuDNN flags get you there.
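Deterministic data ordering might look like the sketch below. It assumes you control the shuffling yourself; epoch_order is a hypothetical helper, and the way it mixes the seed and epoch into one integer is an arbitrary choice for illustration. The point is that the order depends only on (seed, epoch), not on global RNG state.

```python
import random


def epoch_order(num_samples: int, seed: int, epoch: int) -> list[int]:
    """Return a shuffled index order that depends only on (seed, epoch)."""
    # A private RNG instance, so other code calling random.* cannot disturb it.
    # The multiplier is an arbitrary constant used to combine seed and epoch.
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


# Two calls with the same arguments give the same order.
print(epoch_order(8, seed=42, epoch=0) == epoch_order(8, seed=42, epoch=0))
```

With a PyTorch DataLoader, passing a seeded generator plays the same role.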

Principle: set every seed: Python random, numpy, PyTorch CPU, PyTorch CUDA, and the DataLoader worker_init. Then accept that GPU non-determinism will still cost you the third decimal place.
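Accepting third-decimal noise means comparing runs with a tolerance instead of with ==. A minimal sketch, assuming accuracies are stored as fractions; runs_agree and curves_agree are hypothetical helpers, the 0.001 default mirrors the ± 0.1% above, and the 0.01 loss tolerance is an assumption you would tune for your own task.

```python
def runs_agree(acc_a: float, acc_b: float, tol: float = 0.001) -> bool:
    """True if two final accuracies (as fractions) differ by at most tol."""
    return abs(acc_a - acc_b) <= tol


def curves_agree(loss_a, loss_b, tol: float = 0.01) -> bool:
    """True if two loss curves have equal length and stay within tol pointwise."""
    return len(loss_a) == len(loss_b) and all(
        abs(a - b) <= tol for a, b in zip(loss_a, loss_b)
    )


# Two runs with the same seed that drifted slightly on the GPU.
print(runs_agree(0.9312, 0.9318))
print(curves_agree([1.0, 0.5, 0.25], [1.002, 0.498, 0.251]))
```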

The seed-set ritual

Call set_seed(42) at the very top of training, before building the model. Use the same seed across the runs you want to compare. In an ablation, change only one knob and keep the seed fixed.
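One way to enforce "one knob, same seed" is to derive every ablation run from a single base config. This is a sketch under assumptions: RunConfig, its fields and default values, and ablate are all hypothetical names for illustration, not part of any framework.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical hyperparameters; the values are placeholders.
    seed: int = 42
    lr: float = 3e-4
    batch_size: int = 32
    dropout: float = 0.1


def ablate(base: RunConfig, knob: str, values) -> list[RunConfig]:
    """Return one config per value, changing only `knob` and keeping the seed."""
    if knob == "seed":
        raise ValueError("the seed is held fixed in an ablation")
    return [replace(base, **{knob: v}) for v in values]


base = RunConfig()
runs = ablate(base, "dropout", [0.0, 0.1, 0.3])
for cfg in runs:
    print(cfg)
```

Because the dataclass is frozen, a run cannot quietly mutate a second knob after the configs are built.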

When you actually need bitwise determinism

For unit tests, regression tests, and some regulated pipelines, set torch.use_deterministic_algorithms(True) and accept the slowdown (often 2x). Set CUBLAS_WORKSPACE_CONFIG=:4096:8 for deterministic matmul. Record the full environment (Python version, package versions, GPU driver): small differences break determinism even with every flag set.
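The two steps above might be wired up as in the sketch below. enable_bitwise_determinism and snapshot_environment are hypothetical helper names. The first uses the PyTorch settings named above plus torch.backends.cudnn.benchmark, and imports torch lazily so the environment variable is in place first. The second records Python and package versions only; the GPU driver version is left out here and would have to be captured separately.

```python
import json
import os
import platform
import sys
from importlib import metadata


def enable_bitwise_determinism() -> None:
    """Turn on PyTorch's deterministic mode; call before any CUDA work."""
    # Set the cuBLAS workspace config before CUDA is touched.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    import torch  # imported here so the env var is already set

    torch.use_deterministic_algorithms(True)  # raise on non-deterministic ops
    torch.backends.cudnn.benchmark = False    # no runtime algorithm search


def snapshot_environment(packages=("torch", "numpy")) -> dict:
    """Collect version info to store next to the run's results."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


print(json.dumps(snapshot_environment(), indent=2))
```

Saving the snapshot as JSON beside each checkpoint makes it possible to tell later whether two runs that disagree were actually run in the same environment.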

Tip: Reproducibility is not binary. Most of the time, statistical reproducibility (seeds + same data + same code) is what you actually need. Bitwise determinism is a regulated-industry feature, not a default.

Code

The standard seed-set function (Python)

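import os
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def set_seed(seed: int) -> None:
    """Seed every RNG the training script touches."""
    random.seed(seed)                      # Python stdlib RNG
    np.random.seed(seed)                   # NumPy global RNG
    torch.manual_seed(seed)                # PyTorch CPU RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)   # every visible CUDA device
    # Only affects subprocesses started after this point; to fix str hashing
    # in this process, export PYTHONHASHSEED before launching Python.
    os.environ["PYTHONHASHSEED"] = str(seed)

def worker_init_fn(worker_id: int) -> None:
    # Inside a worker, torch.initial_seed() is already distinct per worker,
    # so reuse it to seed NumPy and random.
    seed = torch.initial_seed() % 2**32
    np.random.seed(seed)
    random.seed(seed)

set_seed(42)
# ds is your Dataset, defined elsewhere.
loader = DataLoader(ds, batch_size=32, num_workers=8,
                    worker_init_fn=worker_init_fn,
                    generator=torch.Generator().manual_seed(42))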
