AUTOTUNE, Caching, Prefetching

Three magic words for performance

Three mechanisms turn a naive pipeline into a fast one: AUTOTUNE, caching, and prefetching. Use all three by default.

AUTOTUNE (tf.data.AUTOTUNE): instead of hard-coding a value such as num_parallel_calls=4, let TF measure throughput at runtime and tune the degree of parallelism dynamically. In practice it almost always beats a fixed value.
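A minimal sketch of the difference, assuming TF 2.x; `square` and the toy `range` dataset are hypothetical stand-ins for real preprocessing. Switching from a fixed value to AUTOTUNE changes only how many elements are processed concurrently, not what comes out: both pipelines yield the same values in the same order, because `map` preserves order by default.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def square(v):
    # Hypothetical stand-in for a per-element preprocessing step
    return v * v

base = tf.data.Dataset.range(10)

# Hard-coded: always 4 parallel calls, regardless of machine or current load
fixed = base.map(square, num_parallel_calls=4)

# AUTOTUNE: tf.data chooses and adjusts the parallelism at runtime
tuned = base.map(square, num_parallel_calls=AUTOTUNE)

fixed_out = list(fixed.as_numpy_iterator())
tuned_out = list(tuned.as_numpy_iterator())
print(tuned_out)  # same values, same order as fixed_out
```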

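Caching (.cache()): stores the pipeline's output at the point where it is called. The first epoch runs the preprocessing and saves the results; later epochs read from the cache instead. Use .cache() when the data fits in memory, and .cache("/tmp/path") to cache a large dataset on disk.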
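The sketch below, assuming TF 2.x, makes the first-epoch/later-epoch split visible. `expensive_preprocess` is a hypothetical step, wrapped in `tf.py_function` only so that a Python side effect can count how often it actually runs.

```python
import tensorflow as tf

calls = []  # one entry per actual execution of the expensive step

def expensive_preprocess(v):
    # Hypothetical costly but deterministic step; the append lets us count calls
    calls.append(int(v))
    return v * v

ds = (
    tf.data.Dataset.range(5)
    .map(lambda v: tf.py_function(expensive_preprocess, [v], tf.int64))
    .cache()  # in memory; pass a path such as "/tmp/path" to cache on disk
)

epoch1 = list(ds.as_numpy_iterator())  # runs preprocessing and fills the cache
epoch2 = list(ds.as_numpy_iterator())  # served from the cache, map is not re-run
print(len(calls))
```

After two full passes `calls` holds five entries, not ten: the second epoch never re-ran the map. Swapping `.cache()` for `.cache("/tmp/path")` gives the same behavior with the cached elements stored on disk at that path instead of in memory.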

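Prefetching (.prefetch(AUTOTUNE)): overlaps CPU-side data preparation with GPU-side model computation, so the next batch is prepared while the current one is being trained on. It always goes at the end of the pipeline. Adding it where it was missing costs a single line of code and typically yields a 20–50% speedup, depending on how input-bound training is, which makes it essentially free performance.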
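A rough timing sketch, assuming TF 2.x. `slow_load` and the `time.sleep` inside the loop are hypothetical stand-ins for CPU preprocessing and a GPU training step, each costing `STEP` seconds. Without prefetching the two costs add up for every element; with `.prefetch(AUTOTUNE)` at the end, the next element is produced in the background while the loop body runs.

```python
import time
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
STEP = 0.05  # hypothetical cost in seconds of loading one element / one training step

def slow_load(v):
    time.sleep(STEP)  # simulated CPU-side preprocessing
    return v

def make_ds(prefetch):
    ds = tf.data.Dataset.range(10).map(
        lambda v: tf.py_function(slow_load, [v], tf.int64))
    return ds.prefetch(AUTOTUNE) if prefetch else ds  # prefetch is the last step

def run(ds):
    start = time.perf_counter()
    seen = []
    for v in ds:
        time.sleep(STEP)  # simulated GPU-side training step
        seen.append(int(v))
    return seen, time.perf_counter() - start

plain_seen, plain_t = run(make_ds(prefetch=False))
pre_seen, pre_t = run(make_ds(prefetch=True))
print(f"no prefetch: {plain_t:.2f}s, with prefetch: {pre_t:.2f}s")
```

Because the simulated producer and consumer cost exactly the same here, the overlap should come close to halving the wall time, which is more than a real workload usually gains; actual timings vary by machine.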

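Cache before augmenting, not after. A cache keeps its stored output for as long as it exists (the lifetime of the dataset object for an in-memory cache, or of the cache files for a disk cache). If you cache after augmentation, every epoch replays the same augmented samples, which defeats the purpose of augmentation. Cache only deterministic preprocessing, and apply augmentation after the cache.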
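A quick check of this claim, assuming TF 2.x. The noise-adding `augment` below is a hypothetical augmentation; the sketch runs each pipeline for two epochs and compares the results.

```python
import tensorflow as tf

def augment(v):
    # Hypothetical random augmentation: add uniform noise in [0, 1)
    return v + tf.random.uniform([], maxval=1.0)

base = tf.data.Dataset.range(4).map(lambda v: tf.cast(v, tf.float32))

cache_after = base.map(augment).cache()   # WRONG: first epoch's noise is frozen
cache_before = base.cache().map(augment)  # CORRECT: fresh noise every epoch

def two_epochs(ds):
    # Each call to as_numpy_iterator() is one full epoch
    return list(ds.as_numpy_iterator()), list(ds.as_numpy_iterator())

w1, w2 = two_epochs(cache_after)
c1, c2 = two_epochs(cache_before)
print("cache after augment, epochs identical:", w1 == w2)
print("cache before augment, epochs identical:", c1 == c2)
```

With the cache after augmentation, both epochs are identical; with the cache before it, each epoch draws new noise.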

Code

Cache + augment: correct order (Python)

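import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Hypothetical stand-ins so the snippet runs: 1,000 random 32x32 RGB images, 10 classes
x = np.random.randint(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
y = np.random.randint(0, 10, size=(1000,)).astype(np.int64)

def normalize(image, label):
    # Deterministic: the same input always yields the same output, so it is safe to cache
    return tf.cast(image, tf.float32) / 255.0, label

def augment(image, label):
    # Random horizontal flip; accepts one image or a batch, new randomness on every pass
    return tf.image.random_flip_left_right(image), label
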
# CORRECT: cache deterministic preprocessing, augment after cache
correct = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .map(normalize, num_parallel_calls=AUTOTUNE)   # cheap, deterministic
    .cache()                                         # cache normalized
    .shuffle(10000)
    .batch(128)
    .map(augment, num_parallel_calls=AUTOTUNE)      # fresh per epoch
    .prefetch(AUTOTUNE)
)

# WRONG: caching augmented data — same augmentation every epoch
wrong = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .map(augment)
    .cache()                                         # locks in augmentation
    .batch(128)
)
