Transformation chains: map, batch, shuffle, filter

~13 min · map, batch, shuffle, filter


Recommended pipeline order

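The power of tf.data comes from chaining transformations. Each transformation returns a new Dataset and is lazy: nothing runs until the dataset is iterated. Order matters. In the wrong order, randomization can silently break or memory can be wasted.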
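The laziness described above can be observed directly. The following is a minimal sketch, not part of the original lesson: the `record` helper and the `calls` list are hypothetical names, and `tf.py_function` is used only so that a Python side effect is visible at run time.

```python
import tensorflow as tf

calls = []  # every element the Python function actually processes is logged here

def record(x):
    # Hypothetical helper: logs the element, then passes it through unchanged.
    calls.append(int(x))
    return x

base = tf.data.Dataset.range(5)
# .map() returns a new Dataset; `base` itself is left untouched.
traced = base.map(lambda x: tf.py_function(record, [x], tf.int64))

calls_before = list(calls)            # still empty: building the chain ran nothing
values = [int(v) for v in traced]     # iterating is what triggers the work
calls_after = list(calls)
base_values = [int(v) for v in base]  # the original dataset is still usable

print(calls_before, values, sorted(calls_after), base_values)
```

Because each call returns a new object, intermediate datasets can be kept and reused, for example to build a training and a validation pipeline from the same parsed source.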

Step	Transformation	Why this position
1	`.map(parse, num_parallel_calls=AUTOTUNE)`	Decode/parse the raw data
2	`.cache()`	Cache after the expensive parsing, before any randomization
3	`.shuffle(buffer)`	Reshuffles every epoch; must come before batch
4	`.batch(size, drop_remainder=True)`	Group elements into batches
5	`.map(augment, num_parallel_calls=AUTOTUNE)`	Augment after batching (batched ops are faster)
6	`.prefetch(AUTOTUNE)`	Always last; overlaps CPU and GPU work
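To illustrate why `.cache()` sits before any randomization, here is a small sketch under stated assumptions: `add_noise` is a hypothetical stand-in for a random augmentation, and the tiny `Dataset.range(5)` source replaces real data. When a random op runs before the cache, the first epoch's random values are stored and replayed on every later epoch.

```python
import tensorflow as tf

def add_noise(x):
    # Hypothetical stand-in for a random augmentation: adds a value in [0, 1).
    return tf.cast(x, tf.float32) + tf.random.uniform([])

base = tf.data.Dataset.range(5)

# Random op BEFORE cache: the first epoch's random values are frozen.
frozen = base.map(add_noise).cache()
frozen_epoch1 = [float(v) for v in frozen]
frozen_epoch2 = [float(v) for v in frozen]

# Cache BEFORE the random op: only the deterministic data is cached,
# so the random op runs again on every epoch.
fresh = base.cache().map(add_noise)
fresh_epoch1 = [float(v) for v in fresh]
fresh_epoch2 = [float(v) for v in fresh]

print("frozen epochs identical:", frozen_epoch1 == frozen_epoch2)
print("fresh epochs identical: ", fresh_epoch1 == fresh_epoch2)
```

The first comparison is identical by construction. The second would normally differ between epochs when no seeds are set, which is the behavior you want from augmentation.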

Key rule: shuffle before batch. If you shuffle after batching, only the order of the batches is randomized; the elements inside each batch stay clumped together.
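The difference is easy to see on a toy dataset. This sketch uses `Dataset.range(12)` and a batch size of 4 purely for illustration; the seeds are arbitrary.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(12)

# Wrong order: whole batches are shuffled, but their members never change.
batch_then_shuffle = [
    tuple(b.numpy().tolist()) for b in ds.batch(4).shuffle(3, seed=0)
]

# Right order: individual elements are shuffled first, then grouped.
shuffle_then_batch = [
    tuple(b.numpy().tolist()) for b in ds.shuffle(12, seed=0).batch(4)
]

print("batch -> shuffle:", batch_then_shuffle)
print("shuffle -> batch:", shuffle_then_batch)
```

With batch before shuffle, every batch is always one of (0, 1, 2, 3), (4, 5, 6, 7), or (8, 9, 10, 11), so a model keeps seeing the same neighbors together. With shuffle before batch, batch membership changes from epoch to epoch.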

Code

Standard training pipeline·python

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 128

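def normalize(image, label):
    # Scale pixel values to float32 in [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

def augment(image, label):
    # Runs after .batch(), so `image` is a whole batch (B, H, W, C).
    # random_brightness draws one delta per call, so the batch shares one shift.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label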

# Assumes x_train / y_train are already loaded arrays, images shaped (N, H, W, C).
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .map(normalize, num_parallel_calls=AUTOTUNE)   # 1. parse / normalize
    .cache()                                         # 2. cache normalized data
    .shuffle(buffer_size=10000, seed=42)            # 3. shuffle (before batch)
    .batch(BATCH_SIZE, drop_remainder=True)         # 4. batch
    .map(augment, num_parallel_calls=AUTOTUNE)      # 5. augment batched
    .prefetch(AUTOTUNE)                              # 6. prefetch (always last)
)

# Validation: no shuffle, no augment
val_ds = (
    tf.data.Dataset.from_tensor_slices((x_val, y_val))
    .map(normalize, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)
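One way to sanity-check a pipeline like the training one above is to run it on synthetic data and inspect a single batch. This is a sketch under assumptions: `x_fake` and `y_fake` are made-up stand-ins (random grayscale 28x28 images with an explicit channel axis, 10 classes), not a real dataset.

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 128

# Synthetic stand-in data (an assumption, not real images).
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(512, 28, 28, 1), dtype=np.uint8)
y_fake = rng.integers(0, 10, size=(512,), dtype=np.int64)

def normalize(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

ds = (
    tf.data.Dataset.from_tensor_slices((x_fake, y_fake))
    .map(normalize, num_parallel_calls=AUTOTUNE)
    .cache()
    .shuffle(buffer_size=512, seed=42)
    .batch(BATCH_SIZE, drop_remainder=True)
    .map(augment, num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)

# Pull exactly one batch and check its shape and dtype.
images, labels = next(iter(ds))
print(images.shape, images.dtype)
print(labels.shape, labels.dtype)
```

Reading only one batch leaves the cache partially filled, so TensorFlow may log a warning about a discarded cache; that is harmless for a quick check like this.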

Exercise

Build a tf.data pipeline that loads MNIST, normalizes pixels to [0, 1], shuffles with a buffer of 10000, batches by 128, and prefetches with AUTOTUNE. Iterate over one batch and confirm that x has shape (128, 28, 28) and y has shape (128,).
