Dataset basics: creating and iterating

~11 min · dataset, from-tensor-slices, iteration

Where the pipeline starts

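A tf.data.Dataset is an ordered collection of elements. Each element is a single tensor, a tuple of tensors (typically features and a label), or a dict of tensors (named features). You can build one from in-memory data, from files, or from a Python generator.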
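To make the three element structures concrete, here is a minimal sketch that builds one tiny dataset of each kind and inspects its element_spec. The variable names (single, pair, named) and the dict keys "feature" and "label" are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Single tensor per element: four scalar int32 values.
single = tf.data.Dataset.from_tensor_slices(np.arange(4, dtype=np.int32))

# Tuple per element: a (features, label) pair, sliced along the first axis.
pair = tf.data.Dataset.from_tensor_slices(
    (np.zeros((4, 3), dtype=np.float32), np.arange(4, dtype=np.int64))
)

# Dict per element: named features (hypothetical keys).
named = tf.data.Dataset.from_tensor_slices(
    {
        "feature": np.zeros((4, 3), dtype=np.float32),
        "label": np.arange(4, dtype=np.int64),
    }
)

# element_spec describes the structure, shape, and dtype of one element.
print(single.element_spec)
print(pair.element_spec)
print(named.element_spec)

# Dict elements are accessed by key while iterating.
for example in named.take(1):
    print(example["feature"].shape, int(example["label"]))
```

Checking element_spec before wiring a dataset into a model is a cheap way to catch shape or dtype mismatches without iterating over any data.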

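A Dataset is lazy: constructing one doesn't run the pipeline or read any file contents. The pipeline only runs once iteration starts, whether through model.fit or a manual loop. .take(n) returns a new Dataset containing at most the first n elements, which is really handy for debugging.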
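One way to see the laziness for yourself is a generator with a visible side effect. The sketch below assumes TensorFlow 2.4 or later (for the output_signature argument); the names gen, calls, and calls_before are hypothetical. It records every value the generator produces and checks that nothing is produced until iteration begins.

```python
import tensorflow as tf

calls = []  # records each value the generator actually produces


def gen():
    for i in range(5):
        calls.append(i)
        yield i


# Building the dataset does not invoke the generator.
ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int32),
)
calls_before = list(calls)
print("calls after construction:", calls_before)

# Iterating is what drives the pipeline; take(2) stops it early.
first_two = [int(x) for x in ds.take(2)]
print("elements:", first_two)
print("generator produced at least", len(calls), "values")
```

Because take(2) ends iteration early, the generator does not need to run to completion, which is why take is a convenient way to smoke-test an expensive pipeline.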

Code

Dataset creation — 4 sources·python

import tensorflow as tf
import numpy as np

# From a Python list
ds_list = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

# From NumPy arrays — (features, labels) pairs
x_np = np.random.randn(1000, 32).astype(np.float32)
y_np = np.random.randint(0, 10, size=(1000,))
ds_numpy = tf.data.Dataset.from_tensor_slices((x_np, y_np))

# From a dict — named features
ds_dict = tf.data.Dataset.from_tensor_slices({
    'image':  np.random.randn(100, 28, 28, 1).astype(np.float32),
    'label':  np.random.randint(0, 10, size=(100,)),
    'weight': np.ones(100, dtype=np.float32),
})

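# From a file glob: yields matching file paths as strings; file contents are not read here.
# Note: the order is shuffled by default (pass shuffle=False for a deterministic order),
# and TensorFlow raises an error if the pattern matches no files.
ds_files = tf.data.Dataset.list_files("data/images/*.jpg")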

Iterate / inspect / take·python

for element in ds_list.take(3):
    print(element.numpy())   # 1, 2, 3

for x, y in ds_numpy.take(2):
    print(f"x shape: {x.shape}, y: {y.numpy()}")

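# Pull a small dataset into a Python list of NumPy values for inspection
first_elements = list(ds_list.take(5).as_numpy_iterator())
# Cast to int so the output looks the same across NumPy versions
print([int(v) for v in first_elements])   # [1, 2, 3, 4, 5]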
