NumPy Arrays vs Python Lists: Why ndarray?

The foundation of everything

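NumPy (Numerical Python) is the foundation of almost all numerical computing in Python. Pandas is built on top of it, PyTorch and TensorFlow speak its array language, and scikit-learn requires it. Much of PyArrow hands you compatible arrays. Once you understand ndarray, you understand the engine room of the modern Python data stack.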

Current stable: NumPy 2.4.4 (March 2026). NumPy 2.0 (June 2024) was the breaking-change release that cleaned up the legacy API, so 2.0 and later is the modern baseline.

Why is it fast?

A Python list is a vector of pointers. Each element is a full Python object with its own type tag, reference count, and allocation. Iterating over a million-element list means a million pointer dereferences plus a million Python method dispatches.

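A NumPy ndarray is a single block of contiguous memory holding raw typed values: no per-element pointers, no per-element type tags. Iteration is a tight C loop with no Python overhead. The same operation that takes seconds on a list takes milliseconds on an array.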
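
To make the memory picture concrete, here is a minimal sketch that inspects both containers. It assumes 64-bit CPython (`sys.getsizeof` is CPython-specific) and pins the dtype to `np.int64` so the element size is the same on every platform. The variable names are just for illustration.

```python
import sys

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The array stores raw 8-byte integers back to back in one buffer.
print(np_arr.itemsize)               # bytes per element
print(np_arr.nbytes)                 # total bytes in the data buffer
print(np_arr.strides)                # bytes to step to the next element
print(np_arr.flags["C_CONTIGUOUS"])  # True for a freshly created array

# The list itself only stores pointers; every int is a separate object
# with its own header (type pointer + reference count) on top of the value.
list_pointer_bytes = sys.getsizeof(py_list)
one_int_bytes = sys.getsizeof(py_list[-1])
print(list_pointer_bytes, one_int_bytes)
```

The list's own size covers only the pointer table; the integer objects it points to are allocated separately, which is where the extra memory and the pointer chasing come from.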

What do you actually get?

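Fast n-dimensional arrays: homogeneous typed data in contiguous memory.
Vectorized operations: element-wise math at C speed, with no Python loop.
Broadcasting: arrays of different shapes align automatically.
Universal functions (ufuncs): np.sin, np.exp, np.sqrt and friends, all element-wise, so each element is computed independently. (np.where works element-wise too, though strictly speaking it is not a ufunc.)
Linear algebra, FFT, random: the numerical toolkit comes in the box.
Modern Generator API: np.random.default_rng() replaces the legacy global state.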
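
A few of the items above are easier to see than to describe. The following is a small sketch with made-up values (the temperature array, the grid, and the 2x2 system are illustrative only), using only standard NumPy calls.

```python
import numpy as np

# Vectorized math: one expression, no Python loop
celsius = np.array([0.0, 25.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32

# Broadcasting: a (3, 1) column combines with a (4,) row into a (3, 4) grid
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row              # grid[i, j] == 10 * i + j

# Element-wise selection: cap every value above 20 at 20
clipped = np.where(grid > 20, 20, grid)

# Linear algebra: solve A @ x = b for x
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(fahrenheit)
print(grid.shape)
print(clipped.max())
print(x)
```

Note that broadcasting never copies `col` or `row` into full grids first; NumPy stretches the size-1 axis virtually while it computes the result.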

Code

List vs ndarray: the speed gap isn't subtle·python

import math
import time

import numpy as np

n = 5_000_000
py_list = list(range(n))
np_arr  = np.arange(n)

# Pure Python: element-wise sqrt
t = time.perf_counter()
py_result = [math.sqrt(x) for x in py_list]
print(f'list comprehension: {time.perf_counter() - t:.3f}s')

# NumPy: element-wise sqrt, vectorized
t = time.perf_counter()
np_result = np.sqrt(np_arr)
print(f'np.sqrt:           {time.perf_counter() - t:.3f}s')

# Roughly 50-100x on a recent laptop; exact numbers depend on hardware.
# The gap tends to widen as arrays grow.

Modern random Generator API: reproducibility built in·python

import numpy as np

rng = np.random.default_rng(seed=42)        # modern entry point
data = rng.normal(loc=100, scale=15, size=1_000_000)

# Vectorized statistics, no loops
print(f'mean: {data.mean():.2f}')           # ~100.0
print(f'std:  {data.std():.2f}')            # ~15.0

# Boolean masking, also vectorized
above_120 = data[data > 120]
print(f'fraction > 120: {len(above_120) / len(data):.3%}')

# Old style (legacy global state): still works, but avoid it in new code
# np.random.seed(42); np.random.normal(...)
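
What "reproducibility built in" means in practice can be sketched as follows. This is an illustrative check, not part of the original lesson code: the `rng_a`, `rng_b`, and `rng_c` names are hypothetical, and it relies only on `np.random.default_rng` and `Generator.normal`.

```python
import numpy as np

# Two generators built from the same seed produce the same stream
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
draws_a = rng_a.normal(loc=100, scale=15, size=5)
draws_b = rng_b.normal(loc=100, scale=15, size=5)
same_stream = np.array_equal(draws_a, draws_b)
print(same_stream)   # True

# Each Generator owns its state: advancing rng_a does not affect
# a fresh generator created later from the same seed.
_ = rng_a.normal(size=1000)
rng_c = np.random.default_rng(seed=42)
draws_c = rng_c.normal(loc=100, scale=15, size=5)
independent = np.array_equal(draws_c, draws_b)
print(independent)   # True
```

Because the state lives in the `Generator` object rather than in a module-level global, you can pass `rng` explicitly into functions and keep experiments from interfering with each other.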
