NumPy reflexes that carry straight over, and the two places they diverge

The 90% that just works

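If you have NumPy muscle memory, most of it transfers to MLX without you even noticing. Indexing with integers, slicing with colons, broadcasting two arrays of different shapes in an arithmetic op, reshape, reductions (sum, mean, argmax): all of it behaves the way you expect. You can write a surprising amount of MLX code by typing exactly what you would type in NumPy.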

This lesson catalogs what carries over so you don't hesitate, and names the two divergences that will trip you up if you skip past them.

What carries over (run the code, recognize everything)

Indexing returns a sub-array. Slicing returns a sub-array. Boolean masking returns a sub-array. Broadcasting works. Reductions work, with or without an axis. Reshape works. In-place mutation through item assignment works (this one surprised me too: older MLX docs said otherwise, but as of 0.31.x, a[i, j] = value is fine).

The two divergences worth memorizing

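1. Lazy by default. In NumPy, when you write y = x * 2 + 1, the result is computed and stored. In MLX, the result is a graph node that describes the computation. The actual numbers aren't computed until something forces evaluation. Anything that needs to read concrete values triggers an "implicit eval": printing the array, calling .item() on a scalar, converting to a Python list. The next lesson is devoted to this. The takeaway for now is that the array looks like NumPy, but the timing of when computation happens is different.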

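2. The random API call signature is different. NumPy, as in np.random.randn(1024, 1024), takes the shape as separate positional arguments. MLX, as in mx.random.normal((1024, 1024)), takes the shape as a tuple. The same applies to uniform, randint, and so on. Forget the tuple and you get a confusing error: not an obvious "shape must be a tuple", but a less helpful complaint about the number of arguments.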

That's really it

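Those are the two reflexes worth flagging. Almost everything else carries over. If you find yourself typing what feels like NumPy code, you're probably writing valid MLX code. When you do get stuck, it's almost always lazy-eval timing or the random-API tuple. Bookmark this lesson and come back when one of them bites you.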

Code

What carries over: recognize everything·python

import mlx.core as mx

a = mx.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print('a       :', a)
print('a[0]    :', a[0])
print('a[:, 1] :', a[:, 1])
print('a.sum() :', a.sum())
print('a.sum(axis=0):', a.sum(axis=0))
print('a.mean():', a.mean())
print('a.argmax(axis=1):', a.argmax(axis=1))

# Broadcasting (shape (3,) broadcasts against shape (2, 3))
b = mx.array([10.0, 20.0, 30.0])
print('a + b   :', a + b)

# Reshape
print('a.reshape(3,2):', a.reshape(3, 2))

# In-place via item assignment — works fine in mlx 0.31.x
a[0, 0] = 99.0
print('after a[0,0]=99 :', a)

Divergence 1: lazy by default (more in lesson 4)·python

import mlx.core as mx

x = mx.array([1.0, 2.0, 3.0])
y = x * 2 + 1                  # NumPy would compute now; MLX records a graph
# At this point y is a graph node, not concrete numbers.
print('y :', y)                 # ← print() triggers implicit eval; you see [3, 5, 7]

# To force computation explicitly, without a print:
big = mx.random.normal((1024, 1024))
big_squared = big @ big.T       # lazy — no work done yet
mx.eval(big_squared)            # NOW the kernel runs
print('big_squared.sum():', float(big_squared.sum()))

Divergence 2: the random API takes a shape tuple·python

import mlx.core as mx

# NumPy:  np.random.randn(3, 4)
# MLX:    mx.random.normal((3, 4))    — shape is a single tuple argument

x = mx.random.normal((3, 4))
print('normal(3,4)  shape:', x.shape, 'dtype:', x.dtype)

u = mx.random.uniform(low=0.0, high=1.0, shape=(2, 3))
print('uniform(2,3) shape:', u.shape)

# Reproducibility: seed once at the top of your script
mx.random.seed(42)
print('seeded sample:', mx.random.normal((3,)))

Exercise

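Grab a NumPy snippet you've written before (anything you remember off the top of your head, ten lines or so) and translate it to MLX by typing the same thing with mx in place of np. If you hit a wall, it's almost certainly one of the two divergences (lazy or random-tuple). Time how long the translation takes, and count how many of your hits were the two divergences versus something else. Write two sentences on what surprised you.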