Edge Deployment — ExecuTorch, CoreML, MLX

Three paths off the GPU

Server inference is only one deployment target; mobile and edge are very different. The PyTorch ecosystem has dedicated tools for them:

ExecuTorch: PyTorch's runtime for mobile and edge. Targets iOS, Android, and microcontrollers. Successor to the older PyTorch Mobile.
CoreML: Apple's on-device ML framework. Aims for the best performance on iOS / macOS. Models are converted from PyTorch through the coremltools bridge.
MLX: Apple's native ML framework for Apple Silicon, built around the unified memory architecture. The right choice when you need to squeeze the last bit of performance out of a Mac or iPhone.

Flow

The modern path for ExecuTorch and CoreML is the same: torch.export → backend-specific lowering. ExecuTorch lowers to a .pte file that you ship with your app; CoreML lowers to an .mlpackage.

MLX has two options: convert the PyTorch weights to MLX format (works for many architectures), or re-implement the model directly in MLX (best performance, but it takes porting effort).

Code

ExecuTorch: export for mobile·python

# pip install executorch
import torch
from executorch.exir import to_edge

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyMLP().eval()
example = torch.randn(1, 10)

# 1. Export with torch.export
exported = torch.export.export(model, (example,))

# 2. Lower to ExecuTorch's edge IR
edge = to_edge(exported)

# 3. Optimize and serialize
et_program = edge.to_executorch()
with open('/tmp/tiny.pte', 'wb') as f:
    f.write(et_program.buffer)

# .pte ships with your iOS / Android app

CoreML: Apple device deployment·python

# pip install coremltools
import torch
import coremltools as ct

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 4)

    def forward(self, x):
        return self.fc(x)

model = TinyMLP().eval()
example = torch.randn(1, 10)

# Trace to TorchScript: ct.convert accepts a traced model as its PyTorch input
# (recent coremltools releases can also take a torch.export ExportedProgram)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example.shape, name='x')],
    convert_to='mlprogram',                # modern MLProgram format
    minimum_deployment_target=ct.target.macOS14,
)
mlmodel.save('/tmp/tiny.mlpackage')

# Drop the .mlpackage into Xcode and you have a CoreML model

MLX: native Apple Silicon, two paths·python

# pip install mlx mlx-lm
import mlx.core as mx
import mlx.nn as nn

# Path 1: re-implement in MLX (best performance)
class MLXMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 4)
    def __call__(self, x):
        return self.fc(x)

model = MLXMLP()
x = mx.random.normal((1, 10))
y = model(x)                                  # builds a lazy graph; nothing is computed yet
mx.eval(y)                                     # force evaluation
print(y.shape)                                 # (1, 4)

# Path 2: load PyTorch weights into MLX
# Many community projects (mlx_lm) support direct loading of HF checkpoints.
# from mlx_lm import load
# model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
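
For a model that has no ready-made MLX checkpoint, converting the weights by hand mostly comes down to matching parameter names and fixing tensor layouts. The sketch below uses NumPy stand-ins for a PyTorch state_dict and moves convolution kernels from PyTorch's (out, in, kH, kW) layout to the channels-last (out, kH, kW, in) layout that MLX convolutions use. The function name `to_mlx_layout`, the parameter names, and the rule "every 4-D weight is a Conv2d kernel" are assumptions made for illustration; a real model needs its own per-layer mapping.

```python
import numpy as np


def to_mlx_layout(weights):
    """Return a name -> ndarray mapping with conv kernels moved to channels-last."""
    converted = {}
    for name, w in weights.items():
        if name.endswith("weight") and w.ndim == 4:
            # Assumption: every 4-D "weight" is a Conv2d kernel.
            # PyTorch stores (out, in, kH, kW); MLX expects (out, kH, kW, in).
            w = np.transpose(w, (0, 2, 3, 1))
        converted[name] = np.ascontiguousarray(w)
    return converted


# Stand-in for {k: v.numpy() for k, v in torch_model.state_dict().items()}
torch_style = {
    "conv.weight": np.zeros((8, 3, 5, 5), dtype=np.float32),
    "fc.weight": np.zeros((4, 10), dtype=np.float32),
    "fc.bias": np.zeros((4,), dtype=np.float32),
}

mlx_style = to_mlx_layout(torch_style)
print({name: w.shape for name, w in mlx_style.items()})

# On an Apple Silicon machine the result could then be saved and loaded, e.g.:
#   np.savez("weights.npz", **mlx_style)
#   mlx_model.load_weights("weights.npz")
```

Linear weights keep the same (out, in) shape in both frameworks, which is why only the 4-D tensor is transposed here.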

Choosing a deployment target by hardware·python

# A quick decision table:
#
# Target                | Recommended path
# --------------------- | ----------------------------------------------------
# iOS / iPadOS          | CoreML (best Apple integration) or ExecuTorch
# Android               | ExecuTorch (with XNNPACK / Vulkan delegate)
# macOS (Apple Silicon) | MLX (native) or CoreML
# Linux server (GPU)    | torch.compile + bf16, or vLLM for LLMs
# Linux server (CPU)    | torch.compile, ONNX Runtime, or OpenVINO
# NVIDIA Jetson / edge  | TensorRT (via ONNX export)
# Browser               | ONNX Runtime Web, transformers.js
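
The same table can be kept as data, for instance in a build or CI script that picks an export step per platform. This is only a sketch: the `RECOMMENDED` mapping, its keys, and the `recommend_path` function are hypothetical names, and the values just list the runtimes named in the table.

```python
# Hypothetical lookup mirroring the decision table; keys are illustrative.
RECOMMENDED = {
    "ios": ["CoreML", "ExecuTorch"],
    "android": ["ExecuTorch"],
    "macos-apple-silicon": ["MLX", "CoreML"],
    "linux-gpu": ["torch.compile", "vLLM"],
    "linux-cpu": ["torch.compile", "ONNX Runtime", "OpenVINO"],
    "jetson": ["TensorRT"],
    "browser": ["ONNX Runtime Web", "transformers.js"],
}


def recommend_path(target: str) -> list[str]:
    """Return the candidate runtimes for a target, first choice first."""
    key = target.strip().lower()
    if key not in RECOMMENDED:
        raise ValueError(
            f"unknown target {target!r}; expected one of {sorted(RECOMMENDED)}"
        )
    return RECOMMENDED[key]


print(recommend_path("android"))  # ['ExecuTorch']
```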
