TFLiteConverter and post-training quantization

~13 min · converter, quantization, int8

Three conversion tiers: from no size change to ~4× smaller

tf.lite.TFLiteConverter has three factory methods: from_saved_model (recommended), from_keras_model, and from_concrete_functions. The SavedModel path is the most complete and best-optimized conversion route.
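
To make the difference between the entry points concrete, here is a minimal sketch that converts the same toy model through two of them. The `Doubler` module, its input shape, and the temporary export directory are hypothetical, chosen only for illustration. from_keras_model is not shown; it takes an in-memory Keras model object in place of the directory or concrete function.

```python
import tempfile

import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical toy model: multiplies a [1, 4] float tensor by 2."""

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def __call__(self, x):
        return x * 2.0


model = Doubler()
concrete_fn = model.__call__.get_concrete_function()

# Path 1: export a SavedModel to disk, then convert from that directory.
export_dir = tempfile.mkdtemp()
tf.saved_model.save(model, export_dir, signatures=concrete_fn)
tflite_from_saved = tf.lite.TFLiteConverter.from_saved_model(export_dir).convert()

# Path 2: convert the in-memory concrete function directly.
tflite_from_concrete = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_fn], model
).convert()

# Both paths return the serialized model as a bytes object.
print(len(tflite_from_saved), len(tflite_from_concrete))
```

Both calls produce a flatbuffer as `bytes`, which can be written to a `.tflite` file as-is.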

Three quantization tiers, ordered from least to most aggressive:

None (float32 baseline): direct conversion with no size reduction. Same accuracy as the original.
Dynamic range (optimizations=[tf.lite.Optimize.DEFAULT]): weights are quantized statically to int8; activations are quantized at runtime. ~4× smaller, 2–3× faster on CPU. No calibration data needed. The easiest first step.
Full integer (int8): both weights and activations are int8. Needs a representative dataset (100–200 samples) to calibrate activation ranges. ~4× smaller, 3×+ faster. Required for Edge TPU and microcontrollers.
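
The ~4× figure and the need for calibration data both follow from the arithmetic of int8 quantization. The NumPy sketch below illustrates that arithmetic only and is not the converter's actual implementation: the function names, the random weights, and the simple min/max calibration rule are assumptions, and it uses one scale per tensor for simplicity.

```python
import numpy as np


def quantize_weights_symmetric(w):
    """Per-tensor symmetric int8: zero point is 0, scale maps max |w| to 127."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def calibrate_activation(samples):
    """Asymmetric int8 parameters from the min/max seen on calibration samples."""
    lo = min(0.0, float(np.min(samples)))  # keep 0.0 inside the range
    hi = max(0.0, float(np.max(samples)))
    scale = (hi - lo) / 255.0              # 256 int8 levels span [lo, hi]
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point


rng = np.random.default_rng(0)

# Weights are known at conversion time, so they need no calibration data.
w = rng.normal(0.0, 0.1, size=(256, 128)).astype(np.float32)
q, scale = quantize_weights_symmetric(w)
size_ratio = w.nbytes / q.nbytes  # 4 bytes per float32 vs 1 byte per int8
max_err = np.max(np.abs(w - q.astype(np.float32) * scale))

# Activation ranges depend on the inputs, hence the representative dataset.
# Stand-in for ReLU outputs collected over 200 calibration samples.
acts = np.maximum(rng.normal(size=(200, 128)), 0).astype(np.float32)
act_scale, act_zero = calibrate_activation(acts)

print(f"size ratio: {size_ratio:.1f}x, max weight error: {max_err:.5f}")
print(f"activation scale: {act_scale:.5f}, zero point: {act_zero}")
```

The rounding error per weight is bounded by half the scale, which is why accuracy usually survives; the activation range, by contrast, is only as good as the samples used to estimate it.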

Code

Three quantization tiers · python

import tensorflow as tf
import numpy as np

# Tier 1: baseline (no quantization)
conv1 = tf.lite.TFLiteConverter.from_saved_model('my_saved_model/')
tflite_baseline = conv1.convert()

# Tier 2: dynamic range — easiest, no calibration data
conv2 = tf.lite.TFLiteConverter.from_saved_model('my_saved_model/')
conv2.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic = conv2.convert()

# Tier 3: full integer — requires calibration dataset
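def representative_dataset():
    # train_samples is assumed to be a NumPy array of real training inputs,
    # shape (N, ...), loaded elsewhere. Use REAL data here, NOT random noise.
    for sample in train_samples[:200]:
        # Add a batch dimension and cast to the model's float32 input type
        yield [sample[None].astype(np.float32)]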

conv3 = tf.lite.TFLiteConverter.from_saved_model('my_saved_model/')
conv3.optimizations = [tf.lite.Optimize.DEFAULT]
conv3.representative_dataset = representative_dataset
conv3.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
conv3.inference_input_type = tf.int8       # for Edge TPU / MCU
conv3.inference_output_type = tf.int8
tflite_int8 = conv3.convert()

for name, model_bytes in [('baseline', tflite_baseline),
                           ('dynamic', tflite_dynamic),
                           ('int8', tflite_int8)]:
    print(f"{name}: {len(model_bytes) / 1024:.1f} KB")
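
Because conv3 sets inference_input_type and inference_output_type to tf.int8, the caller has to quantize inputs and dequantize outputs itself. The sketch below shows one way to do that with the scale and zero point the interpreter reports. The helper names are hypothetical, and `run_int8` assumes a single-input, single-output model and a TensorFlow version that still ships tf.lite.Interpreter.

```python
import numpy as np


def quantize_input(x, scale, zero_point):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize_output(q, scale, zero_point):
    """Map int8 values back to float using x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale


def run_int8(model_bytes, x):
    """Run one float input through an int8-in / int8-out TFLite model."""
    import tensorflow as tf  # imported lazily so the helpers above work without it

    interpreter = tf.lite.Interpreter(model_content=model_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # 'quantization' holds the (scale, zero_point) pair for each tensor.
    in_scale, in_zero = inp["quantization"]
    interpreter.set_tensor(inp["index"], quantize_input(x, in_scale, in_zero))
    interpreter.invoke()

    out_scale, out_zero = out["quantization"]
    raw = interpreter.get_tensor(out["index"])
    return dequantize_output(raw, out_scale, out_zero)
```

With the models from the listing above, usage would look like `run_int8(tflite_int8, sample[None].astype(np.float32))`, where `sample` is one real input with the shape the model expects.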
