REST and gRPC APIs

Two protocols, two purposes

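TF Serving exposes both a REST API (port 8501) and a gRPC API (port 8500). REST is the easiest to use: standard HTTP requests with JSON bodies, which integrate from almost anywhere. gRPC is faster: binary Protocol Buffer serialization gives lower latency and uses less bandwidth for high-volume production traffic.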
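To get a feel for the bandwidth difference, the sketch below compares the size of a JSON request body with the size of the same batch as packed float32 bytes. It is only a rough approximation: it assumes the gRPC request carries the array as packed values in a TensorProto's tensor_content field and ignores protobuf framing and HTTP headers. The helper name payload_sizes and the (2, 784) batch shape are illustrative assumptions.

```python
import json

import numpy as np


def payload_sizes(batch: np.ndarray) -> tuple[int, int]:
    """Return (JSON bytes, packed float32 bytes) for one predict batch."""
    # REST body: the array as a JSON list of Python floats, UTF-8 encoded.
    json_bytes = len(json.dumps({"instances": batch.tolist()}).encode("utf-8"))
    # Rough stand-in for the gRPC body: packed float32 values only
    # (protobuf field tags and HTTP/2 framing are ignored).
    raw_bytes = batch.astype(np.float32).nbytes
    return json_bytes, raw_bytes


batch = np.random.rand(2, 784).astype(np.float32)
json_size, raw_size = payload_sizes(batch)
print(f"JSON: {json_size} bytes, packed float32: {raw_size} bytes")
```

Each float in JSON costs roughly a dozen or more characters of text, versus a fixed 4 bytes when packed, which is where most of the gap comes from.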

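REST URL pattern: http://host:8501/v1/models/{MODEL_NAME}:predict. To target a specific version, insert /versions/{N} after the model name: http://host:8501/v1/models/{MODEL_NAME}/versions/{N}:predict.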
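The URL rule can be captured in a small helper. This is a minimal sketch; predict_url is a hypothetical name, and it assumes plain HTTP on the default REST port 8501.

```python
from typing import Optional


def predict_url(host: str, model_name: str, version: Optional[int] = None,
                port: int = 8501) -> str:
    """Build a TF Serving REST predict URL, optionally pinned to a version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        # The version segment goes between the model name and the :predict verb.
        base += f"/versions/{version}"
    return base + ":predict"


print(predict_url("localhost", "my_model"))             # http://localhost:8501/v1/models/my_model:predict
print(predict_url("localhost", "my_model", version=1))  # http://localhost:8501/v1/models/my_model/versions/1:predict
```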

In a Python client, REST takes about two lines (requests.post). gRPC needs the tensorflow-serving-api package and its Protocol Buffer message types.

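When to use which: REST for development, debugging, low-volume production, and integration with web frameworks. gRPC for high-throughput production, where serialization overhead shows up in the actual latency budget.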
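One way to judge whether serialization overhead is visible in your latency budget is to time the client-side encoding on its own. The sketch below compares JSON encoding with packed bytes for an assumed batch of 32 inputs with 784 features each. It measures only client-side encoding, not server-side decoding or network time, and the packed-bytes path is a stand-in for protobuf serialization rather than the real gRPC code path.

```python
import json
import timeit

import numpy as np

batch = np.random.rand(32, 784).astype(np.float32)


def encode_json() -> bytes:
    # REST path: ndarray -> nested Python lists -> JSON text -> UTF-8 bytes.
    return json.dumps({"instances": batch.tolist()}).encode("utf-8")


def encode_raw() -> bytes:
    # Binary stand-in: ndarray -> packed float32 bytes.
    return batch.tobytes()


runs = 50
json_ms = timeit.timeit(encode_json, number=runs) / runs * 1000
raw_ms = timeit.timeit(encode_raw, number=runs) / runs * 1000
print(f"JSON encode: {json_ms:.3f} ms/batch, raw bytes: {raw_ms:.4f} ms/batch")
```

If the JSON figure is small next to your model's inference time, REST is probably fine; if it is a noticeable fraction of the budget, gRPC is worth the extra setup.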

Code

REST — one line with curl·bash

# Check model status
curl http://localhost:8501/v1/models/my_model
# {"model_version_status": [{"version": "2", "state": "AVAILABLE", ...}]}

# Predict (row format — one instance per element)
curl -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}' \
  -X POST http://localhost:8501/v1/models/my_model:predict
# {"predictions": [[0.1, 0.8, 0.1, ...]]}

# Target a specific version
curl -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}' \
  -X POST http://localhost:8501/v1/models/my_model/versions/1:predict
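Deployment scripts often check the status endpoint before routing traffic to a model. A minimal sketch, assuming the status body has already been fetched and decoded (for example with requests.get(...).json()); available_versions is a hypothetical helper, and the sample dict mirrors the status output shown in the curl example with extra fields omitted.

```python
def available_versions(status: dict) -> list[str]:
    """Return the versions whose state is AVAILABLE in a model-status response."""
    return [
        entry["version"]
        for entry in status.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


# Sample body shaped like the curl status output (extra fields omitted).
sample = {"model_version_status": [{"version": "2", "state": "AVAILABLE"}]}
print(available_versions(sample))
```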

Python REST client·python

import numpy as np
import requests

url = "http://localhost:8501/v1/models/my_classifier:predict"
input_data = np.random.rand(2, 784).tolist()  # 2 instances, 784 features each

# json= serializes the body and sets Content-Type: application/json
response = requests.post(url, json={"instances": input_data}, timeout=10)
response.raise_for_status()  # surface HTTP errors instead of a KeyError below
predictions = response.json()["predictions"]
print(f"Top class for sample 0: {np.argmax(predictions[0])}")
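The REST predict API accepts two request layouts: row format ("instances", one element per example, answered with "predictions") and columnar format ("inputs", keyed by input tensor name, answered with "outputs"). Failed requests come back as a JSON object with an "error" field. The sketch below builds both payloads and parses a response; the helper names are hypothetical, the input name 'keras_tensor' is borrowed from the gRPC example and depends on your model's signature, and parse_predictions assumes a single-output model, where the response value is a plain list.

```python
import json

import numpy as np


def row_payload(batch: np.ndarray) -> str:
    # Row format: one list element per instance; the response key is "predictions".
    return json.dumps({"instances": batch.tolist()})


def columnar_payload(batch: np.ndarray, input_name: str) -> str:
    # Columnar format: the whole batch under a named input; the response key is "outputs".
    return json.dumps({"inputs": {input_name: batch.tolist()}})


def parse_predictions(body: dict) -> np.ndarray:
    # TF Serving reports failures as {"error": "<message>"}.
    if "error" in body:
        raise RuntimeError(body["error"])
    # Assumes a single-output model, so the value is a plain (nested) list.
    key = "predictions" if "predictions" in body else "outputs"
    return np.asarray(body[key])


batch = np.zeros((2, 3), dtype=np.float32)
print(row_payload(batch))
print(columnar_payload(batch, "keras_tensor"))
print(parse_predictions({"predictions": [[0.1, 0.8, 0.1]]}).argmax(axis=1))
```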

Python gRPC client — for production·python

# pip install tensorflow-serving-api
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_classifier'
request.model_spec.signature_name = 'serving_default'

# Input/output tensor names ('keras_tensor', 'output_0') come from the model's
# serving signature; check yours with: saved_model_cli show --dir <export_dir> --all
input_data = np.random.rand(2, 784).astype(np.float32)
request.inputs['keras_tensor'].CopyFrom(
    tf.make_tensor_proto(input_data, dtype=tf.float32))

# gRPC Python takes the deadline as `timeout` (in seconds)
result_future = stub.Predict.future(request, timeout=10.0)
result = result_future.result()
predictions = tf.make_ndarray(result.outputs['output_0'])
print("Predictions shape:", predictions.shape)

channel.close()
