Local vs cloud vision

Where local vision wins

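Privacy-sensitive data. Medical imaging, legal documents, surveillance footage, employee data: these usually can't be sent to the cloud, so local is the only option.
Bulk batch jobs. Processing 50,000 product photos costs essentially nothing extra locally, but the same job through a cloud API gets expensive.
Offline use. Field work, air-gapped sites, edge deployments.
Latency-sensitive UIs. Local vision answers in 1–2 seconds, while a cloud round-trip adds 3–5+ seconds.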
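For bulk batches, the local concern shifts from cost to throughput. If each image takes 1–2 seconds, a strictly sequential run over 50,000 photos takes roughly 14 to 28 hours (50,000 × 1–2 s). The sketch below fans out over many paths while capping in-flight requests. It assumes a hypothetical async `describe(path)` callable wrapping your local model; `describe_batch` and `max_concurrency` are illustrative names, not tuned values. Whether extra concurrency actually helps depends on your runtime and GPU.

```python
import asyncio
from typing import Awaitable, Callable


async def describe_batch(
    paths: list[str],
    describe: Callable[[str], Awaitable[str]],
    max_concurrency: int = 2,
) -> dict[str, str]:
    """Run `describe` over every path with at most `max_concurrency` calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(path: str) -> tuple[str, str]:
        # The semaphore keeps the local model from being flooded.
        async with sem:
            return path, await describe(path)

    results = await asyncio.gather(*(one(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    async def fake_describe(path: str) -> str:
        # Stand-in for a real local vision call.
        await asyncio.sleep(0.01)
        return f"description of {path}"

    out = asyncio.run(
        describe_batch([f"img_{i}.jpg" for i in range(5)], fake_describe)
    )
    print(out["img_3.jpg"])  # description of img_3.jpg
```

For a real 50k-image job you would also want checkpointing so a crash doesn't restart the whole batch. That is omitted here.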

Where cloud still wins

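The hardest reasoning about images. For tasks like "read this diagram and explain the architecture," Claude 3.5 Sonnet and GPT-4V are still ahead.
Very high resolution. Cloud models handle 4K+ images more gracefully.
Frontier spatial reasoning. Counting objects, handling occlusion, and judging relative position are all areas where cloud models do better.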
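These categories can also be used to route before any model runs, rather than only escalating after a weak local answer. This is a minimal sketch. The 4K threshold mirrors the list above (3840 × 2160 pixels). The `pre_route` name and the keyword lists are hypothetical, crude heuristics, not a tested classifier.

```python
# Hypothetical pre-routing heuristic: send work to the cloud up front when
# the image or the question falls into a category where cloud models lead.

UHD_4K_PIXELS = 3840 * 2160  # pixel count of a 4K UHD frame

# Illustrative hints for hard diagram/architecture reasoning.
HARD_REASONING_HINTS = ("diagram", "architecture")

# Illustrative hints for spatial questions (counting, occlusion, position).
SPATIAL_HINTS = (
    "how many", "count the", "number of",
    "behind", "occluded", "hidden",
    "left of", "right of", "next to",
)


def pre_route(width: int, height: int, question: str) -> str:
    """Return "cloud" or "local" for a width x height image and a question."""
    if width * height >= UHD_4K_PIXELS:
        return "cloud"  # very high resolution
    q = question.lower()
    if any(hint in q for hint in HARD_REASONING_HINTS):
        return "cloud"  # hardest image reasoning
    if any(hint in q for hint in SPATIAL_HINTS):
        return "cloud"  # frontier spatial reasoning
    return "local"
```

A call like `pre_route(1024, 768, "Describe this image in detail.")` returns `"local"`. Anything that stays local can still go through a confidence-based fallback afterward.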

The 피파 pattern

피파 uses local vision (Gemma 3 / Qwen 2.5-VL) for routine image tasks such as avatar-generation prompts, screenshot debugging, and receipt OCR. It falls back to Claude vision for the hardest cases. The same local-first, cloud-fallback approach used for text applies equally to vision.

Code

Local-first vision router·python

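async def describe_image_smart(path: str, complexity: str = "auto") -> str:
    """Try the local model first; escalate hard cases to the cloud."""
    prompt = "Describe this image in detail."
    # ollama_vision / claude_vision are async helpers assumed to be defined
    # elsewhere: (image path, prompt) -> answer text.
    if complexity == "always_cloud":
        # Caller forced the cloud: skip the wasted local pass entirely.
        return await claude_vision(path, prompt)

    # Step 1: local
    local_answer = await ollama_vision(path, prompt)

    # Step 2: confidence check (heuristic: escalate if the local answer
    # is too short or contains hedging phrases)
    if _is_low_confidence(local_answer):
        return await claude_vision(path, prompt)
    return local_answer

def _is_low_confidence(text: str) -> bool:
    """Treat very short or hedged answers as low confidence."""
    return (len(text) < 80
            or any(h in text.lower() for h in [
                "i can't", "unable to", "not sure", "hard to tell",
            ]))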
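The router calls an async `ollama_vision(path, prompt)` helper without defining it. Below is a minimal sketch of one. It assumes a local Ollama server on its default port and uses Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list for multimodal models. The `gemma3` model tag is an assumption: substitute whichever vision model you have pulled, such as a Qwen 2.5-VL build. `build_ollama_payload` and `_post_json` are hypothetical helper names. A matching `claude_vision` would wrap Anthropic's Messages API in the same way and is not shown.

```python
import asyncio
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
DEFAULT_MODEL = "gemma3"  # assumed tag; use whichever vision model you have pulled


def build_ollama_payload(path: str, prompt: str, model: str = DEFAULT_MODEL) -> dict:
    """Build a non-streaming /api/generate request carrying one base64 image."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"model": model, "prompt": prompt, "images": [image_b64], "stream": False}


def _post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    """Blocking JSON POST using only the standard library."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


async def ollama_vision(path: str, prompt: str, model: str = DEFAULT_MODEL) -> str:
    """Ask a local Ollama vision model about one image; return the answer text."""
    payload = build_ollama_payload(path, prompt, model)
    # Run the blocking HTTP call in a worker thread so the event loop stays free.
    data = await asyncio.to_thread(_post_json, OLLAMA_URL, payload)
    return data["response"]
```

Sticking to the standard library keeps the sketch dependency-free. With `"stream": False`, Ollama returns a single JSON object, and the generated text is in its `response` field.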
