Token Tracking and Translation

Track every call

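If you don't log token counts per call, you can't answer simple questions like "How much did this user cost us today?" or "Is this feature blowing up the bill?" Build the tracker on day one; retrofitting it after the fact is painful.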

What to record

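For every call: model, prompt tokens, completion tokens, cached tokens, finish reason, timestamp, and a route key (which feature made the call). Store it in a small database or an append-only JSONL file; either works at the volume most apps see.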
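
As a minimal sketch of the JSONL option, the record below carries exactly the fields listed above. The names CallRecord, append_record, and tokens_by_route are hypothetical helpers for illustration, not part of any SDK, and the field names are one reasonable choice rather than a required schema.

```python
import json
import time
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class CallRecord:
    # Hypothetical record shape: one row per model call.
    model: str
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int
    finish_reason: str
    route_key: str                      # which feature made the call
    timestamp: float = field(default_factory=time.time)

def append_record(path: Path, record: CallRecord) -> None:
    """Append one call as a single JSON line."""
    with path.open('a', encoding='utf-8') as f:
        f.write(json.dumps(asdict(record)) + '\n')

def tokens_by_route(path: Path) -> dict:
    """Sum prompt + completion tokens per route key."""
    totals: dict = {}
    with path.open(encoding='utf-8') as f:
        for line in f:
            row = json.loads(line)
            used = row['prompt_tokens'] + row['completion_tokens']
            totals[row['route_key']] = totals.get(row['route_key'], 0) + used
    return totals
```

Adding a user id field in the same way would let the log answer the per-user cost question as well as the per-feature one.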

OpenAI ↔ Gemini message translation

If you have legacy OpenAI-shaped code, or an adapter that accepts OpenAI-shaped messages, you will end up writing the OpenAI → Gemini conversion more than once. Three conversions matter:

role: "system" → the top-level system_instruction field, not a message.
role: "assistant" → role: "model" in contents.
role: "tool" with a tool_call_id → a user-turn functionResponse part that carries the id.

Code

Token tracker, small version·python

from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class TokenTracker:
    by_model: dict = field(default_factory=lambda: defaultdict(
        lambda: {'prompt': 0, 'completion': 0, 'cached': 0, 'calls': 0}))

    def record(self, model: str, usage):
        bucket = self.by_model[model]
        bucket['prompt']     += getattr(usage, 'prompt_token_count', 0) or 0
        bucket['completion'] += getattr(usage, 'candidates_token_count', 0) or 0
        bucket['cached']     += getattr(usage, 'cached_content_token_count', 0) or 0
        bucket['calls']      += 1

    def estimate_cost_usd(self) -> float:
        rates = {  # USD per 1M tokens, simplified
            'gemini-2.5-pro':        {'in': 1.25, 'out': 10.00, 'cached': 0.125},
            'gemini-2.5-flash':      {'in': 0.30, 'out':  2.50, 'cached': 0.03},
            'gemini-2.5-flash-lite': {'in': 0.10, 'out':  0.40, 'cached': 0.0},
        }
        total = 0.0
        for model, b in self.by_model.items():
            r = rates.get(model, rates['gemini-2.5-flash'])
            uncached = b['prompt'] - b['cached']
            total += (uncached / 1e6) * r['in']
            total += (b['cached']  / 1e6) * r['cached']
            total += (b['completion'] / 1e6) * r['out']
        return total

tracker = TokenTracker()
# After each Gemini call (`response` is whatever generate_content returned):
tracker.record('gemini-2.5-flash', response.usage_metadata)
print(f'Spent so far: ${tracker.estimate_cost_usd():.4f}')
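
To make the arithmetic in estimate_cost_usd concrete, here is a worked example using the simplified gemini-2.5-flash rates from the table above. The helper name call_cost_usd and the token counts are made up for illustration; the rates are the document's simplified figures, not a current price sheet.

```python
def call_cost_usd(prompt: int, cached: int, completion: int, rate: dict) -> float:
    """Cost of one usage bucket; `rate` is USD per 1M tokens."""
    # Cached tokens are counted inside the prompt total, so bill them separately.
    uncached = max(prompt - cached, 0)
    return (uncached * rate['in']
            + cached * rate['cached']
            + completion * rate['out']) / 1e6

flash = {'in': 0.30, 'out': 2.50, 'cached': 0.03}
cost = call_cost_usd(prompt=1_000_000, cached=400_000, completion=200_000, rate=flash)
# 600k uncached * 0.30 + 400k cached * 0.03 + 200k completion * 2.50, per 1M tokens
print(f'{cost:.3f}')  # prints 0.692
```

Note how the output tokens dominate here: 0.50 of the 0.692 USD comes from 200k completion tokens, even though the prompt is five times larger.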

OpenAI → Gemini message translation·python

def openai_to_gemini(messages):
    """Convert OpenAI-style messages list -> (Gemini contents, system_instruction)."""
    system = None
    contents = []
    for msg in messages:
        role = msg['role']
        if role == 'system':
            system = msg['content']
        elif role == 'user':
            contents.append({
                'role':  'user',
                'parts': [{'text': msg['content']}],
            })
        elif role == 'assistant':
            # OpenAI's 'assistant' becomes Gemini's 'model'
            contents.append({
                'role':  'model',
                'parts': [{'text': msg['content']}],
            })
        elif role == 'tool':
            # OpenAI's 'tool' becomes Gemini's user-turn functionResponse
            contents.append({
                'role':  'user',
                'parts': [{
                    'functionResponse': {
                        'name':     msg.get('name', ''),
                        'id':       msg.get('tool_call_id', ''),
                        'response': {'result': msg['content']},
                    }
                }],
            })
    return contents, system
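
One gap worth sketching: in OpenAI's chat format a tool message usually carries only tool_call_id, while the function name sits on the preceding assistant message's tool_calls entry, and that assistant message often has content set to None. The sketch below handles only those two turn types. The helper name translate_tool_turns is hypothetical, and wrapping the tool output under a 'result' key simply follows the convention used in the function above.

```python
import json

def translate_tool_turns(messages):
    """Translate only assistant tool_calls and tool results to Gemini contents."""
    names_by_id = {}      # tool_call_id -> function name, filled from assistant turns
    contents = []
    for msg in messages:
        if msg['role'] == 'assistant' and msg.get('tool_calls'):
            parts = []
            for call in msg['tool_calls']:
                fn = call['function']
                names_by_id[call['id']] = fn['name']
                parts.append({'functionCall': {
                    'id':   call['id'],
                    'name': fn['name'],
                    # OpenAI sends arguments as a JSON string; Gemini wants an object.
                    'args': json.loads(fn['arguments'] or '{}'),
                }})
            contents.append({'role': 'model', 'parts': parts})
        elif msg['role'] == 'tool':
            call_id = msg.get('tool_call_id', '')
            contents.append({'role': 'user', 'parts': [{'functionResponse': {
                'id':       call_id,
                # Prefer an explicit name, else recover it from the earlier call.
                'name':     msg.get('name') or names_by_id.get(call_id, ''),
                'response': {'result': msg['content']},
            }}]})
    return contents
```

Merging these two branches into openai_to_gemini keeps each functionResponse paired with the functionCall that produced it.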

Using the translation in an adapter·python

import asyncio

from google import genai
from google.genai import types

# Reads the API key from the environment (GEMINI_API_KEY or GOOGLE_API_KEY).
client = genai.Client()

async def call_via_openai_shape(messages):
    contents, system = openai_to_gemini(messages)
    # Only build a config when there is a system prompt to pass along.
    config = types.GenerateContentConfig(
        system_instruction=system,
    ) if system else None

    response = await client.aio.models.generate_content(
        model='gemini-2.5-flash',
        contents=contents,
        config=config,
    )
    return response.text

async def main():
    # Same caller code as OpenAI:
    text = await call_via_openai_shape([
        {'role': 'system', 'content': 'You are helpful.'},
        {'role': 'user',   'content': 'Hello!'},
    ])
    print(text)

asyncio.run(main())

Exercise

Wire TokenTracker into your GeminiAdapter so that every generate_stream call records usage. Run 50 mixed prompts. Print the per-model breakdown and the total cost. Then write a short report: which model dominated cost, which dominated token volume, and one optimization you would try first.
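
As a starting point for the streaming part, here is a hedged sketch of capturing usage from a stream. It assumes each streamed chunk exposes text and usage_metadata attributes and that the last chunk carrying usage_metadata holds the totals for the call; verify both against your SDK version. The names consume_stream and on_usage are hypothetical, and the chunks here are stand-ins so the sketch runs without an API call.

```python
from types import SimpleNamespace

def consume_stream(chunks, on_usage):
    """Collect streamed text and report usage once, from the last chunk that has it."""
    text_parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, 'text', None):
            text_parts.append(chunk.text)
        if getattr(chunk, 'usage_metadata', None) is not None:
            usage = chunk.usage_metadata   # later chunks overwrite earlier ones
    if usage is not None:
        on_usage(usage)                    # e.g. a callback that calls tracker.record
    return ''.join(text_parts)

# Stand-in chunks (assumption: real chunks have the same two attributes).
fake_chunks = [
    SimpleNamespace(text='Hel', usage_metadata=None),
    SimpleNamespace(text='lo', usage_metadata=SimpleNamespace(
        prompt_token_count=5, candidates_token_count=2, cached_content_token_count=0)),
]
seen = []
print(consume_stream(fake_chunks, seen.append))  # prints Hello
```

Recording once per stream, rather than once per chunk, avoids counting the same call several times.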