C.W.K.
Lesson 04 of 05 · published

RAG as Cross-Cutting Service

~11 min · rag, service

One function

The RAG service is a single function: build_rag_context(query) -> str | None. It was written for Claude-Pippa's chat route first, with no thought given to variants. Each variant route now adds one line, rag_context = await build_rag_context(prompt_text), and the function never knows which brain will consume the result.

Cross-cutting without cross-contamination

RAG is shared because the underlying operation (semantic search over ChromaDB) is genuinely the same regardless of brain. The service is shared; the routes are still independent.
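
A minimal sketch of what "shared service, independent routes" could look like in practice. The two route functions, their return shapes, and the persona string are hypothetical, and build_rag_context is replaced by a stand-in so the example runs without ChromaDB or Ollama.

```python
from __future__ import annotations

import asyncio


async def build_rag_context(query: str) -> str | None:
    """Stand-in for the shared service; the real one searches ChromaDB."""
    return f"[retrieved notes for: {query}]"


async def claude_route(prompt_text: str) -> dict:
    # Hypothetical route: owns its own persona and payload shape.
    rag_context = await build_rag_context(prompt_text)
    return {
        "brain": "claude",
        "system": f"You are Pippa.\n\n{rag_context}",
        "user": prompt_text,
    }


async def variant_route(prompt_text: str) -> dict:
    # Hypothetical variant: the same one-liner, everything else differs.
    rag_context = await build_rag_context(prompt_text)
    return {
        "brain": "variant",
        "context": rag_context,
        "messages": [prompt_text],
    }


if __name__ == "__main__":
    print(asyncio.run(claude_route("What did I write about RAG?")))
    print(asyncio.run(variant_route("What did I write about RAG?")))
```

The only line the two routes have in common is the call into the service; nothing in the service refers back to either route.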

Graceful degradation

If Ollama isn't running, build_rag_context returns None, the same value it returns when the search finds no hits. The route then proceeds without retrieved context. A degraded chat is better than a hard error.
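
On the route side, degrading gracefully mostly means treating None as "no extra context" rather than as a failure. The sketch below assumes a hypothetical BASE_PROMPT, a hypothetical with_context helper, and a stand-in build_rag_context that simulates Ollama being down; none of these names come from the lesson's codebase.

```python
from __future__ import annotations

import asyncio

BASE_PROMPT = "You are a helpful assistant."  # hypothetical base system prompt


async def build_rag_context(query: str) -> str | None:
    """Stand-in that simulates the degraded case (Ollama not running)."""
    return None


def with_context(base_prompt: str, rag_context: str | None) -> str:
    """Append retrieved context when there is any; otherwise pass through."""
    if rag_context is None:
        return base_prompt
    return f"{base_prompt}\n\nRelevant notes:\n{rag_context}"


async def chat_route(prompt_text: str) -> str:
    # The route never raises just because retrieval was unavailable.
    rag_context = await build_rag_context(prompt_text)
    return with_context(BASE_PROMPT, rag_context)


if __name__ == "__main__":
    print(asyncio.run(chat_route("hello")))
```

Because the None check lives in one small helper, a route cannot accidentally interpolate the literal text "None" into its system prompt.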

Code

RAG service — one function, all brains call it·python
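async def build_rag_context(query: str, k: int = 5) -> str | None:
    """Return retrieved context for the system prompt, or None if unavailable."""
    # Ollama is down: degrade to no context instead of raising.
    if not await ollama_alive():
        return None
    embedding = await embed(query)
    # Semantic search: fetch the k nearest entries from the 'vault' collection.
    hits = await chroma.query(
        collection='vault',
        embedding=embedding,
        n_results=k,
    )
    # No matches also means no context.
    if not hits:
        return None
    return format_for_system_prompt(hits)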
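
The lesson does not show ollama_alive, so the following is only one way such a check might look. It assumes Ollama is listening on its default local port (11434) and treats "the port accepts a TCP connection within a short timeout" as alive. The host, port, and timeout parameters are illustrative, and the real helper may well use an HTTP request instead.

```python
import asyncio


async def ollama_alive(
    host: str = "127.0.0.1",
    port: int = 11434,  # Ollama's default port; an assumption about this setup
    timeout: float = 0.5,
) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Connection refused, unreachable host, or no answer in time.
        return False
    writer.close()
    await writer.wait_closed()
    return True


if __name__ == "__main__":
    print(asyncio.run(ollama_alive()))
```

A short timeout matters here: the check runs on every chat request, so a slow probe would delay even the degraded path.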