Observability

Request IDs, generation IDs, usage logs, latency telemetry. Trace every request from your code to the answer.

FIG. 00 · OBSERVABILITY SPANS · LATENCY

Every request through Synapse Garden produces three correlatable IDs and a structured log row. Use them to debug a slow request, audit a user-reported answer, or attach token usage to your application traces — the AI SDK's streamText accepts a custom fetch so you can stamp your own X-Request-Id header and stitch our spans into your trace tree.

FIG. 01 · THREE IDS · SCHEMATIC
`X-Request-Id` is your handle on a single call. `requestId` ties it to your application trace. `generationId` (in `providerMetadata`) reaches into the upstream provider's logs. The dashboard renders the row, OTEL spans propagate the timing, and Sentry captures unhandled errors.

Three IDs per request

  • X-Request-Id — response header on every request. Use it to correlate with our internal logs.
  • requestId (in your code) — generated by the SDK or by you and sent as X-Request-Id. Use it to tie upstream calls to your application traces.
  • generationId — read from providerMetadata.gateway.generationId after the call. Use it to look up provider-side traces (admin / debugging).
import { streamText } from "ai"

const result = streamText({
  model: "openai/gpt-5.4",
  prompt: "...",
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
  // The AI SDK doesn't expose request headers directly; use a fetch wrapper:
  fetch: async (url, init) => {
    const id = crypto.randomUUID()
    // Spreading a Headers object drops its entries — copy via new Headers() instead.
    const headers = new Headers(init?.headers)
    headers.set("X-Request-Id", id)
    const res = await fetch(url, { ...init, headers })
    console.log("[mg] sent X-Request-Id =", id, "status:", res.status)
    return res
  },
})

const meta = await result.providerMetadata
console.log("[mg] generationId =", meta?.gateway?.generationId)

Per-request log row

Every request produces a row in request_logs (per docs/ERD.md §2.7) with:

{
  requestId: string,             // X-Request-Id (yours or generated)
  orgId, projectId, apiKeyId,    // governance scope
  model: string,                 // requested model
  endpoint: "chat.completions" | "messages" | "embeddings",
  status: "success" | "error" | "timeout" | "rate_limited" | "budget_exceeded",
  inputTokens, outputTokens,
  costPassthroughCents,
  costChargedCents,
  latencyMs,
  ttftMs,                        // time-to-first-token (streaming)
  errorCode, errorMessage,       // null on success
  createdAt
}

Visible in the dashboard at Usage → Recent requests, with filters for project, key, model, status, and date range. The full payload is not logged by default — see Privacy below.

Viewing requests in the dashboard

Dashboard → Usage → Recent requests shows the last 1,000 rows for your workspace. Click any row to see:

  • Full timing breakdown (TTFT, total latency, queue lag)
  • Token usage (input, output, cached)
  • Cost (passthrough + charged)
  • Provider that served it
  • Full request/response only if you've opted into payload logging at the workspace level
  • Curl reproduction snippet (we re-render your call as a curl command for easy debugging)

Real-time SSE updates

The dashboard subscribes to a server-sent-events stream (/api/sse) for live usage updates. New rows appear in Recent requests within ~1 second of completion — useful when you're tail-debugging a problem.
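
If you want to consume the same stream yourself, the filtering logic is simple. A minimal sketch, assuming each SSE data payload is a JSON row matching the request_logs shape above — the event payload format is an assumption, not documented behavior:

```typescript
// Assumed row shape — a subset of the request_logs schema above.
interface LogRow {
  requestId: string
  model: string
  status: string
  latencyMs: number
}

// Parse SSE `data:` payloads into rows and keep only failures.
function parseFailures(dataPayloads: string[]): LogRow[] {
  return dataPayloads
    .map((payload) => JSON.parse(payload) as LogRow)
    .filter((row) => row.status !== "success")
}

// Browser usage (hypothetical — assumes the stream emits JSON rows):
// const es = new EventSource("/api/sse")
// es.onmessage = (e) => console.log(parseFailures([e.data]))
```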

OpenTelemetry integration

Synapse Garden emits OTel spans on every proxy call. If your app already uses OpenTelemetry, the spans plug into your existing trace tree:

import { trace } from "@opentelemetry/api"
import { generateText } from "ai"

const tracer = trace.getTracer("my-app")

await tracer.startActiveSpan("generate-summary", async (span) => {
  const requestId = span.spanContext().traceId

  try {
    const { text } = await generateText({
      model: "openai/gpt-5.4",
      prompt,
      fetch: async (url, init) => {
        // Copy headers via new Headers() — spreading a Headers object drops its entries.
        const headers = new Headers(init?.headers)
        headers.set("X-Request-Id", requestId)
        return fetch(url, { ...init, headers })
      },
    })
    return text
  } finally {
    // End the span even if the call throws, so the trace isn't left open.
    span.end()
  }
})

When you set X-Request-Id to your trace ID, our internal span propagates the same trace ID — your APM tool stitches the timeline together automatically.

Latency budget

We surface our own overhead separately from the upstream model time. Browse /internal/ops (admin only) for live numbers; targets per docs/ARCHITECTURE.md §3:

Stage                              Target P50
Header parse + key format check    1 ms
Key validation (cache hit)         3 ms
Body validation (Zod)              2 ms
Model allowlist check              <1 ms
Rate limit check                   2 ms
Token estimate + budget check      3 ms
Synapse Garden total               ~12 ms
Upstream model                     varies
Logging                            0 ms (fire-and-forget)

Hot path latency is monitored with k6 in CI; if P95 overhead crosses 50 ms, the build fails.
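
As a sanity check, the stage targets above do sum to roughly the quoted ~12 ms total. A quick sketch (the stage names are just labels for the table rows; the "<1 ms" allowlist check is approximated as 0.5 ms):

```typescript
// P50 targets from the latency budget table, in milliseconds.
// The "<1 ms" model allowlist check is approximated as 0.5 ms.
const stageTargetsMs: Record<string, number> = {
  headerParseAndKeyFormat: 1,
  keyValidationCacheHit: 3,
  bodyValidationZod: 2,
  modelAllowlistCheck: 0.5,
  rateLimitCheck: 2,
  tokenEstimateAndBudget: 3,
}

const totalMs = Object.values(stageTargetsMs).reduce((a, b) => a + b, 0)
console.log(totalMs) // ≈ 12 ms total Synapse Garden overhead
```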

Token usage tracking

In your app:

const { text, usage } = await generateText({ model: "...", prompt })
console.log(usage)
// { promptTokens: 234, completionTokens: 156, totalTokens: 390 }

For streaming:

const result = streamText({ model: "...", prompt })
for await (const part of result.textStream) process.stdout.write(part)
const usage = await result.usage // resolves at end of stream

Aggregate token usage in your own metrics:

metrics.recordHistogram("llm.tokens.input", usage.promptTokens, { model })
metrics.recordHistogram("llm.tokens.output", usage.completionTokens, { model })
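
If you don't have a metrics client wired up yet, a minimal in-process accumulator works as a stand-in (`metrics.recordHistogram` above is your own telemetry API; this class is a sketch, not part of the SDK):

```typescript
// Accumulate token usage per model — a stand-in for a real metrics client.
class TokenUsageAggregator {
  private totals = new Map<string, { input: number; output: number }>()

  record(model: string, promptTokens: number, completionTokens: number): void {
    const t = this.totals.get(model) ?? { input: 0, output: 0 }
    t.input += promptTokens
    t.output += completionTokens
    this.totals.set(model, t)
  }

  snapshot(model: string): { input: number; output: number } {
    return this.totals.get(model) ?? { input: 0, output: 0 }
  }
}

const agg = new TokenUsageAggregator()
agg.record("openai/gpt-5.4", 234, 156)
agg.record("openai/gpt-5.4", 100, 50)
console.log(agg.snapshot("openai/gpt-5.4")) // { input: 334, output: 206 }
```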

Webhooks (v2)

We're shipping outbound webhooks in v2 — subscribe a URL to per-request events (request.completed, budget.exceeded, key.revoked). Useful for piping Synapse Garden activity into your own data warehouse without polling.
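
Once webhooks ship, a receiver might dispatch on the event name. The three event names are from above; the envelope shape is an assumption, since the v2 payload format isn't finalized:

```typescript
// Hypothetical webhook envelope — only the event names are documented.
interface WebhookEvent {
  type: "request.completed" | "budget.exceeded" | "key.revoked"
  data: Record<string, unknown>
}

// Route each event to a destination in your own infrastructure.
function routeEvent(event: WebhookEvent): string {
  switch (event.type) {
    case "request.completed":
      return "warehouse" // e.g. append the row to your data warehouse
    case "budget.exceeded":
      return "alert" // page whoever owns the budget
    case "key.revoked":
      return "audit" // record in your audit log
  }
}
```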

Privacy

By default, no prompt or completion content is logged. We only log:

  • Request metadata (timing, status, tokens, cost)
  • Model and endpoint
  • API key prefix (not full key)
  • IP address (for abuse detection; truncated to /24 after 7 days)

Opt-in payload logging is available at the workspace level for teams that need full request/response audit trails for compliance. PII redaction runs automatically on opt-in payloads.
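
The /24 truncation mentioned above, sketched for IPv4 (function name is ours; this is an illustration of the rule, not our internal implementation):

```typescript
// Truncate an IPv4 address to its /24 network, e.g. "203.0.113.42" -> "203.0.113.0/24".
// IPv6 would need separate handling; only IPv4 is sketched here.
function truncateToSlash24(ip: string): string {
  const octets = ip.split(".")
  if (octets.length !== 4) throw new Error("expected an IPv4 address")
  return `${octets[0]}.${octets[1]}.${octets[2]}.0/24`
}
```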

Status & uptime

  • Status page: status.synapse.garden — real-time component health, incident history, scheduled maintenance.
  • API health: GET /api/health returns { db, redis, queue, status } for synthetic monitoring.
  • Per-provider uptime: see model detail pages on /models/[creator]/[slug] for live provider availability.
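
A synthetic monitor can evaluate the health payload with a simple predicate. The field names come from the response shape above; the per-component "ok" values are an assumption about what a healthy response contains:

```typescript
// Shape from GET /api/health; the component value strings are assumed.
interface Health {
  db: string
  redis: string
  queue: string
  status: string
}

// Healthy only if the overall status and every component report "ok".
function isHealthy(h: Health): boolean {
  return h.status === "ok" && [h.db, h.redis, h.queue].every((c) => c === "ok")
}

// Usage in a synthetic check (network call omitted here):
// const h: Health = await (await fetch("https://synapse.garden/api/health")).json()
// if (!isHealthy(h)) alertOnCall(h)
```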

What we monitor

Metric                    Alert threshold
Proxy P95 overhead        >50 ms for 5 min
Key cache hit rate        <99% for 10 min
/v1/* 5xx rate            >1% for 5 min
Upstream error rate       >3% for 5 min
Webhook lag               >30 s
Queue backlog             >1,000
Daily token spend         3× rolling 7-day avg (anomaly)
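
The daily-spend anomaly rule in the last row is a straightforward comparison against 3× the rolling 7-day average; a sketch of the check:

```typescript
// Flags today's token spend if it exceeds 3x the rolling 7-day average.
function isSpendAnomalous(todayTokens: number, last7DaysTokens: number[]): boolean {
  if (last7DaysTokens.length === 0) return false
  const avg = last7DaysTokens.reduce((a, b) => a + b, 0) / last7DaysTokens.length
  return todayTokens > 3 * avg
}

console.log(isSpendAnomalous(400_000, [100_000, 120_000, 90_000, 110_000, 95_000, 105_000, 100_000])) // true
```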

Sentry captures unhandled exceptions; alerts route to on-call. Full incident response process documented at /legal/security.