Streaming
Stream model output token-by-token. Server-Sent Events under the hood, idiomatic AI SDK on top.
Every chat, message, and response endpoint streams. The wire format is OpenAI-style Server-Sent Events (SSE) — `text/event-stream` with one event per `\n\n` block.
The AI SDK gives you streamText and streamObject — both wrap the SSE protocol and expose strongly-typed async iterators. See the AI SDK streamText reference for the full surface.
With the AI SDK (recommended)
Both functions return a result object with async iterators for tokens, parts, and tool calls, plus promises for usage, finish reason, and provider metadata.
```ts
import { streamText } from "ai"

const result = streamText({
  model: "openai/gpt-5.4",
  baseURL: "https://synapse.garden/api/v1",
  apiKey: process.env.MG_KEY,
  prompt: "Write three bullet points about Postgres partitioning.",
})

for await (const part of result.textStream) {
  process.stdout.write(part)
}

console.log("\n— Usage:", await result.usage)
console.log("— Finish reason:", await result.finishReason)
console.log("— Provider:", await result.providerMetadata)
```

`result.textStream` yields plain text deltas. For structured streams (tool calls, reasoning summaries, multi-part output), use `result.fullStream`, which yields a discriminated union of part kinds.
First token typically arrives in 200–800ms depending on the model. The total cost is identical to a single-shot call — what changes is perceived latency.
With the OpenAI SDK
```ts
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.MG_KEY,
  baseURL: "https://synapse.garden/api/v1",
})

const stream = await client.chat.completions.create({
  model: "openai/gpt-5.4",
  messages: [{ role: "user", content: "Tell me about Synapse Garden." }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "")
}
```

Wire format (raw SSE)
If you're hand-rolling, the events look like this:
```text
data: {"id":"chatcmpl_…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Post"},"finish_reason":null}]}

data: {"id":"chatcmpl_…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"gres"},"finish_reason":null}]}

…

data: {"id":"chatcmpl_…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

Three rules:

- One event per `\n\n` block; lines inside an event are prefixed with `data:`.
- Each `data:` payload is JSON, except the terminating `data: [DONE]`.
- The final event before `[DONE]` carries `finish_reason` (`stop`, `length`, `tool_calls`, `content_filter`).
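Those three rules are enough to hand-roll a decoder. The sketch below assumes the whole body is already buffered as a string; `parseSSE` and the `Chunk` type are illustrative names, not exports of any SDK.

```typescript
// Minimal SSE chunk parser: splits a raw text/event-stream body into
// JSON payloads, stopping at the terminating "data: [DONE]" sentinel.
type Chunk = {
  choices: { delta: { content?: string }; finish_reason: string | null }[]
}

function parseSSE(body: string): Chunk[] {
  const chunks: Chunk[] = []
  for (const event of body.split("\n\n")) {
    for (const line of event.split("\n")) {
      if (!line.startsWith("data:")) continue // skip comments and blanks
      const payload = line.slice(5).trim()
      if (payload === "[DONE]") return chunks // end of stream
      chunks.push(JSON.parse(payload))
    }
  }
  return chunks
}

const raw =
  'data: {"choices":[{"delta":{"content":"Post"},"finish_reason":null}]}\n\n' +
  'data: {"choices":[{"delta":{"content":"gres"},"finish_reason":null}]}\n\n' +
  'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}\n\n' +
  "data: [DONE]\n"

const text = parseSSE(raw)
  .map((c) => c.choices[0].delta.content ?? "")
  .join("")
console.log(text) // → "Postgres"
```

A real incremental parser would buffer partial events across network reads; the split-on-`\n\n` logic stays the same.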
Anthropic-style streaming
The /v1/messages endpoint streams using the Anthropic event protocol:
```text
event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Post"}}

…

event: message_stop
data: {"type":"message_stop"}
```

Use the Anthropic SDK and let it parse for you — the wire shape only matters when you're debugging.
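When you are debugging, decomposing those frames is a few string splits. `collectText` below is an illustrative helper (not an Anthropic SDK function) that pulls the text out of `content_block_delta` events:

```typescript
// Parse Anthropic-style SSE frames (event: + data: line pairs) and
// accumulate the text carried by content_block_delta / text_delta events.
function collectText(body: string): string {
  let text = ""
  for (const frame of body.split("\n\n")) {
    const dataLine = frame.split("\n").find((l) => l.startsWith("data:"))
    if (!dataLine) continue
    const payload = JSON.parse(dataLine.slice(5).trim())
    if (
      payload.type === "content_block_delta" &&
      payload.delta?.type === "text_delta"
    ) {
      text += payload.delta.text
    }
  }
  return text
}

const frames =
  'event: message_start\ndata: {"type":"message_start","message":{}}\n\n' +
  'event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Post"}}\n\n' +
  'event: content_block_delta\ndata: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"gres"}}\n\n' +
  'event: message_stop\ndata: {"type":"message_stop"}\n'

console.log(collectText(frames)) // → "Postgres"
```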
Cancellation
The AI SDK accepts an AbortSignal. Cancel mid-stream and we stop charging tokens after the cancellation propagates (a few hundred ms).
```ts
const controller = new AbortController()
setTimeout(() => controller.abort(), 1500) // cancel after 1.5s

const result = streamText({
  model: "openai/gpt-5.4",
  prompt: "…",
  abortSignal: controller.signal,
})

try {
  for await (const part of result.textStream) {
    process.stdout.write(part)
  }
} catch (err) {
  if ((err as Error).name === "AbortError") console.log("\n(cancelled)")
  else throw err
}
```

For raw fetch, pass the same signal and the connection closes immediately.
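With raw `fetch`, that means reading `res.body` yourself. A sketch: `decodeStream` is an illustrative helper, and the `signal.aborted` check inside the loop mirrors the connection teardown that `fetch` performs when you pass it the same signal.

```typescript
// Hand-rolled read loop over a streamed response body. Pass the same
// AbortSignal to fetch() itself so the connection is torn down, not just
// the consuming loop.
async function* decodeStream(
  body: ReadableStream<Uint8Array>,
  signal?: AbortSignal,
): AsyncGenerator<string> {
  const reader = body.getReader()
  const decoder = new TextDecoder()
  try {
    while (true) {
      if (signal?.aborted) throw new DOMException("aborted", "AbortError")
      const { done, value } = await reader.read()
      if (done) break
      yield decoder.decode(value, { stream: true })
    }
  } finally {
    reader.releaseLock()
  }
}

// Usage against the API (untested sketch):
// const res = await fetch("https://synapse.garden/api/v1/chat/completions", {
//   method: "POST",
//   signal: controller.signal,
//   headers: { Authorization: `Bearer ${process.env.MG_KEY}`, "Content-Type": "application/json" },
//   body: JSON.stringify({ model: "openai/gpt-5.4", stream: true, messages }),
// })
// for await (const text of decodeStream(res.body!, controller.signal)) { /* … */ }
```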
Edge / serverless caveats
Streaming responses pass through serverless function bodies just fine, but CDN buffering can break SSE if the platform queues bytes. We send `X-Accel-Buffering: no` and `Cache-Control: no-cache, no-transform` to tell common proxies to flush each event.
If you're piping our stream through your own function, do the same:
```ts
return new Response(upstream.body, {
  headers: {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    "X-Accel-Buffering": "no",
  },
})
```

Errors mid-stream
If the upstream model errors after some tokens have flowed, we send a final SSE event with the error envelope, then [DONE]:
```text
data: {"error":{"code":"UPSTREAM_ERROR","message":"Provider returned 500"}}

data: [DONE]
```

The AI SDK throws on these automatically. With raw fetch, parse each `data:` line and check for an `error` field before accumulating content.
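That check is one guard clause per parsed payload; `assertOk` is an illustrative name, not SDK API.

```typescript
// Guard each parsed data: payload before using it. The error envelope
// matches the UPSTREAM_ERROR example above.
type Payload = {
  error?: { code: string; message: string }
  choices?: unknown[]
}

function assertOk(payload: Payload): Payload {
  if (payload.error) {
    throw new Error(`${payload.error.code}: ${payload.error.message}`)
  }
  return payload
}

assertOk(JSON.parse('{"choices":[]}')) // passes through unchanged
// assertOk(JSON.parse('{"error":{"code":"UPSTREAM_ERROR","message":"Provider returned 500"}}'))
// → throws Error("UPSTREAM_ERROR: Provider returned 500")
```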
Time-to-first-token (TTFT)
Streaming TTFT is the most common latency metric for conversational UX. We surface per-provider TTFT on the model detail pages so you can pick a fast variant when you need it.
```ts
const t0 = performance.now()
const result = streamText({ model: "openai/gpt-5.4", prompt: "…" })

let firstToken: number | null = null
for await (const part of result.textStream) {
  if (firstToken === null) firstToken = performance.now() - t0
  process.stdout.write(part)
}

console.log(`\nTTFT: ${firstToken?.toFixed(0)}ms`)
```

Typical TTFT ranges:
| Model class | TTFT (P50) |
|---|---|
| Edge / nano (`gpt-5.4-nano`, `gemini-2.5-nano`) | 80–200 ms |
| Fast workhorse (`gpt-5.4-mini`, `claude-haiku-4-5`) | 200–400 ms |
| Flagship (`gpt-5.4`, `claude-sonnet-4.6`, `gemini-2.5-pro`) | 400–800 ms |
| Reasoning models (with `reasoningEffort: 'high'`) | 1.5–8 s before first token |