Vercel AI SDK chatbot tutorial: useChat, streaming, real patterns
A working production-grade chatbot built on Vercel AI SDK v6. Streaming with useChat, tool calls, persistence, and the patterns that hold up after the demo.
- how-to
- vercel-ai-sdk
- tutorial
- react
The 30-line chatbot demo on the AI SDK landing page is great for a tweet and bad for production. In our experience shipping chatbots on Vercel infrastructure, the missing pieces — message persistence, error states, tool calls, observability, keeping the API key off the client — are what separate a demo from a chatbot you can ship.
The Vercel AI SDK is a TypeScript library for building AI-powered apps; v6 is the current major version. useChat is the React hook that handles streaming, optimistic updates, and reconnection. streamText is the server-side primitive that calls a model and returns a typed stream of events.
This post builds the production version. By the end you'll have a chatbot in app/chat/page.tsx that streams responses, calls tools, persists messages across reloads, and routes through a gateway so you can swap models without changing code. We'll use Vercel AI SDK v6 (current as of May 2026), the useChat hook, and a single /api/chat route.
Working repo at the end if you want to skip ahead.
The architecture in one diagram
Browser                         Server                        Upstream
─────────                       ──────                        ────────
useChat() ──POST /api/chat──→   streamText() ──POST──→  gateway → model
    ↑                               │
    │ ←────────SSE stream───────────┤
                                    │
   DB ←─── persist on `onFinish` ───┘

Three components: a React hook on the client, a streaming route on the server, and a model call (we use Vercel AI Gateway via plain provider/model strings, but you can swap that for a direct provider SDK if you don't want a gateway).
Step 1 — install
npm install ai @ai-sdk/react
# Plus AI Elements for the UI components — saves 200 lines of styling
npx ai-elements@latest add conversation message

If you're not using Next.js, the SDK works with any React framework that supports SSE — Remix, TanStack Start, Astro with islands. The patterns transfer.
Step 2 — the server route
Create app/api/chat/route.ts:
import { streamText, type ModelMessage } from "ai"
export const maxDuration = 60 // streaming can take a while
export async function POST(req: Request) {
const { messages }: { messages: ModelMessage[] } = await req.json()
const result = streamText({
// "provider/model" strings route through AI Gateway by default.
// No SDK install per provider. Swap the model id to swap providers.
model: "anthropic/claude-sonnet-4-6",
system:
"You're a helpful assistant. Reply in plain text. " +
"If you don't know, say so plainly. No filler.",
messages,
// Capture token usage for the cost ledger.
onFinish({ text, usage }) {
// fire-and-forget — don't block the stream on persistence
void persistMessage({ text, usage })
},
})
return result.toDataStreamResponse()
}
async function persistMessage(_args: { text: string; usage: unknown }) {
// your DB write here
}

A few details that matter:
The model field accepts a "provider/model" string. Provider/model strings are how AI Gateway exposes its catalog without naming collisions — openai/gpt-5.4-mini, anthropic/claude-sonnet-4-6, google/gemini-2.5-pro. The gateway looks up the credentials and routes the request. Your server never sees provider keys.
The onFinish callback fires after the stream completes. It runs after the response has already been streamed to the client, so persistence latency doesn't show up in the user's perceived response time. This is the right place to write to your database, log to your observability stack, and update token-count ledgers.
maxDuration = 60 matters on Vercel — the default function timeout is shorter than some long generations need. Set it explicitly. (You can also enable Fluid Compute for streaming endpoints; it reuses warm instances and dodges cold starts.)
toDataStreamResponse() returns a Response whose body is the AI SDK's typed stream — JSON-encoded events the client SDK knows how to parse. Don't try to write your own SSE format unless you're forking the SDK; the typed stream supports tool calls, errors, and metadata that plain text doesn't.
Step 3 — the client
Create app/chat/page.tsx:
"use client"
import { useChat } from "@ai-sdk/react"
import { Conversation, ConversationContent, ConversationScrollButton } from "@/components/ai-elements/conversation"
import { Message, MessageContent } from "@/components/ai-elements/message"
export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, status, error, reload, stop } =
    useChat({
      api: "/api/chat",
      // Hydrate with server-fetched history here if you have it (see step 4)
      initialMessages: undefined,
    })
  return (
    <main className="mx-auto flex h-svh max-w-2xl flex-col">
      <Conversation>
        <ConversationContent>
          {messages.map((m) => (
            <Message key={m.id} from={m.role}>
              <MessageContent>{m.content}</MessageContent>
            </Message>
          ))}
          {error && (
            <p className="px-4 py-2 text-sm text-red-600">
              {error.message}. <button onClick={() => reload()}>Retry</button>
            </p>
          )}
</ConversationContent>
<ConversationScrollButton />
</Conversation>
<form onSubmit={handleSubmit} className="border-t p-4">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask anything"
className="w-full rounded border px-3 py-2"
disabled={status === "streaming"}
/>
</form>
</main>
)
}

The useChat hook handles SSE parsing, optimistic updates, and reconnect. You don't write any of that.
status is one of "submitted" | "streaming" | "ready" | "error". Use it to disable the input mid-stream and show a thinking indicator. The stop() callback aborts the in-flight request — bind it to a button or to the Escape key.
The Vercel-published AI Elements components (Conversation, Message, etc.) are shadcn-derived, so they live in your repo and you customize them like any shadcn component. Worth using over rolling your own — the streaming-specific edge cases (auto-scroll, partial tokens, code-block rendering during streaming) are nontrivial to get right.
Step 4 — message persistence (so reload doesn't wipe history)
Two paths. Pick one based on whether your messages need to survive across devices.
Local-only (sessionStorage / IndexedDB): useChat exposes setMessages, so you can hydrate from local storage on mount. Fine for a personal assistant; bad for "log in on a different device and see the history."
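A minimal sketch of the local-only path. The Storage-like interface is there so the same helpers work with sessionStorage, localStorage, or an in-memory stub; ChatMessage is a simplified stand-in for the SDK's message shape, not its real type:

```typescript
// Simplified stand-in for the SDK's message shape (assumption, not the real type).
type ChatMessage = { id: string; role: "user" | "assistant"; content: string }

const HISTORY_KEY = "chat:history"

// Anything with getItem/setItem works: sessionStorage, localStorage, or a test stub.
interface StringStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

export function saveMessages(store: StringStore, messages: ChatMessage[]): void {
  store.setItem(HISTORY_KEY, JSON.stringify(messages))
}

export function loadMessages(store: StringStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY)
  if (!raw) return []
  try {
    return JSON.parse(raw) as ChatMessage[]
  } catch {
    return [] // corrupted entry — start fresh rather than crash the chat
  }
}
```

On mount, hydrate with setMessages(loadMessages(sessionStorage)); in a useEffect keyed on messages, call saveMessages to keep the copy current.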
Server-persisted: The pattern that works:
// In your /api/chat route, after streamText:
return result.toDataStreamResponse({
  async onFinish({ messages }) {
    await db.insert(chatMessages).values(messages)
  },
})

// In a separate /api/chat/[id]/route.ts
// (Drizzle-style imports shown; adapt to your ORM):
import { eq } from "drizzle-orm"
import { db, chatMessages } from "@/lib/db"

export async function GET(req: Request, { params }: { params: { id: string } }) {
  const stored = await db
    .select()
    .from(chatMessages)
    .where(eq(chatMessages.threadId, params.id))
  return Response.json({ messages: stored })
}

Then on the client, fetch the history before mounting useChat and pass it as initialMessages. Don't try to load history inside useChat — the hook expects a stable initial state.
Step 5 — tool calls
A chatbot that can't take actions is a fancy autocomplete. Adding tool calls in v6:
import { streamText, tool } from "ai"
import { z } from "zod"
const result = streamText({
model: "anthropic/claude-sonnet-4-6",
messages,
tools: {
getWeather: tool({
description: "Look up the current weather for a city.",
inputSchema: z.object({
city: z.string().describe("city name, e.g. 'Tokyo'"),
}),
execute: async ({ city }) => {
const res = await fetch(`https://wttr.in/${encodeURIComponent(city)}?format=j1`)
const data = await res.json()
return { temp_c: data.current_condition[0].temp_C, desc: data.current_condition[0].weatherDesc[0].value }
},
}),
},
// Allow up to 5 tool-call cycles before giving up
maxSteps: 5,
})

maxSteps is the cap on the agent loop. An agent loop is the cycle of: model decides to call a tool → server executes the tool → result is streamed back to the model → model decides again. Without maxSteps, a hallucinating model could call the same tool forever. We set 5 because anything more is usually a sign the prompt is wrong.
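The loop itself is simple enough to sketch. This is not the SDK's implementation — just the shape of what maxSteps bounds, with a toy model function standing in for the real call:

```typescript
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text?: string; toolCall?: ToolCall }

// One model turn: given the transcript so far, either answer with
// text or request a tool call. A stand-in for the real model call.
type ModelFn = (transcript: string[]) => ModelTurn

export function runAgentLoop(
  model: ModelFn,
  tools: Record<string, (input: unknown) => string>,
  maxSteps: number,
): { text: string | null; steps: number } {
  const transcript: string[] = []
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(transcript)
    if (turn.toolCall) {
      // Execute the tool and feed the result back for the next turn.
      const result = tools[turn.toolCall.name](turn.toolCall.input)
      transcript.push(`tool:${turn.toolCall.name} -> ${result}`)
      continue
    }
    return { text: turn.text ?? "", steps: step } // model answered — done
  }
  return { text: null, steps: maxSteps } // cap hit without a final answer
}
```

The cap is the whole safety story: a model that keeps requesting tools exits with text: null after maxSteps, instead of looping until your function times out.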
The client's useChat handles the tool-call rendering automatically — the Message component from AI Elements has a <ToolCall> slot that renders the tool name, arguments, and result inline. You don't write display code per tool.
A real tool-call gotcha: Claude prefers serial tool calls; GPT happily parallelizes. If your prompt depends on a specific call ordering, test on both. The OpenAI-compatible interface normalizes the shape but not the behavior. The migration post goes deeper on this.
Step 6 — error states that don't ruin the UX
Three error categories users actually hit:
- Network errors (request fails, connection drops mid-stream). The error field from useChat populates. Show a retry button bound to reload().
- Rate limits (429 from upstream). The provider returns a structured error. Surface it: "We're rate-limited right now. Retrying in 30s." Use a backoff in the route handler — Anthropic and OpenAI both accept retries cleanly. The OpenAI rate limit cookbook covers exponential backoff well.
- Refusals (model declines to answer). Looks like a normal completion that says "I can't help with that." The handling is product-level: explain to the user what the policy is, suggest a rephrasing.
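For the rate-limit case, the route-handler backoff can be sketched generically. Exponential with full jitter; the retry count, base delay, and cap here are illustrative defaults to tune, not SDK behavior:

```typescript
// Retry a failing async call with exponential backoff plus full jitter.
// Intended for 429s from the upstream provider; numbers are illustrative.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  opts = { retries: 3, baseMs: 500, capMs: 8000 },
): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      if (attempt === opts.retries) break
      // Full jitter: sleep a random duration in [0, base * 2^attempt], capped.
      const delay = Math.random() * Math.min(opts.capMs, opts.baseMs * 2 ** attempt)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
  throw lastErr // exhausted retries — surface the last error to the caller
}
```

In the route, wrap the upstream call: const result = await withBackoff(() => callModel(messages)). Keep the retry count low — a streaming endpoint that silently retries for a minute feels broken to the user.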
{error && (
<div className="rounded border-l-2 border-amber-500 bg-amber-50 p-3 text-sm">
{error.message.includes("rate") ? (
<>Slowed down for a moment. Retrying automatically.</>
) : (
<>Something went wrong. <button onClick={() => reload()}>Try again</button></>
)}
</div>
)}

Step 7 — cost and observability
If you don't track per-request cost, you'll find out about runaway prompts on the credit-card statement. The onFinish hook receives usage with promptTokens, completionTokens, and totalTokens. Multiply by the per-million-token rate from your gateway. Persist alongside the message so you can drill down later.
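The arithmetic is worth pinning down once. A sketch — the per-million rates are placeholders, so look up the real numbers for your model in the gateway's pricing table:

```typescript
// Per-million-token rates in USD. Placeholder values — check your
// gateway's actual pricing for the model you're running.
type Rates = { inputPerM: number; outputPerM: number }

export function costUsd(
  usage: { promptTokens: number; completionTokens: number },
  rates: Rates,
): number {
  const input = (usage.promptTokens / 1_000_000) * rates.inputPerM
  const output = (usage.completionTokens / 1_000_000) * rates.outputPerM
  // Round to 6 decimal places to match a numeric(10,6) column.
  return Math.round((input + output) * 1e6) / 1e6
}
```

Call it inside onFinish with the usage object and persist the result next to the message row, so the per-thread rollup is a plain SUM.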
A minimal cost-tracking schema:
CREATE TABLE chat_messages (
id text primary key,
thread_id text not null,
role text not null check (role in ('user','assistant','tool')),
content jsonb not null,
model text,
prompt_tokens integer,
completion_tokens integer,
cost_usd numeric(10,6),
created_at timestamptz default now()
);

That gives you a cost-per-thread query in one line. Without it, you're guessing.
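Concretely, against the schema above (column names as defined there):

```sql
SELECT thread_id, SUM(cost_usd) AS total_cost_usd
FROM chat_messages
GROUP BY thread_id
ORDER BY total_cost_usd DESC;
```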
For deeper observability — full request traces, prompt versioning, A/B testing — layer something like Helicone or LangSmith on top. The gateway log shows you the request body and timing; those tools index it for search.
What this leaves out
This tutorial covers the chatbot. It doesn't cover:
- Multi-modal (image/audio inputs). Add experimental_attachments to useChat and pass file parts in the message content. Same shape, more pieces.
- Generative UI (streaming React components, not just text). Use streamUI instead of streamText, or the RSC GenUI template.
- Memory across sessions. That's a retrieval problem; the chatbot is the front-end.
- Authentication on the API route. Add it. Production routes without auth get billed by everyone on the internet within hours of going live.
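On the auth point, the minimum viable version is a bearer check at the top of the POST handler. A sketch — verifyToken is a hypothetical stand-in for your real session lookup, not an SDK API:

```typescript
// Pull the token out of an Authorization: Bearer <token> header, if present.
export function getBearerToken(req: Request): string | null {
  const header = req.headers.get("authorization") ?? ""
  const match = header.match(/^Bearer\s+(.+)$/i)
  return match ? match[1] : null
}

// Returns a 401 Response to short-circuit with, or null if the request
// is authorized. `verifyToken` is a placeholder for your session check.
export async function requireAuth(
  req: Request,
  verifyToken: (token: string) => Promise<boolean>,
): Promise<Response | null> {
  const token = getBearerToken(req)
  if (!token || !(await verifyToken(token))) {
    return new Response("Unauthorized", { status: 401 })
  }
  return null // authorized — keep going
}
```

In the route: const denied = await requireAuth(req, verifyToken); if (denied) return denied — before the body is even parsed, so unauthenticated traffic never reaches the model.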
How this fits with a gateway
We tested this exact pattern routed through three gateways: direct OpenAI, Vercel AI Gateway, and Synapse Garden. The code in this post is identical across all three — only the model string and the underlying credential change. That's the whole point of the SDK's provider abstraction. If you want the gateway-side context, the gateway comparison covers what each adds underneath; for governance specifically, the per-project keys post covers why the right credential matters.
When to use the AI SDK vs roll your own
Use the SDK when:
- You want streaming, tool calls, and provider abstraction without writing 500 lines of SSE-parsing code.
- Your app is React-based; the hooks are Next.js/React-specific.
- You're OK pinning to v6 and following the upgrade path.
Skip the SDK when:
- You're not on React (Vue/Svelte/etc. — the core streamText works server-side, but the UI hooks don't).
- You need behavior the SDK doesn't expose (raw provider parameters, custom event types). Direct SDK calls give you more control.
For the 80% case the SDK is the right call. The amount of code it saves on the streaming + reconnect + tool-rendering side is meaningful.
Working repo
A complete reference implementation including persistence, tool calls, error states, and cost tracking lives at github.com/ayush-pinnacle/synapse-garden-chatbot-example. Clone, swap in your ANTHROPIC_API_KEY (or your Synapse Garden mg_live_* for the gateway path), and you have a working chatbot.
If you're routing through us specifically, set baseURL: "https://synapse.garden/api/v1" and use any model from the catalog. The AI SDK docs cover the full surface.
For the next step — adding agent loops, RAG, or model routing rules — see the function-calling post and the architecture notes in the docs.