Vercel AI SDK chatbot tutorial: useChat, streaming, real patterns
A working production-grade chatbot built on Vercel AI SDK v6. Streaming with useChat, tool calls, persistence, and the patterns that hold up after the demo.
- how-to
- vercel-ai-sdk
- tutorial
- react
The 30-line chatbot demo on the AI SDK landing page is great for a tweet and bad for production. In our experience shipping chatbots on Vercel infrastructure, the missing pieces — message persistence, error states, tool calls, observability, keeping the API key off the client — are what separate a demo from a chatbot you can ship.
The Vercel AI SDK is a TypeScript library for building AI-powered apps; v6 is the current major version. useChat is the React hook that handles streaming, optimistic updates, and reconnection. streamText is the server-side primitive that calls a model and returns a typed stream of events.
This post builds the production version. By the end you'll have a chatbot in app/chat/page.tsx that streams responses, calls tools, persists messages across reloads, and routes through a gateway so you can swap models without changing code. We'll use Vercel AI SDK v6 (current as of May 2026), the useChat hook, and a single /api/chat route.
Working repo at the end if you want to skip ahead.
The architecture in one diagram
Browser                         Server                        Upstream
─────────                       ──────                        ────────
useChat() ──POST /api/chat──→   streamText() ──POST──→  gateway → model
    ↑                               │
    │ ←────────SSE stream───────────┤
                                    │
   DB ←─── persist on `onFinish` ───┘

Three components: a React hook on the client, a streaming route on the server, and a model call (we use Vercel AI Gateway via plain provider/model strings, but you can swap that for a direct provider SDK if you don't want a gateway).
Step 1 — install
npm install ai @ai-sdk/react
# Plus AI Elements for the UI components — saves 200 lines of styling
npx ai-elements@latest add conversation message

If you're not using Next.js, the SDK works with any React framework that supports SSE — Remix, TanStack Start, Astro with islands. The patterns transfer.
Step 2 — the server route
Create app/api/chat/route.ts:
import { streamText, type ModelMessage } from "ai"
export const maxDuration = 60 // streaming can take a while
export async function POST(req: Request) {
const { messages }: { messages: ModelMessage[] } = await req.json()
const result = streamText({
// "provider/model" strings route through AI Gateway by default.
// No SDK install per provider. Swap the model id to swap providers.
model: "anthropic/claude-sonnet-4-6",
system:
"You're a helpful assistant. Reply in plain text. " +
"If you don't know, say so plainly. No filler.",
messages,
// Capture token usage for the cost ledger.
onFinish({ text, usage }) {
// fire-and-forget — don't block the stream on persistence
void persistMessage({ text, usage })
},
})
return result.toDataStreamResponse()
}
async function persistMessage(_args: { text: string; usage: unknown }) {
// your DB write here
}

A few details that matter:
The model field accepts a "provider/model" string. Provider/model strings are how AI Gateway exposes its catalog without naming collisions — openai/gpt-5.4-mini, anthropic/claude-sonnet-4-6, google/gemini-2.5-pro. The gateway looks up the credentials and routes the request. Your server never sees provider keys.
The onFinish callback fires after the stream completes. It runs after the response has already been streamed to the client, so persistence latency doesn't show up in the user's perceived response time. This is the right place to write to your database, log to your observability stack, and update token-count ledgers.
maxDuration = 60 matters on Vercel — the default function timeout is shorter than some long generations need. Set it explicitly. (You can also enable Fluid Compute for streaming endpoints; it reuses warm instances and dodges cold starts.)
toDataStreamResponse() returns a Response whose body is the AI SDK's typed stream — JSON-encoded events the client SDK knows how to parse. Don't try to write your own SSE format unless you're forking the SDK; the typed stream supports tool calls, errors, and metadata that plain text doesn't.
Step 3 — the client
Create app/chat/page.tsx:
"use client"
import { useChat } from "@ai-sdk/react"
import { Conversation, ConversationContent, ConversationScrollButton } from "@/components/ai-elements/conversation"
import { Message, MessageContent } from "@/components/ai-elements/message"
export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, status, error, reload, stop } =
    useChat({
      api: "/api/chat",
      // Hydrate with server-fetched history here if you have it (see step 4)
      initialMessages: undefined,
    })
  return (
    <main className="mx-auto flex h-svh max-w-2xl flex-col">
      <Conversation>
        <ConversationContent>
          {messages.map((m) => (
            <Message key={m.id} from={m.role}>
              <MessageContent>{m.content}</MessageContent>
            </Message>
          ))}
          {error && (
            <p className="px-4 py-2 text-sm text-red-600">
              {error.message}. <button onClick={() => reload()}>Retry</button>
            </p>
          )}
</ConversationContent>
<ConversationScrollButton />
</Conversation>
<form onSubmit={handleSubmit} className="border-t p-4">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask anything"
className="w-full rounded border px-3 py-2"
disabled={status === "streaming"}
/>
</form>
</main>
)
}

The useChat hook handles SSE parsing, optimistic updates, and reconnect. You don't write any of that.
status is one of "submitted" | "streaming" | "ready" | "error". Use it to disable the input mid-stream and show a thinking indicator. The stop() callback aborts the in-flight request — bind it to a button or to the Escape key.
The Vercel-published AI Elements components (Conversation, Message, etc.) are shadcn-derived, so they live in your repo and you customize them like any shadcn component. Worth using over rolling your own — the streaming-specific edge cases (auto-scroll, partial tokens, code-block rendering during streaming) are nontrivial to get right.
Step 4 — message persistence (so reload doesn't wipe history)
Two paths. Pick one based on whether your messages need to survive across devices.
Local-only (sessionStorage / IndexedDB): useChat exposes setMessages, so you can hydrate from local storage on mount. Fine for a personal assistant; bad for "log in on a different device and see the history."
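A minimal sketch of the local-only path. The Storage-like interface is there so the same helpers work with sessionStorage, localStorage, or an in-memory stub; ChatMessage is a simplified stand-in for the SDK's message shape, not its real type:

```typescript
// Simplified stand-in for the SDK's message shape (assumption, not the real type).
type ChatMessage = { id: string; role: "user" | "assistant"; content: string }

const HISTORY_KEY = "chat:history"

// Anything with getItem/setItem works: sessionStorage, localStorage, or a test stub.
interface StringStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

export function saveMessages(store: StringStore, messages: ChatMessage[]): void {
  store.setItem(HISTORY_KEY, JSON.stringify(messages))
}

export function loadMessages(store: StringStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY)
  if (!raw) return []
  try {
    return JSON.parse(raw) as ChatMessage[]
  } catch {
    return [] // corrupted entry — start fresh rather than crash the chat
  }
}
```

On mount, hydrate with setMessages(loadMessages(sessionStorage)); in a useEffect keyed on messages, call saveMessages to keep the copy current.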
Server-persisted: The pattern that works:
// In your /api/chat route, after streamText:
return result.toDataStreamResponse({
  async onFinish({ messages }) {
    await db.insert(chatMessages).values(messages)
  },
})

// In a separate /api/chat/[id]/route.ts
// (Drizzle-style imports shown; adapt to your ORM):
import { eq } from "drizzle-orm"
import { db, chatMessages } from "@/lib/db"

export async function GET(req: Request, { params }: { params: { id: string } }) {
  const stored = await db
    .select()
    .from(chatMessages)
    .where(eq(chatMessages.threadId, params.id))
  return Response.json({ messages: stored })
}

Then on the client, fetch the history before mounting useChat and pass it as initialMessages. Don't try to load history inside useChat — the hook expects a stable initial state.
Step 5 — tool calls
A chatbot that can't take actions is a fancy autocomplete. Adding tool calls in v6:
import { streamText, tool } from "ai"
import { z } from "zod"
const result = streamText({
model: "anthropic/claude-sonnet-4-6",
messages,
tools: {
getWeather: tool({
description: "Look up the current weather for a city.",
inputSchema: z.object({
city: z.string().describe("city name, e.g. 'Tokyo'"),
}),
execute: async ({ city }) => {
const res = await fetch(`https://wttr.in/${encodeURIComponent(city)}?format=j1`)
const data = await res.json()
return { temp_c: data.current_condition[0].temp_C, desc: data.current_condition[0].weatherDesc[0].value }
},
}),
},
// Allow up to 5 tool-call cycles before giving up
maxSteps: 5,
})

maxSteps is the cap on the agent loop. An agent loop is the cycle of: model decides to call a tool → server executes the tool → result is streamed back to the model → model decides again. Without maxSteps, a hallucinating model could call the same tool forever. We set 5 because anything more is usually a sign the prompt is wrong.
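The loop itself is simple enough to sketch. This is not the SDK's implementation — just the shape of what maxSteps bounds, with a toy model function standing in for the real call:

```typescript
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text?: string; toolCall?: ToolCall }

// One model turn: given the transcript so far, either answer with
// text or request a tool call. A stand-in for the real model call.
type ModelFn = (transcript: string[]) => ModelTurn

export function runAgentLoop(
  model: ModelFn,
  tools: Record<string, (input: unknown) => string>,
  maxSteps: number,
): { text: string | null; steps: number } {
  const transcript: string[] = []
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(transcript)
    if (turn.toolCall) {
      // Execute the tool and feed the result back for the next turn.
      const result = tools[turn.toolCall.name](turn.toolCall.input)
      transcript.push(`tool:${turn.toolCall.name} -> ${result}`)
      continue
    }
    return { text: turn.text ?? "", steps: step } // model answered — done
  }
  return { text: null, steps: maxSteps } // cap hit without a final answer
}
```

The cap is the whole safety story: a model that keeps requesting tools exits with text: null after maxSteps, instead of looping until your function times out.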
The client's useChat handles the tool-call rendering automatically — the Message component from AI Elements has a <ToolCall> slot that renders the tool name, arguments, and result inline. You don't write display code per tool.
A real tool-call gotcha: Claude prefers serial tool calls; GPT happily parallelizes. If your prompt depends on a specific call ordering, test on both. The OpenAI-compatible interface normalizes the shape but not the behavior. The migration post goes deeper on this.
Step 6 — error states that don't ruin the UX
Three error categories users actually hit:
- Network errors (request fails, connection drops mid-stream). The error field from useChat populates. Show a retry button bound to reload().
- Rate limits (429 from upstream). The provider returns a structured error. Surface it: "We're rate-limited right now. Retrying in 30s." Use a backoff in the route handler — Anthropic and OpenAI both accept retries cleanly. The OpenAI rate limit cookbook covers exponential backoff well.
- Refusals (model declines to answer). Looks like a normal completion that says "I can't help with that." The handling is product-level: explain to the user what the policy is, suggest a rephrasing.
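For the rate-limit case, the route-handler backoff can be sketched generically. Exponential with full jitter; the retry count, base delay, and cap here are illustrative defaults to tune, not SDK behavior:

```typescript
// Retry a failing async call with exponential backoff plus full jitter.
// Intended for 429s from the upstream provider; numbers are illustrative.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  opts = { retries: 3, baseMs: 500, capMs: 8000 },
): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastErr = err
      if (attempt === opts.retries) break
      // Full jitter: sleep a random duration in [0, base * 2^attempt], capped.
      const delay = Math.random() * Math.min(opts.capMs, opts.baseMs * 2 ** attempt)
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
  throw lastErr // exhausted retries — surface the last error to the caller
}
```

In the route, wrap the upstream call: const result = await withBackoff(() => callModel(messages)). Keep the retry count low — a streaming endpoint that silently retries for a minute feels broken to the user.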
{error && (
<div className="rounded border-l-2 border-amber-500 bg-amber-50 p-3 text-sm">
{error.message.includes("rate") ? (
<>Slowed down for a moment. Retrying automatically.</>
) : (
<>Something went wrong. <button onClick={() => reload()}>Try again</button></>
)}
</div>
)}

Step 7 — cost and observability
If you don't track per-request cost, you'll find out about runaway prompts on the credit-card statement. The onFinish hook receives usage with promptTokens, completionTokens, and totalTokens. Multiply by the per-million-token rate from your gateway. Persist alongside the message so you can drill down later.
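The arithmetic is worth pinning down once. A sketch — the per-million rates are placeholders, so look up the real numbers for your model in the gateway's pricing table:

```typescript
// Per-million-token rates in USD. Placeholder values — check your
// gateway's actual pricing for the model you're running.
type Rates = { inputPerM: number; outputPerM: number }

export function costUsd(
  usage: { promptTokens: number; completionTokens: number },
  rates: Rates,
): number {
  const input = (usage.promptTokens / 1_000_000) * rates.inputPerM
  const output = (usage.completionTokens / 1_000_000) * rates.outputPerM
  // Round to 6 decimal places to match a numeric(10,6) column.
  return Math.round((input + output) * 1e6) / 1e6
}
```

Call it inside onFinish with the usage object and persist the result next to the message row, so the per-thread rollup is a plain SUM.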
A minimal cost-tracking schema:
CREATE TABLE chat_messages (
id text primary key,
thread_id text not null,
role text not null check (role in ('user','assistant','tool')),
content jsonb not null,
model text,
prompt_tokens integer,
completion_tokens integer,
cost_usd numeric(10,6),
created_at timestamptz default now()
);

That gives you a cost-per-thread query in one line. Without it, you're guessing.
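Concretely, against the schema above (column names as defined there):

```sql
SELECT thread_id, SUM(cost_usd) AS total_cost_usd
FROM chat_messages
GROUP BY thread_id
ORDER BY total_cost_usd DESC;
```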
For deeper observability — full request traces, prompt versioning, A/B testing — layer something like Helicone or LangSmith on top. The gateway log shows you the request body and timing; those tools index it for search.
What this leaves out
This tutorial covers the chatbot. It doesn't cover:
- Multi-modal (image/audio inputs). Add experimental_attachments to useChat and pass file parts in the message content. Same shape, more pieces.
- Generative UI (streaming React components, not just text). Use streamUI instead of streamText, or the RSC GenUI template.
- Memory across sessions. That's a retrieval problem; the chatbot is the front-end.
- Authentication on the API route. Add it. Production routes without auth get billed by everyone on the internet within hours of going live.
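On the auth point, the minimum viable version is a bearer check at the top of the POST handler. A sketch — verifyToken is a hypothetical stand-in for your real session lookup, not an SDK API:

```typescript
// Pull the token out of an Authorization: Bearer <token> header, if present.
export function getBearerToken(req: Request): string | null {
  const header = req.headers.get("authorization") ?? ""
  const match = header.match(/^Bearer\s+(.+)$/i)
  return match ? match[1] : null
}

// Returns a 401 Response to short-circuit with, or null if the request
// is authorized. `verifyToken` is a placeholder for your session check.
export async function requireAuth(
  req: Request,
  verifyToken: (token: string) => Promise<boolean>,
): Promise<Response | null> {
  const token = getBearerToken(req)
  if (!token || !(await verifyToken(token))) {
    return new Response("Unauthorized", { status: 401 })
  }
  return null // authorized — keep going
}
```

In the route: const denied = await requireAuth(req, verifyToken); if (denied) return denied — before the body is even parsed, so unauthenticated traffic never reaches the model.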
How this fits with a gateway
We tested this exact pattern routed through three gateways: direct OpenAI, Vercel AI Gateway, and Synapse Garden. The code in this post is identical across all three — only the model string and the underlying credential change. That's the whole point of the SDK's provider abstraction. If you want the gateway-side context, the gateway comparison covers what each adds underneath; for governance specifically, the per-project keys post covers why the right credential matters.
When to use the AI SDK vs roll your own
Use the SDK when:
- You want streaming, tool calls, and provider abstraction without writing 500 lines of SSE-parsing code.
- Your app is React-based; the hooks are Next.js/React-specific.
- You're OK pinning to v6 and following the upgrade path.
Skip the SDK when:
- You're not on React (Vue/Svelte/etc. — the core streamText works server-side, but the UI hooks don't).
- You need behavior the SDK doesn't expose (raw provider parameters, custom event types). Direct SDK calls give you more control.
For the 80% case the SDK is the right call. The amount of code it saves on the streaming + reconnect + tool-rendering side is meaningful.
Working repo
A complete reference implementation including persistence, tool calls, error states, and cost tracking lives at github.com/ayush-pinnacle/synapse-garden-chatbot-example. Clone, swap in your ANTHROPIC_API_KEY (or your Synapse Garden mg_live_* for the gateway path), and you have a working chatbot.
If you're routing through us specifically, set baseURL: "https://synapse.garden/api/v1" and use any model from the catalog. The AI SDK docs cover the full surface.
For the next step — adding agent loops, RAG, or model routing rules — see the function-calling post and the architecture notes in the docs.