Building AI-Powered Apps with the Vercel AI SDK

23. May, 2026 • 14 min read • Develop

A Unified Toolkit for AI in TypeScript

Every model provider ships its own SDK with its own quirks. OpenAI streams differently than Anthropic, Anthropic handles tools differently than Google, and the moment you want to swap providers you end up rewriting half your application. The Vercel AI SDK exists to make that pain go away — one API for every model, with first-class React hooks layered on top.

A year ago the AI SDK was mostly known for its useChat hook, the easy way to wire up a streaming chat UI in Next.js. Today it’s grown into a complete toolkit: a typed core for any TypeScript runtime, a UI layer that powers chat and assistant interfaces in any framework, an RSC layer for streaming React components from the server, and an agent loop for tool-using workflows. In this post we’ll walk through how the pieces fit together and the patterns that have become standard for shipping AI features in production Next.js apps.

What the SDK Actually Is

The Vercel AI SDK is split into three layers, each useful on its own:

AI SDK Core — A unified TypeScript API for calling LLMs. generateText, streamText, generateObject, streamObject, embed, and embedMany all work the same way regardless of provider.
AI SDK UI — Framework hooks (useChat, useCompletion, useObject, useAssistant) for React, Svelte, Vue, and Solid. Handles streaming, message state, optimistic updates, and error recovery.
AI SDK RSC — React Server Components helpers (streamUI, createStreamableUI, createStreamableValue) for streaming UI components from the server as the model generates them.

You install only what you need. A backend cron job doing structured extraction needs Core but not UI. A Pages Router chat widget needs Core and UI but not RSC. A Next.js App Router app with generative UI uses all three.

npm install ai @ai-sdk/openai @ai-sdk/anthropic zod

The provider packages are small adapters that translate the unified interface into provider-specific API calls. Adding a new provider means installing one more package, not rewriting your code.

The Core API

The lowest-level call is generateText — give it a model and a prompt, get text back:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text, usage, finishReason } = await generateText({
  model: openai('gpt-5'),
  prompt: 'Explain quantum entanglement in two sentences.',
});

console.log(text);
console.log(`Used ${usage.totalTokens} tokens`);

Switching to Anthropic is a one-line change:

import { anthropic } from '@ai-sdk/anthropic';

const { text } = await generateText({
  model: anthropic('claude-opus-4-7'),
  prompt: 'Explain quantum entanglement in two sentences.',
});

The return shape — text, usage, finishReason, response, warnings — is identical. The same goes for system prompts, conversation history, temperature, max tokens, stop sequences, and the rest of the standard knobs.

Streaming

For chat interfaces you almost always want streamText instead of generateText. It returns a result object with multiple consumption styles:

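import { convertToModelMessages, streamText, type UIMessage } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  // useChat posts UI messages (with parts); convert them before calling the model.
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai('gpt-5'),
    system: 'You are a helpful assistant.',
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}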

The toUIMessageStreamResponse() helper returns a Response whose body follows the AI SDK’s UI message stream protocol, a server-sent-events format that the useChat hook on the client knows how to parse. You can also iterate the stream manually with result.textStream (just the text deltas) or result.fullStream (every event: text, tool calls, tool results, finish reasons).
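
To make the manual route concrete, here is a minimal sketch of consuming a stream of text deltas. It assumes only that the stream is an async iterable of strings, which is how result.textStream behaves; collectText and fakeTextStream are hypothetical names, and the fake generator stands in for a real model so the snippet runs on its own.

```typescript
// Accumulates text deltas from any async iterable of strings.
// In an app you would pass `result.textStream` from streamText.
async function collectText(
  deltas: AsyncIterable<string>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let full = '';
  for await (const delta of deltas) {
    onDelta?.(delta); // e.g. write to stdout or push over a socket
    full += delta;
  }
  return full;
}

// Hypothetical stand-in for a model stream (not part of the SDK).
async function* fakeTextStream(): AsyncGenerator<string> {
  yield 'Hello';
  yield ', ';
  yield 'world';
}
```

Calling collectText(fakeTextStream(), (d) => process.stdout.write(d)) would print the text as it arrives and resolve with the full string once the stream ends.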

Structured Output

generateObject is one of the most useful tools in the SDK. Give it a Zod schema and it returns a typed, validated object:

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { object } = await generateObject({
  model: openai('gpt-5'),
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.object({
        name: z.string(),
        amount: z.string(),
      })),
      steps: z.array(z.string()),
      prepMinutes: z.number(),
    }),
  }),
  prompt: 'Generate a recipe for sourdough focaccia.',
});

object.recipe.ingredients.forEach((i) => {
  console.log(`${i.amount} ${i.name}`);
});

Under the hood the SDK uses each provider’s native structured output mode — OpenAI’s response format, Anthropic’s tool-call coercion, Google’s response schema — and validates the result against your Zod schema. Invalid responses throw a NoObjectGeneratedError you can catch and retry.
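
The catch-and-retry idea can be sketched as a small generic helper. This is an illustration rather than anything the SDK ships: withRetry is a hypothetical name, and the predicate decides which errors are worth another attempt. With the SDK you could plausibly pass NoObjectGeneratedError.isInstance as that predicate.

```typescript
// Retries an async operation while `shouldRetry` accepts the thrown error.
// Errors the predicate rejects are rethrown immediately.
async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (!shouldRetry(error)) throw error; // unrelated failure: surface it
      lastError = error;
    }
  }
  throw lastError; // every attempt failed with a retryable error
}
```

Keeping maxAttempts small matters here: each retry is another billed model call, so two or three attempts is usually a sensible ceiling.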

For long objects there’s streamObject, which streams partial objects as they’re generated. The matching useObject hook on the client gives you a typed, progressively filling object that’s perfect for forms that auto-populate or dashboards that build themselves.

Tool Calling

Tool calling — letting the model invoke functions in your code — is where AI applications start feeling agentic. The SDK normalises tools across providers so you write them once:

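import { stepCountIs, streamText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// messages, fetchWeather, and db come from the surrounding application.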

const result = streamText({
  model: openai('gpt-5'),
  messages,
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({
        city: z.string().describe('The city name'),
        units: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ city, units }) => {
        const data = await fetchWeather(city, units);
        return {
          temperature: data.temp,
          conditions: data.conditions,
          humidity: data.humidity,
        };
      },
    }),
    searchProducts: tool({
      description: 'Search the product catalog',
      inputSchema: z.object({
        query: z.string(),
        limit: z.number().default(10),
      }),
      execute: async ({ query, limit }) => {
        return await db.products.search(query, limit);
      },
    }),
  },
  stopWhen: stepCountIs(5),
});

The model sees tool descriptions and input schemas. When it decides to call one, the SDK invokes your execute function, captures the return value, and feeds it back as a tool result message. The stopWhen option controls how many of these rounds can happen. By default generation stops after a single step, so the model never gets to respond to its tool results; stepCountIs(5) lets it loop for up to five steps and then ends the run. Keep the cap explicit and modest: it is the safety valve that stops a stuck model from chewing through your budget by calling the same tool over and over.
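
To see why a step cap matters, here is a toy model of a capped tool loop. It is a simplified sketch and not the SDK’s implementation: runToolLoop, ModelTurn, and Transcript are hypothetical names, and the model is just a function that either asks for a tool or returns text.

```typescript
type ToolCall = { toolName: string; input: unknown };
type ModelTurn = { text?: string; toolCall?: ToolCall };
type Transcript = Array<{ role: 'assistant' | 'tool'; content: string }>;

// Runs model -> tool -> model rounds until the model answers or the cap is hit.
async function runToolLoop(
  model: (history: Transcript) => Promise<ModelTurn>,
  tools: Record<string, (input: unknown) => Promise<unknown>>,
  maxSteps: number,
): Promise<{ text: string | undefined; steps: number }> {
  const history: Transcript = [];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(history);
    if (!turn.toolCall) return { text: turn.text, steps: step }; // final answer
    const tool = tools[turn.toolCall.toolName];
    const output = tool
      ? await tool(turn.toolCall.input)
      : { error: `unknown tool ${turn.toolCall.toolName}` };
    // Feed both the call and its result back so the next turn can see them.
    history.push({ role: 'assistant', content: JSON.stringify(turn.toolCall) });
    history.push({ role: 'tool', content: JSON.stringify(output) });
  }
  return { text: undefined, steps: maxSteps }; // cap reached without an answer
}
```

A model that keeps requesting the same tool will run exactly maxSteps rounds and then stop, which is the behaviour a step-count stop condition is meant to guarantee.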

For tools that need user confirmation (transferring money, deleting data, sending an email), omit execute and handle the call on the client instead. The model proposes the call, your UI shows a confirmation dialog, and you submit the result back through the chat hook. This pattern keeps destructive actions human-in-the-loop without giving up the conversational flow.

The useChat Hook

useChat is the AI SDK UI workhorse. It manages messages, handles streaming, and surfaces tool calls. As of AI SDK 5 it no longer owns the input field, so you keep the input in your own state and call sendMessage on submit:

'use client';

import { useState } from 'react';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';

export function Chat() {
  // The hook does not manage input state; keep it locally.
  const [input, setInput] = useState('');
  const { messages, sendMessage, status, stop } = useChat({
    transport: new DefaultChatTransport({ api: '/api/chat' }),
  });

  return (
    <div className="flex flex-col gap-4">
      {messages.map((m) => (
        <div key={m.id} className={m.role === 'user' ? 'text-right' : ''}>
          <strong>{m.role}: </strong>
          {m.parts.map((part, i) => {
            if (part.type === 'text') return <span key={i}>{part.text}</span>;
            // Tool parts are typed as `tool-<toolName>` and carry the call input.
            if (part.type.startsWith('tool-') && 'input' in part) {
              return (
                <pre key={i}>
                  Calling {part.type.slice('tool-'.length)}(
                  {JSON.stringify(part.input)})
                </pre>
              );
            }
            return null;
          })}
        </div>
      ))}

      <form
        onSubmit={(e) => {
          e.preventDefault();
          if (!input.trim()) return;
          sendMessage({ text: input });
          setInput('');
        }}
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask something..."
          disabled={status !== 'ready'}
        />
        {status === 'streaming' && (
          <button type="button" onClick={stop}>
            Stop
          </button>
        )}
      </form>
    </div>
  );
}

The messages array uses a parts-based shape so a single assistant message can include text, tool calls, tool results, reasoning steps, and inline files. Rendering by part type is the recommended pattern — it scales cleanly as you add more capabilities.

The status field has four values: submitted, streaming, ready, error. Use it to disable the input while a request is in flight, show a typing indicator, and expose a stop button so users can interrupt long responses.
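
One way to keep that status logic out of your JSX is a small pure function that maps each status to UI flags. This is a sketch under my own assumptions: ChatControls and controlsFor are hypothetical names, and which flags each status enables (for example, re-enabling the input after an error) is a design choice rather than anything the SDK prescribes.

```typescript
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

interface ChatControls {
  inputDisabled: boolean;
  showTypingIndicator: boolean;
  showStopButton: boolean;
  showRetry: boolean;
}

// Maps the hook's status to the controls the UI should show.
function controlsFor(status: ChatStatus): ChatControls {
  switch (status) {
    case 'submitted': // request sent, no tokens yet
      return { inputDisabled: true, showTypingIndicator: true, showStopButton: false, showRetry: false };
    case 'streaming': // tokens arriving
      return { inputDisabled: true, showTypingIndicator: false, showStopButton: true, showRetry: false };
    case 'ready': // idle, waiting for the user
      return { inputDisabled: false, showTypingIndicator: false, showStopButton: false, showRetry: false };
    case 'error': // let the user edit and try again
      return { inputDisabled: false, showTypingIndicator: false, showStopButton: false, showRetry: true };
  }
}
```

Because the switch is exhaustive over the union, TypeScript will flag this function if a new status value is ever added to ChatStatus.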

Need to persist conversations? Pass id and messages (the initial message list) loaded from a server-side conversation. The hook handles the rest, including reconciling new server-side messages once the stream completes.

Generative UI with React Server Components

The most distinctive AI SDK feature is generative UI — letting the model stream actual React components from the server, not just text. streamUI is the entry point:

// In AI SDK 5 the RSC helpers live in their own package: npm install @ai-sdk/rsc
import { streamUI } from '@ai-sdk/rsc';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import { WeatherCard } from '@/components/weather-card';
import { ProductList } from '@/components/product-list';

export async function ask(question: string) {
  'use server';

  const result = await streamUI({
    model: openai('gpt-5'),
    prompt: question,
    text: ({ content }) => <p>{content}</p>,
    tools: {
      showWeather: {
        description: 'Display a weather card for a city',
        inputSchema: z.object({ city: z.string() }),
        generate: async function* ({ city }) {
          yield <WeatherCard city={city} loading />;
          const data = await fetchWeather(city);
          return <WeatherCard city={city} data={data} />;
        },
      },
      showProducts: {
        description: 'Display matching products',
        inputSchema: z.object({ query: z.string() }),
        generate: async ({ query }) => {
          const products = await db.products.search(query);
          return <ProductList products={products} />;
        },
      },
    },
  });

  return result.value;
}

The model chooses which tool to call, and the generate function returns a React element (or yields intermediate ones first, for progressive states). The element is streamed to the client and rendered in place. You can yield a loading skeleton, fetch data, then return the populated component, all from one async generator.
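
The yield-then-return mechanics are plain async generator behaviour, which a framework-free sketch can show. Everything here is a stand-in: strings play the role of React elements, and weatherCard, fakeFetchWeather, and renderProgressively are hypothetical names rather than SDK functions.

```typescript
type Ui = string; // stand-in for a React element

// Hypothetical data source with a deterministic placeholder result.
async function fakeFetchWeather(city: string): Promise<{ temp: number }> {
  return { temp: city.length };
}

// Yields a loading state first, then returns the final UI.
async function* weatherCard(city: string): AsyncGenerator<Ui, Ui, void> {
  yield `<WeatherCard city="${city}" loading />`;
  const data = await fakeFetchWeather(city);
  return `<WeatherCard city="${city}" temp=${data.temp} />`;
}

// Drives the generator the way a streaming renderer might: each yielded
// value replaces what is on screen, and the returned value is final.
async function renderProgressively(
  gen: AsyncGenerator<Ui, Ui, void>,
  paint: (ui: Ui) => void,
): Promise<Ui> {
  while (true) {
    const next = await gen.next();
    paint(next.value);
    if (next.done) return next.value;
  }
}
```

Running renderProgressively(weatherCard('Oslo'), console.log) would paint the loading card and then the populated one, in that order.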

This sidesteps the usual “model returns JSON, client renders a card” dance entirely. The server owns the rendering, which means you can use server-only data sources, keep API keys out of the bundle, and ship richer UI without re-architecting your data layer.

Working with Embeddings

The SDK isn’t just for chat — it’s also the simplest way to generate embeddings for semantic search, retrieval, and clustering:

import { embedMany } from 'ai';
import { openai } from '@ai-sdk/openai';

const { embeddings, usage } = await embedMany({
  model: openai.embedding('text-embedding-3-large'),
  values: docs.map((d) => d.content),
});

await db.documents.insertMany(
  docs.map((d, i) => ({
    ...d,
    embedding: embeddings[i],
  }))
);

For a query, use embed for a single vector and run a similarity search against your vector store. The Drizzle and Prisma ecosystems both support pgvector columns now, so this slots cleanly into a normal Postgres-backed Next.js app without a separate vector database.
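
If you are not ready to add pgvector yet, the similarity search itself is small enough to sketch in memory. The version below is an illustration for modest document counts, not a replacement for a vector index; StoredDoc and topK are hypothetical names. The ai package also exports a cosineSimilarity helper, but it is written out by hand here so the snippet stands alone.

```typescript
interface StoredDoc {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every document against the query vector and keeps the best k.
function topK(query: number[], docs: StoredDoc[], k: number): Array<{ id: string; score: number }> {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This scans every document on each query, so it only makes sense for small collections; past that, a pgvector index does the same ranking inside Postgres.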

Provider Switching and the AI Gateway

Because every provider exposes the same interface, switching is a matter of changing the model passed to a call. In practice teams take this further with a small wrapper that selects a model based on environment or feature flag:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';

function pickModel(task: 'fast' | 'smart' | 'cheap') {
  switch (task) {
    case 'fast':
      return google('gemini-2.5-flash');
    case 'smart':
      return anthropic('claude-opus-4-7');
    case 'cheap':
      return openai('gpt-5-mini');
  }
}

const { text } = await generateText({
  model: pickModel('smart'),
  prompt,
});
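
The environment-driven variant of that wrapper might look like the sketch below. It resolves a model id string per task and lets an environment variable override the default. The MODEL_OVERRIDE_<TASK> naming is a made-up convention for illustration, as is resolveModelId; the ids use the provider/model form that the gateway accepts.

```typescript
type Task = 'fast' | 'smart' | 'cheap';

const DEFAULT_MODEL_IDS: Record<Task, string> = {
  fast: 'google/gemini-2.5-flash',
  smart: 'anthropic/claude-opus-4-7',
  cheap: 'openai/gpt-5-mini',
};

// Returns the override from e.g. MODEL_OVERRIDE_SMART when set (hypothetical
// variable name), otherwise the default id for the task.
function resolveModelId(task: Task, env: Record<string, string | undefined>): string {
  const override = env[`MODEL_OVERRIDE_${task.toUpperCase()}`];
  return override && override.trim() !== '' ? override.trim() : DEFAULT_MODEL_IDS[task];
}
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the selection logic easy to test and lets a feature-flag service supply the same shape.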

Vercel’s hosted AI Gateway takes this one step further. It provides a single endpoint, a single API key, automatic failover between providers, request-level analytics, and unified billing. You point your AI SDK provider at the gateway and gain provider-agnostic observability without changing application code:

import { generateText } from 'ai';
import { createGateway } from '@ai-sdk/gateway';

const gateway = createGateway({ apiKey: process.env.AI_GATEWAY_KEY });

const { text } = await generateText({
  model: gateway('anthropic/claude-opus-4-7'),
  prompt,
});

If the upstream provider is down, the gateway can route to a fallback model you configure. For production apps this turns model outages from a Sev 1 into a logged warning.
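
The gateway handles failover for you on the server side. Purely to illustrate the idea, here is a hand-rolled client-side equivalent that tries a list of model ids in order. It is not how the gateway works internally; callWithFallback is a hypothetical helper, and the call argument stands in for whatever function actually invokes the model.

```typescript
// Tries each model id in order and returns the first successful result,
// reporting each failure through `onFailure` so it can be logged.
async function callWithFallback<T>(
  modelIds: string[],
  call: (modelId: string) => Promise<T>,
  onFailure: (modelId: string, error: unknown) => void = () => {},
): Promise<{ modelId: string; value: T }> {
  if (modelIds.length === 0) throw new Error('No models configured');
  let lastError: unknown;
  for (const modelId of modelIds) {
    try {
      return { modelId, value: await call(modelId) };
    } catch (error) {
      onFailure(modelId, error); // the "logged warning" path
      lastError = error;
    }
  }
  throw lastError; // every model failed
}
```

Returning the model id alongside the value makes it possible to record which model actually served the request.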

Agentic Loops

Multi-step agents (the model calls a tool, sees the result, calls another tool, and eventually answers) are first-class in the SDK. streamText with tools already loops up to your stopWhen limit, but for richer control there’s the experimental Agent class, exported as Experimental_Agent:

import { Experimental_Agent as Agent, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const supportAgent = new Agent({
  model: anthropic('claude-opus-4-7'),
  system: 'You are a support engineer. Use the tools to investigate.',
  tools: {
    searchTickets,
    readLogs,
    queryDatabase,
    pageOncall,
  },
  stopWhen: stepCountIs(10),
});

const { text, steps } = await supportAgent.generate({
  prompt: 'Investigate why customer 4271 cannot log in.',
});

Each step in steps includes the model’s reasoning, the tool calls it made, and their results. Log these to your observability platform and you get an audit trail of every decision the agent took — invaluable when something goes wrong and you need to debug an autonomous workflow.
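
Turning steps into an audit trail can be as simple as flattening them into log lines. The sketch below assumes a trimmed-down step shape (text, toolCalls with toolName and input, toolResults with toolName and output); StepLike and toAuditLog are hypothetical names, and real step objects carry more fields than this.

```typescript
// Simplified view of a step; an assumption for illustration only.
interface StepLike {
  text: string;
  toolCalls: Array<{ toolName: string; input: unknown }>;
  toolResults: Array<{ toolName: string; output: unknown }>;
}

// Produces one log line per tool call, tool result, and text output.
function toAuditLog(steps: StepLike[]): string[] {
  const lines: string[] = [];
  steps.forEach((step, index) => {
    const n = index + 1;
    for (const call of step.toolCalls) {
      lines.push(`step ${n}: call ${call.toolName} ${JSON.stringify(call.input)}`);
    }
    for (const result of step.toolResults) {
      lines.push(`step ${n}: result ${result.toolName} ${JSON.stringify(result.output)}`);
    }
    if (step.text) lines.push(`step ${n}: text ${step.text}`);
  });
  return lines;
}
```

In production you would likely ship structured events rather than strings, and redact tool inputs that contain customer data before they reach your logs.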

Middleware and Observability

Every model call passes through any middleware you register, which is where caching, logging, and guardrails live:

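import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';
import type { LanguageModelV2Middleware } from '@ai-sdk/provider';

// hashParams and redis are application helpers defined elsewhere.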

const cacheMiddleware: LanguageModelV2Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const key = hashParams(params);
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    const result = await doGenerate();
    await redis.set(key, JSON.stringify(result), 'EX', 3600);
    return result;
  },
};

const model = wrapLanguageModel({
  model: openai('gpt-5'),
  middleware: [cacheMiddleware],
});
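
The cache middleware relies on a hashParams helper that is left undefined in the snippet. One plausible way to write it, sketched here under my own assumptions, is to serialise the params with sorted keys and hash the result with Node’s built-in crypto module. Both stableStringify and the llm-cache: key prefix are illustrative choices, and binary prompt parts such as images would need extra handling.

```typescript
import { createHash } from 'node:crypto';

// JSON-like serialisation with object keys sorted, so that logically
// equal params always produce the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value) ?? 'null'; // undefined and functions become null
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  const entries = Object.entries(value as Record<string, unknown>)
    .filter(([, v]) => v !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
  return `{${entries.join(',')}}`;
}

// Derives a cache key from call params (prefix is a hypothetical convention).
function hashParams(params: unknown): string {
  const digest = createHash('sha256').update(stableStringify(params)).digest('hex');
  return `llm-cache:${digest}`;
}
```

Sorting keys matters because two param objects built in a different property order would otherwise hash differently and miss the cache.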

For telemetry, pass experimental_telemetry: { isEnabled: true } to any call and the SDK emits OpenTelemetry spans for the request, tool calls, and token usage. Pipe those into Langfuse, Helicone, or PostHog LLM Analytics and you have full observability of every model interaction without bespoke instrumentation.

Comparison with Raw Provider SDKs

When does the AI SDK earn its keep over calling provider SDKs directly?

Concern	Raw SDK	Vercel AI SDK
Provider switching	Rewrite call site	Change one import
Streaming protocol	Provider-specific SSE	Unified data stream
Tool calling	Different shape per provider	One `tool()` helper
Structured output	Manual schema handling	`generateObject` with Zod
React integration	Build it yourself	`useChat`, `useObject`
RSC streaming	Build it yourself	`streamUI`
Multi-step agents	Build the loop	`Agent` + `stopWhen`

The trade-off is a thin abstraction layer between you and the model. For most applications that’s a win: the SDK exposes the provider features that matter and adds capabilities the raw SDKs lack. The exceptions are research code that needs the absolute newest provider-specific feature within hours of release, and apps that only ever talk to one model and want zero dependencies.

Production Patterns

A few patterns that have become standard once you’re past the first prototype:

Always set an explicit stopWhen on tool-enabled calls. Too low and the model never gets to act on its tool results; too high and a stuck loop becomes the fastest way to burn a budget.
Validate inputs with Zod, even ones the model “should” get right. Use .describe() on each field — the model reads descriptions and they noticeably improve tool-call accuracy.
Surface tool calls in the UI rather than hiding them. Users trust agents more when they can see what’s being looked up.
Persist messages server-side by hooking onFinish in your route handler to write the assistant response to your database. Don’t rely on the client to round-trip the full transcript.
Use experimental_telemetry from day one. Cost surprises are easier to diagnose when every call is traced.
Cache aggressively for prompts that don’t depend on user input. A middleware cache on a system-prompt summarisation call can cut your bill by an order of magnitude.

When Not to Use It

There are still cases where reaching for the AI SDK is overkill. A one-off script that calls OpenAI once and exits is fine with the raw openai package. Edge functions where every kilobyte counts can benefit from skipping the abstraction. And if you’re building a Vue or vanilla TypeScript app rather than a React one, AI SDK UI is less compelling, though Core still wins on provider abstraction.

For anything beyond a single call from a single provider in a frontend codebase, the SDK pays for itself within an afternoon of work saved.

Conclusion

The Vercel AI SDK has quietly become the default way to ship AI features in a TypeScript app. It papers over the provider differences that make integration tedious, gives you a real React hook for chat instead of a DIY SSE parser, and unlocks patterns — generative UI, streamed structured output, agentic loops — that would be weeks of work to build from scratch.

Pair it with the AI Gateway for production routing, with PostHog LLM Analytics for observability, and with Zod for input validation, and you have a stack that handles every layer of an AI feature from the model call up to the rendered UI. The model landscape will keep churning — new providers, new modalities, new capabilities — but the SDK absorbs that churn so your application code doesn’t have to.

Till next time!