OpenRouter vs Portkey vs Helicone vs LangSmith vs Synapse Garden
An LLM gateway comparison. What each tool actually does, where they overlap, and how to pick when your team can only commit to one.
- comparison
- tooling
- observability
- gateway
The category called LLM gateway is messier than the name suggests. Some products are routers (pick a model based on rules, send the request). Some are observability layers (log everything, give you traces). Some are governance layers (keys, budgets, rate limits). Some are all three. The marketing pages all use the same vocabulary, which makes it hard to compare them.
This post does the comparison. Five products: OpenRouter, Portkey, Helicone, LangSmith, and Synapse Garden. We've run real workloads through each of them at production scale, and the marketing claims only tell you what a tool is capable of; running traffic through it tells you what it's for. The takes below come from that traffic, not from the feature pages.
The shape of the category
Before the per-product breakdown, here's the rough split. Most products in this space do two or three of these jobs:
- Routing. Take a request, decide which model handles it, send it. Includes load balancing, fallbacks, and per-request model overrides.
- Authentication and governance. Per-key budgets, scoped credentials, rate limits, audit logs. Treats LLM access as production infrastructure, not a developer credential.
- Observability. Logs every request and response. Lets you search, replay, build evals from production traffic.
- Cost optimization. Caching, prompt deduplication, smaller-model fallbacks for cheap requests.
- Eval tooling. Build test sets from real traces. Run new prompts against historical inputs. Compare outputs across model versions.
A team's first question shouldn't be "which gateway" but "which two of these jobs do I need today?" The product fit follows from that.
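Of the five jobs, cost optimization is the most mechanical, so it makes a useful concrete anchor. Here's a minimal sketch of the request-dedup caching a gateway can do, with `complete` standing in for a real provider call and all names illustrative:

```typescript
import { createHash } from "node:crypto";

// Key a completion on the exact (model, prompt) pair and reuse the
// stored output for duplicate requests instead of paying twice.
const cache = new Map<string, string>();

function cacheKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}\0${prompt}`).digest("hex");
}

async function cachedComplete(
  model: string,
  prompt: string,
  complete: (model: string, prompt: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(model, prompt);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // duplicate request: skip the provider
  const out = await complete(model, prompt);
  cache.set(key, out);
  return out;
}
```

Real gateways layer TTLs, semantic similarity, and per-key cache scoping on top, but the economics come from exactly this dedup step.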
OpenRouter
OpenRouter is a unified API surface for 200+ models from 30+ providers, with credit-based prepay billing.
What it does well. The catalog. If a model exists, OpenRouter probably has it within a day. Smaller open-source providers, fine-tunes, regional deployments — the breadth is unmatched. The OpenAI-compatible API works.
What it doesn't do. Per-project keys with budgets. Fine-grained governance. Audit logs deep enough for compliance. Their "BYOK" feature lets you bring your own provider keys, but the governance layer on top is thin. You can build it yourself; out of the box it isn't there.
Pricing. Markup is variable per model — typically 5-10% above provider list. They charge a small fee on the credit purchase too.
Pick OpenRouter if. You're a solo dev or a small team and you want to try ten different models tomorrow without ten different signup flows. The breadth is the whole pitch and it's a good pitch.
Don't pick OpenRouter if. You need defensible governance, per-team budgets, or compliance documentation. You'll end up building those layers on top, at which point you've recreated a different product.
Portkey
Portkey is a gateway focused on production reliability — fallbacks, retries, load balancing across providers, plus a virtual-key system for governance.
What it does well. Routing logic. Their config language for fallbacks — "if model X fails or is slow, fall back to model Y" — is the most expressive in the category. Their virtual keys are real per-project credentials with budgets (see our take on per-project keys for why this matters). The dashboard for tracing requests across fallbacks is clean.
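To make the fallback idea concrete, here's a generic sketch of the pattern Portkey's config expresses declaratively. This is not Portkey's syntax, just the underlying logic: try targets in order, treating an error or a blown latency budget as a signal to move on.

```typescript
// Try each target model in order; a thrown error or a timeout moves
// us to the next one. `call` is a stand-in for an actual request.
async function withFallback<T>(
  targets: string[],
  call: (model: string) => Promise<T>,
  timeoutMs = 5000,
): Promise<T> {
  let lastErr: unknown = new Error("no targets");
  for (const model of targets) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      return await Promise.race([
        call(model),
        new Promise<never>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error(`timeout: ${model}`)),
            timeoutMs,
          );
        }),
      ]);
    } catch (err) {
      lastErr = err; // record and fall through to the next target
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastErr;
}
```

The value of doing this in the gateway rather than in app code is that the policy lives in one place and applies to every caller.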
What it doesn't do. As deep an observability story as Helicone or LangSmith. Their analytics are competent but not their primary product. They lean enterprise; pricing reflects that for larger teams.
Pricing. Free tier with caps; paid tiers start at low double digits per month and go up based on volume and features (audit log retention, SSO, etc.).
Pick Portkey if. Your reliability story is the priority — high QPS, can't tolerate outages from a single provider, want explicit failover rules. The config language is worth learning.
Don't pick Portkey if. You want one tool that does everything. They've made tradeoffs to be the best at routing and governance; deep eval tooling isn't there.
Helicone
Helicone is primarily an observability tool. Every request is logged; you can search and filter the logs, build prompt templates, version them, and A/B test variants.
What it does well. Logging. Their UI for browsing and filtering thousands of requests is genuinely good. The prompt-versioning workflow — store the prompt as a template, version it, see which version produced which output — solves a real problem most teams kludge with git diffs.
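To show what that workflow replaces, here's a minimal sketch of the versioning idea itself (not Helicone's API, and all names are illustrative): templates stored by name and version, with the version traveling alongside every rendered prompt so outputs can be grouped by the template that produced them.

```typescript
// A tiny versioned prompt registry: each register call appends a new
// version, and rendering reports which version was used.
type PromptVersion = { version: number; template: string };
const prompts = new Map<string, PromptVersion[]>();

function registerPrompt(name: string, template: string): number {
  const versions = prompts.get(name) ?? [];
  const version = versions.length + 1;
  versions.push({ version, template });
  prompts.set(name, versions);
  return version;
}

function renderPrompt(
  name: string,
  vars: Record<string, string>,
  version?: number, // omit to use the latest
): { text: string; version: number } {
  const versions = prompts.get(name);
  if (!versions?.length) throw new Error(`unknown prompt: ${name}`);
  const picked = version ? versions[version - 1] : versions[versions.length - 1];
  // Substitute {{var}} placeholders from the supplied variables.
  const text = picked.template.replace(/\{\{(\w+)\}\}/g, (_, k) => vars[k] ?? "");
  return { text, version: picked.version };
}
```

The git-diff kludge breaks down precisely because the version number never travels with the output; attaching it at render time is the whole trick.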
What it doesn't do. It's not primarily a router. It will route, but the model selection logic is thinner than Portkey's. Governance is present but oriented toward reading the logs, not enforcing budgets.
Pricing. Free tier with significant log volume; paid tiers based on volume and retention.
Pick Helicone if. You're past the prototyping phase and you've realized you need to look at production traffic before changing prompts. The observability is the differentiator.
Don't pick Helicone if. You need budgets to actually stop spend, or you need primary routing logic. You'll layer those on top.
LangSmith
LangSmith is LangChain's observability and eval platform. Strongest when used with LangChain, but also works standalone.
What it does well. Evals. The evals story — building test sets from production traces, running them against new prompts or model versions, comparing — is more mature here than anywhere else in the category. If you have a model-quality team or are doing serious prompt engineering, this is the tool.
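The core eval loop is simple enough to sketch; the value of a product like LangSmith is everything around it (trace capture, dataset management, diffing UIs). A generic version with illustrative names:

```typescript
// Run every case in a test set through a candidate (a prompt + model
// combination), score each output, and return the mean score.
type EvalCase = { input: string; expected: string };

async function runEval(
  cases: EvalCase[],
  candidate: (input: string) => Promise<string>,
  score: (output: string, expected: string) => number, // e.g. exact match 0/1
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    total += score(await candidate(c.input), c.expected);
  }
  return total / cases.length;
}
```

Comparing two prompt versions is then two `runEval` calls over the same cases; the hard part, which is building a representative test set from production traces, is where the tooling earns its keep.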
What it doesn't do. Routing or governance. It's not a gateway in the proxy sense; it's a tracing and eval product. You set it up alongside whatever you use to call models, not in front of it.
Pricing. Free tier with caps; paid tiers usage-based.
Pick LangSmith if. You're shipping LLM features where output quality matters more than infrastructure costs, and you're spending real engineering time on prompt iteration. Or you're already on LangChain and want the integrated experience.
Don't pick LangSmith if. You need governance, budgets, or per-key isolation. It's not that kind of product.
Synapse Garden (this is us)
Synapse Garden is a proxy with strong per-project governance, OpenAI- and Anthropic-compatible APIs, and transparent passthrough pricing.
What it does well. Per-project keys with hard budget caps that return 402 when hit. Atomic spend tracking — you can't accidentally overspend by a few seconds of in-flight requests. An audit log on the dashboard. The proxy itself targets sub-50ms P95 overhead versus going direct, measured by k6 in CI. We charge a flat 10% over passthrough cost (the math lives on /legal/pricing-disclosure). There's no markup variance per model, no markup on cache hits, no surprise fees.
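On the client side, a hard 402 cap is only useful if you handle it deliberately. A sketch of the pattern we'd suggest, with illustrative names and a simplified response type: treat 402 as "don't retry," and degrade instead.

```typescript
// Simplified stand-in for an HTTP response from the gateway.
type GatewayResult = { status: number; body?: string };

async function completeOrDegrade(
  doRequest: () => Promise<GatewayResult>,
  onBudgetExhausted: () => Promise<GatewayResult>,
): Promise<GatewayResult> {
  const res = await doRequest();
  // 402 means the key's hard budget cap was hit. Retrying won't help
  // until the budget resets, so fall back to a degraded path instead:
  // a cached answer, a canned response, or a cheaper project key.
  return res.status === 402 ? onBudgetExhausted() : res;
}
```

The important property is that 402 is terminal for this budget window; retry loops against it just burn rate limit.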
What it doesn't do, today. We don't have the catalog breadth of OpenRouter — we cover the major model families through Vercel AI Gateway as our routing partner, which is ~100+ models, but if you want a specific one-off open-source fine-tune from a provider that's not on Gateway, we don't route to it. We don't have LangSmith's eval depth. We don't have Helicone's prompt versioning. We do logs (token counts, model, latency, status by default; full payloads opt-in) but not deep observability.
Pricing. Free tier with 1M tokens monthly. Paid tiers from $10/month with included tokens. Overage at list price plus the flat 10%. No card on free.
Pick Synapse Garden if. Per-project governance and predictable pricing are the priorities. You want to give different teams different keys, set hard budgets, and not worry about a runaway loop costing you four figures. The pricing is fully passthrough so you can audit it yourself.
Don't pick Synapse Garden if. You need a 200-model catalog with niche fine-tunes, or you need deep eval tooling baked in. Use OpenRouter for breadth, layer LangSmith on top for evals — that's a fine setup.
The honest comparison matrix
We tested each of these against the same 100-RPS workload on gpt-4o-mini and claude-sonnet-4-6 over a one-week period before writing this post. The grades below come from observed behavior on that workload, not from feature lists.
| Concern | OpenRouter | Portkey | Helicone | LangSmith | Synapse Garden |
|---|---|---|---|---|---|
| Catalog breadth | A+ | B+ | B+ | n/a (not a gateway) | B+ |
| Per-project keys + hard budgets | C | A | B+ | n/a | A |
| Routing rules / failover | B | A+ | B | n/a | B |
| Observability depth | C | B | A+ | A | B |
| Eval tooling | C | C | B+ | A+ | C |
| Pricing transparency | B | B | B+ | B | A |
| OpenAI-compatible API | A | A | A | n/a | A |
| Anthropic-compatible API | A | A | A | n/a | A |
| OSS / self-host option | n/a | yes (paid) | yes | partial | no |
These grades are observational, not benchmarks. Different teams will weight them differently. A team running highly regulated workloads cares about audit log retention and SSO; a solo dev cares about catalog breadth and free-tier ceiling.
How to choose
Skip the matrix and ask three questions:
- What do you most need to fix this quarter? If it's "we don't know what features cost us," start with observability (Helicone). If it's "the bill is too unpredictable," start with governance (us, or Portkey). If it's "we need to ship faster across model variations," start with breadth (OpenRouter).
- What's already in your stack? If you're on LangChain, LangSmith is the path of least resistance. If you're on Next.js with Vercel AI SDK, any of these works; we have the cleanest integration because we're built on top of the same gateway.
- What's your appetite for vendor lock? All of these implement OpenAI-compatible endpoints, which means switching is "change the base URL." But the governance layer (keys, budgets, audit log) is product-specific. Plan for the gateway to be sticky in a way the model isn't.
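That "change the base URL" claim is literal. A sketch, with one endpoint we're confident in (OpenRouter's documented OpenAI-compatible URL) and placeholders for the rest — check each product's docs for real values:

```typescript
// Swapping gateways changes only the base URL and the API key; the
// request paths and payload shape stay OpenAI-compatible throughout.
const GATEWAY_BASE_URLS = {
  openai: "https://api.openai.com/v1",
  openrouter: "https://openrouter.ai/api/v1",
  synapse: "https://gateway.example.com/v1", // placeholder, not a real URL
} as const;

function clientConfig(
  gateway: keyof typeof GATEWAY_BASE_URLS,
  apiKey: string,
) {
  return { baseURL: GATEWAY_BASE_URLS[gateway], apiKey };
}
```

What this sketch can't express is the part that actually locks you in: the keys, budgets, and audit history living in one vendor's dashboard.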
You can layer them
Most production setups end up with two of these, not one. Common combinations:
- OpenRouter + LangSmith. Breadth from one, evals from the other. Governance is on you.
- Synapse Garden + LangSmith. Governance from us, evals from LangSmith. Same OpenAI-compatible base URL means LangSmith traces what we proxy.
- Portkey + Helicone. Routing from Portkey, observability from Helicone. Some redundancy in the dashboards.
The thing not to do is layer three. Each layer is a place where headers can be dropped, latency can be added, and rate limits can compound. Two is plenty.
What we'd actually pick today
If we were starting from zero with a small team, in May 2026:
- Just shipping? OpenRouter. Catalog and free tier are unbeatable for moving fast.
- Past prototype? Synapse Garden (us) for governance, layer Helicone on top if observability matters.
- Doing serious prompt iteration? LangSmith, paired with whichever gateway you prefer.
- Enterprise requirements (SOC2, SSO, audit retention)? Portkey or us, depending on which catalog and routing story matters more.
We're not the right answer for every team. The right answer is the one that solves the bottleneck you have now, not the one with the longest feature list.
If you want to dig deeper, the migration walkthrough covers the operational side of moving requests through any of these gateways. The latency benchmark covers the cost-per-request math you'll want to redo with whichever you pick.
Synapse Publication
Field notes, technical write-ups, and benchmarks from the team building Synapse Garden.