zAI · AI infrastructure

Ship AI that knows your data.

One API for inference, embeddings and a schema-aware assistant. Stream answers, search by meaning and ground every response in your own data — without standing up a single GPU.

Inference API · Embeddings · Schema-aware · Streaming
zAI assistant
schema-aware
Which customers churned last quarter and why?

  • 120ms time to first token
  • 1 API for every model
  • 1536 embedding dimensions
  • 0 GPUs to manage

An AI stack in one API

Everything you need to build with AI

Inference, embeddings, retrieval and guardrails — wired together and ready to ship, so you skip the glue code and the GPU bills.

Inference API

Call leading open and frontier models through one endpoint, with streaming, function calling and JSON mode built in.

Embeddings & vector search

Turn text, code and docs into vectors and search by meaning over your own data — storage and index included.

Schema-aware assistant

An assistant that reads your zMesh schema, writes safe queries and answers questions grounded in your live data.

RAG pipelines

Chunk, embed, retrieve and rerank in a few lines. Ground every answer with citations back to the source.
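Here is a minimal sketch of the chunking step, written in TypeScript and independent of the zAI SDK. It splits text into fixed-size, overlapping word windows so each piece fits an embedding model's input without losing context at the boundaries. `chunkWords` is a hypothetical helper, and word-based windows are an assumption; many pipelines chunk by tokens or by document structure instead.

```typescript
// Hypothetical helper: split text into windows of `size` words, where each
// window repeats the last `overlap` words of the previous one.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

const chunks = chunkWords("a b c d e f g", 3, 1);
// chunks: ["a b c", "c d e", "e f g"]
```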

Tools & function calling

Let the model call your APIs and run actions safely, with typed arguments and automatic retries.
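Function calling usually comes down to three moves: check which tool the model asked for, validate its arguments before any side effect, and retry transient failures. The sketch below assumes the model returns a tool name plus a JSON argument string. That shape is common but is not confirmed as zAI's format. `getInvoiceTotal`, `runTool` and the retry limit are hypothetical.

```typescript
// Assumed tool-call shape: a tool name plus a JSON string of arguments.
type ToolCall = { name: string; arguments: string };

type InvoiceTotalArgs = { customerId: string };

// Validate the model-supplied JSON before running any side effects.
function parseInvoiceArgs(raw: string): InvoiceTotalArgs {
  const value: unknown = JSON.parse(raw);
  if (
    typeof value !== "object" ||
    value === null ||
    typeof (value as { customerId?: unknown }).customerId !== "string"
  ) {
    throw new Error("invalid arguments: expected { customerId: string }");
  }
  return { customerId: (value as { customerId: string }).customerId };
}

// Retry a call up to maxAttempts times. Real code would add backoff and
// only retry errors known to be transient.
function withRetries<T>(fn: () => T, maxAttempts: number): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Hypothetical tool that fails once with a transient error, then succeeds.
let invoiceCalls = 0;
function getInvoiceTotal(args: InvoiceTotalArgs): number {
  invoiceCalls++;
  if (invoiceCalls === 1) throw new Error("transient upstream error");
  return args.customerId === "cus_123" ? 4200 : 0;
}

function runTool(call: ToolCall): number {
  if (call.name !== "getInvoiceTotal") throw new Error(`unknown tool: ${call.name}`);
  const args = parseInvoiceArgs(call.arguments); // bad input is not retried
  return withRetries(() => getInvoiceTotal(args), 3);
}

const total = runTool({ name: "getInvoiceTotal", arguments: '{"customerId":"cus_123"}' });
// total: 4200, after one failed attempt
```

Arguments are validated outside the retry loop on purpose. Malformed input from the model will not fix itself on a second attempt, but an upstream timeout might.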

Guardrails & moderation

Filter unsafe content, redact PII and keep prompts and outputs inside the policies you define.
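As a rough illustration of PII redaction, the toy filter below replaces email addresses and card-like digit runs with placeholders before text is sent to or returned from a model. The patterns and `redactPII` are hypothetical and deliberately simple. They are not zAI's guardrails. Production redaction typically relies on dedicated detectors that cover far more kinds of PII.

```typescript
// Toy patterns: email addresses, then runs of 13 to 16 digits that may be
// separated by single spaces or hyphens (card-number-like).
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b(?:\d[ -]?){13,16}\b/g, "[CARD]"],
];

function redactPII(text: string): string {
  // Apply each pattern in turn. With a /g regex, replace() rewrites every match.
  return PII_PATTERNS.reduce((out, [pattern, label]) => out.replace(pattern, label), text);
}

const redacted = redactPII("Contact jane@example.com, card 4242 4242 4242 4242.");
// redacted: "Contact [EMAIL], card [CARD]."
```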

Search by meaning

Embeddings that map your whole world

zAI turns every document, row and message into a vector and finds the closest matches in milliseconds. No keyword guessing — just ask in plain language and get the most relevant context back.

  • Embed text, code, PDFs and images
  • Managed vector store — no infra to run
  • Rerank & filter for pinpoint retrieval
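Under the hood, search by meaning ranks stored vectors by how similar they are to the query vector. The minimal TypeScript sketch below uses cosine similarity, the metric named in the vector search example on this page, together with a brute-force top-k. The 3-dimensional vectors and document ids are toy values standing in for real embeddings. A managed vector store would use an index rather than scanning every entry.

```typescript
type Doc = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of vector lengths.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and keep the k best.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy "embeddings" for three documents.
const docs: Doc[] = [
  { id: "refund-policy", vector: [0.9, 0.1, 0.0] },
  { id: "cancel-subscription", vector: [0.7, 0.3, 0.1] },
  { id: "onboarding", vector: [0.0, 0.2, 0.9] },
];
const hits = topK([1, 0, 0], docs, 2);
// hits: refund-policy first, then cancel-subscription
```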
Vector search · 1536-d · cosine
  1. Refund & proration policy
  2. Cancel a subscription
  3. Failed payment retries
  4. Update the card on file
4 matches · ranked by meaning

The inference pipeline

From question to grounded answer

Retrieval, reasoning and guardrails run in one call — so every response is fast, relevant and backed by your own data.

  • Prompt: the user asks a question
  • Retrieve context: top-k vectors from your data
  • Model: reason over prompt + context
  • Stream tokens: the answer streams back live
  • Grounded answer: with citations & guardrails
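The hand-off between the retrieve and model steps can be sketched like this. Retrieved chunks are numbered in the prompt so the model can cite them as [1], [2] and so on, and the app maps those markers back to their sources. The system/user message shape mirrors common chat APIs and is an assumption. `buildGroundedPrompt` and `citedSources` are hypothetical helpers, not part of the zAI SDK.

```typescript
type Chunk = { source: string; text: string };
type Message = { role: "system" | "user"; content: string };

function buildGroundedPrompt(question: string, context: Chunk[]): Message[] {
  // Number each chunk so the model can refer to it as [n].
  const numbered = context
    .map((chunk, i) => `[${i + 1}] (${chunk.source}) ${chunk.text}`)
    .join("\n");
  return [
    {
      role: "system",
      content:
        "Answer using only the numbered sources below and cite them as [n]. " +
        "If the sources do not contain the answer, say so.\n\n" +
        numbered,
    },
    { role: "user", content: question },
  ];
}

// Map [n] markers in the answer back to source names, ignoring numbers
// that do not correspond to a retrieved chunk.
function citedSources(answer: string, context: Chunk[]): string[] {
  const cited = new Set<string>();
  for (const match of answer.matchAll(/\[(\d+)\]/g)) {
    const chunk = context[Number(match[1]) - 1];
    if (chunk) cited.add(chunk.source);
  }
  return Array.from(cited);
}

// Toy retrieved chunks.
const context: Chunk[] = [
  { source: "refund-policy.md", text: "Refunds for annual plans are prorated." },
  { source: "onboarding.md", text: "Invite teammates from the settings page." },
];
const messages = buildGroundedPrompt("How are annual refunds handled?", context);
const used = citedSources("They are prorated [1], see also [3].", context);
// used: ["refund-policy.md"]  ([3] is out of range and ignored)
```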

Inference

One endpoint, every model

Swap between open and frontier models with a single line. Streaming, JSON mode and function calling work the same everywhere, so you can pick the best model per task without rewrites.

  • Token streaming with ~120ms time to first token
  • Function calling & structured JSON output
  • Automatic fallback & load balancing
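zAI lists fallback as automatic, so the sketch below only shows the idea: try models in order and return the first successful completion. Model names, the `CallModel` signature and the fake backend are hypothetical and do not reflect the SDK's API.

```typescript
type CallModel = (model: string, prompt: string) => Promise<string>;

async function completeWithFallback(
  models: string[],
  prompt: string,
  callModel: CallModel,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      // Return the first model that answers successfully.
      return { model, text: await callModel(model, prompt) };
    } catch (err) {
      // Record the failure and move on to the next model in the list.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed (${errors.join("; ")})`);
}

// Fake backend for illustration: "model-a" is always overloaded.
const fakeCall: CallModel = async (model, prompt) => {
  if (model === "model-a") throw new Error("overloaded");
  return `${model} says: ${prompt}`;
};

completeWithFallback(["model-a", "model-b"], "hi", fakeCall).then((result) => {
  console.log(result.model); // logs model-b
});
```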
chat.completions · stream · 120ms ttft
json mode · tools · fallback

Embeddings

Retrieval that actually finds it

Embed your data once and search it by meaning forever. The managed vector store handles indexing, filtering and reranking, so retrieval stays accurate as your corpus grows.

  • Managed vector store, zero ops
  • Chunk, embed & rerank in a few lines
  • Metadata filters for scoped search
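The following sketch shows metadata-scoped search, assuming each indexed entry carries string metadata such as a plan type. Vectors are normalized once at index time, so cosine similarity at query time reduces to a dot product. The filter runs before ranking, so out-of-scope entries never compete. All names and vectors are illustrative, not the managed store's API.

```typescript
type Entry = { id: string; meta: Record<string, string>; unit: number[] };

// Scale a vector to unit length; done once when the entry is indexed.
function normalize(v: number[]): number[] {
  const norm = Math.hypot(...v);
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

// For unit vectors, the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function indexEntry(id: string, meta: Record<string, string>, vector: number[]): Entry {
  return { id, meta, unit: normalize(vector) };
}

// Keep only entries whose metadata matches every filter key, then rank.
function scopedSearch(
  index: Entry[],
  query: number[],
  filter: Record<string, string>,
  k: number,
): { id: string; score: number }[] {
  const q = normalize(query);
  return index
    .filter((e) => Object.entries(filter).every(([key, value]) => e.meta[key] === value))
    .map((e) => ({ id: e.id, score: dot(q, e.unit) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const index = [
  indexEntry("annual-refunds", { plan: "annual" }, [4, 3]),
  indexEntry("monthly-refunds", { plan: "monthly" }, [1, 0]),
  indexEntry("annual-sla", { plan: "annual" }, [3, 4]),
];
const results = scopedSearch(index, [1, 0], { plan: "annual" }, 2);
// results: annual-refunds (0.8), then annual-sla (0.6)
```

The `monthly-refunds` entry would score highest for this query but is excluded by the filter. That exclusion is the point of scoped search.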
nearest matches · cosine
  • Refund policy for annual plans · 0.94
  • Cancellation & proration terms · 0.88
  • Enterprise SLA addendum · 0.71
  • Onboarding checklist · 0.42

Assistant

An assistant that reads your schema

Connect your zMesh project and the assistant understands your tables, relationships and policies. It writes safe queries, answers in plain language and always cites the rows it used.

  • Reads your live schema & relationships
  • Respects row-level security & policies
  • Answers with citations back to source
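This page does not describe how zMesh enforces these policies. In databases such as PostgreSQL, row-level security is enforced by the database itself through `CREATE POLICY`. The application-side sketch below only illustrates the principle. Rows must pass a policy check before they reach the assistant's context, and each row keeps its id so the answer can cite it. The row shape, `sameTenant` and `visibleContext` are hypothetical.

```typescript
type Row = { id: string; tenantId: string; name: string; mrr: number };
type User = { id: string; tenantId: string };

// A policy decides whether a user may see a row, similar in spirit to the
// USING expression of a PostgreSQL row-level security policy.
type RowPolicy = (row: Row, user: User) => boolean;

const sameTenant: RowPolicy = (row, user) => row.tenantId === user.tenantId;

// Build the assistant's context from visible rows only, tagging each line
// with the row id so the answer can cite the rows it used.
function visibleContext(rows: Row[], user: User, policy: RowPolicy): string[] {
  return rows
    .filter((row) => policy(row, user))
    .map((row) => `[row ${row.id}] ${row.name}: mrr=${row.mrr}`);
}

// Toy data: two customers that belong to different tenants.
const rows: Row[] = [
  { id: "c1", tenantId: "t1", name: "Acme", mrr: 900 },
  { id: "c2", tenantId: "t2", name: "Globex", mrr: 1200 },
];
const context = visibleContext(rows, { id: "u1", tenantId: "t1" }, sameTenant);
// context: ["[row c1] Acme: mrr=900"]
```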
assistant · schema-aware
“Top 3 customers by revenue this month?”
SELECT name, mrr
FROM customers
ORDER BY mrr DESC LIMIT 3;
reads schema: customers, invoices · RLS enforced

Developer experience

A few lines from prompt to production

One typed SDK for inference, embeddings, search and the assistant. Stream tokens, ground answers and ship — without wiring up models, vector stores or GPUs yourself.

  • Typed SDKs for TypeScript & Python
  • Streaming, tools & JSON mode built in
  • Vector store & RAG with no extra infra
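import { zai } from "@zyora/ai";

// Request a streamed chat completion from the zai-large model.
const res = await zai.chat.create({
  model: "zai-large",
  stream: true,
  messages: [{ role: "user", content: "Summarise Q1 churn" }],
});

// Print the text of each streamed chunk as it arrives.
for await (const chunk of res) process.stdout.write(chunk.text);
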
Why zAI

AI that feels built for your app

Fast by default

Streaming responses with low time-to-first-token keep your product feeling instant, even on big prompts.
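To check time to first token in your own app, one option is to time the first chunk of any streamed response exposed as an async iterable of text. The sketch below measures on the client, so its numbers include network latency. `measureStream` and the fake token stream are hypothetical. To use it with an SDK stream, you would first map each chunk to its text.

```typescript
async function measureStream(
  stream: AsyncIterable<string>,
): Promise<{ ttftMs: number | null; totalMs: number; text: string }> {
  const start = Date.now();
  let ttftMs: number | null = null; // stays null if the stream yields nothing
  let text = "";
  for await (const chunk of stream) {
    if (ttftMs === null) ttftMs = Date.now() - start; // first chunk arrived
    text += chunk;
  }
  return { ttftMs, totalMs: Date.now() - start, text };
}

// Toy stream standing in for a real response: three tokens, about 20ms apart.
async function* fakeTokens(): AsyncGenerator<string> {
  for (const token of ["Hello", ", ", "world"]) {
    await new Promise((resolve) => setTimeout(resolve, 20));
    yield token;
  }
}

measureStream(fakeTokens()).then(({ ttftMs, totalMs }) => {
  console.log(`ttft ${ttftMs}ms, total ${totalMs}ms`);
});
```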

Grounded in your data

Embeddings and the schema-aware assistant keep answers tied to your own content, with citations you can trust.

Private & governed

Your data stays in your project. Guardrails, PII redaction and RLS keep prompts and outputs inside policy.

Part of one platform

zAI plugs straight into zMesh, zAPI and the rest of the Zyora Labs platform, with auth, data and gateways already wired in.

Give your product an AI brain

Inference, embeddings and a schema-aware assistant in one API. Start free: no GPUs, no credit card required.