Ship AI that knows your data.
One API for inference, embeddings and a schema-aware assistant. Stream answers, search by meaning and ground every response in your own data — without standing up a single GPU.
- 120ms time to first token
- 1 API for every model
- 1536-dimension embeddings
- 0 GPUs to manage
Everything you need to build with AI
Inference, embeddings, retrieval and guardrails — wired together and ready to ship, so you skip the glue code and the GPU bills.
Inference API
Call leading open and frontier models through one endpoint, with streaming, function calling and JSON mode built in.
Embeddings & vector search
Turn text, code and docs into vectors and search by meaning over your own data — storage and index included.
Schema-aware assistant
An assistant that reads your zMesh schema, writes safe queries and answers questions grounded in your live data.
RAG pipelines
Chunk, embed, retrieve and rerank in a few lines. Ground every answer with citations back to the source.
Tools & function calling
Let the model call your APIs and run actions safely, with typed arguments and automatic retries.
Guardrails & moderation
Filter unsafe content, redact PII and keep prompts and outputs inside the policies you define.
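To illustrate what "typed arguments and automatic retries" can mean for tool calling, here is a minimal sketch under stated assumptions. The `get_mrr` tool, the in-memory MRR table, the registry and the retry policy are all hypothetical and are not the zAI SDK's API. The model's tool call is modeled as a name plus a JSON string of arguments.

```typescript
// Hypothetical sketch of typed tool dispatch with retries. The tool,
// registry and retry policy are illustrations, not the zAI SDK API.

// A tool call as a model might emit it: a name plus JSON-encoded arguments.
type ToolCall = { name: string; arguments: string };

interface Tool<A> {
  parse: (raw: unknown) => A; // validate untyped JSON into typed arguments
  run: (args: A) => Promise<string>; // perform the action
}

const mrrTable: Record<string, number> = { acme: 1200, globex: 950 };
let calls = 0;

// Hypothetical tool that looks up MRR; it fails on its first call to
// simulate a transient upstream error.
const getMrr: Tool<{ customer: string }> = {
  parse: (raw) => {
    const customer = (raw as { customer?: unknown } | null)?.customer;
    if (typeof customer !== "string") throw new Error("customer must be a string");
    return { customer };
  },
  run: async ({ customer }) => {
    calls++;
    if (calls === 1) throw new Error("transient: upstream timeout");
    const mrr = mrrTable[customer];
    if (mrr === undefined) throw new Error(`unknown customer: ${customer}`);
    return JSON.stringify({ customer, mrr });
  },
};

// Erase each tool's argument type: validate first, then return a thunk
// that can be retried without re-validating.
function bind<A>(tool: Tool<A>): (raw: unknown) => () => Promise<string> {
  return (raw) => {
    const args = tool.parse(raw);
    return () => tool.run(args);
  };
}

const tools = new Map<string, (raw: unknown) => () => Promise<string>>([
  ["get_mrr", bind(getMrr)],
]);

async function dispatch(call: ToolCall, maxAttempts = 3): Promise<string> {
  const prepare = tools.get(call.name);
  if (prepare === undefined) throw new Error(`unknown tool: ${call.name}`);
  // Bad JSON or invalid arguments throw here and are never retried.
  const execute = prepare(JSON.parse(call.arguments) as unknown);
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await execute();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Validating before the retry loop keeps a malformed call from being executed at all; only the execution step is retried. A production version would also separate transient errors (timeouts) from permanent ones (unknown customer) instead of retrying everything.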
Embeddings that map your whole world
zAI turns every document, row and message into a vector and finds the closest matches in milliseconds. No keyword guessing — just ask in plain language and get the most relevant context back.
- Embed text, code, PDFs and images
- Managed vector store — no infra to run
- Rerank & filter for pinpoint retrieval
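Searching "by meaning" generally comes down to comparing vectors with a similarity measure. The sketch below assumes cosine similarity and a brute-force scan over an in-memory list of toy 3-dimensional vectors. Real embeddings are much longer (1536 dimensions in the figures above), and a managed vector store would typically use an index rather than scoring every vector. The document IDs and vector values are made up for illustration.

```typescript
// Minimal sketch: brute-force top-k search by cosine similarity.
// Toy 3-d vectors stand in for real embeddings; IDs and values are made up.

type Doc = { id: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as unrelated
}

// Score every document against the query and keep the k best.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

const docs: Doc[] = [
  { id: "refund-policy", vector: [0.9, 0.1, 0.0] },
  { id: "pricing-page", vector: [0.7, 0.6, 0.1] },
  { id: "api-changelog", vector: [0.0, 0.2, 0.9] },
];

const hits = topK([1.0, 0.2, 0.0], docs, 2);
console.log(hits.map((h) => h.id).join(", ")); // refund-policy, pricing-page
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings.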
From question to grounded answer
Retrieval, reasoning and guardrails run in one call — so every response is fast, relevant and backed by your own data.
Prompt received
User asks a question
Retrieve context
Top-k vectors from your data
Model
Reason over prompt + context
Stream tokens
Answer streams back live
Grounded answer
With citations & guardrails
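One common way to get answers "with citations" is to number the retrieved chunks in the prompt and ask the model to cite those numbers. The sketch below assumes retrieval has already returned the top-k chunks; the `Chunk` shape, source names and instruction wording are illustrative and are not the prompt format zAI uses internally. It builds the prompt and maps `[n]` markers in an answer back to their sources.

```typescript
// Minimal sketch: assemble a grounded prompt from retrieved chunks and
// resolve [n] citations in the answer. The Chunk shape is hypothetical.

type Chunk = { source: string; text: string };

function buildGroundedPrompt(question: string, chunks: Chunk[]): string {
  // Number each chunk so the model can cite it as [1], [2], ...
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  return [
    "Answer using only the context below. Cite sources as [n].",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Map citation markers in a model answer back to the chunks they refer to,
// ignoring duplicates and numbers that point at no chunk.
function citedSources(answer: string, chunks: Chunk[]): string[] {
  const ids = new Set<number>();
  const re = /\[(\d+)\]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(answer)) !== null) ids.add(Number(m[1]));
  return Array.from(ids)
    .filter((n) => n >= 1 && n <= chunks.length)
    .map((n) => chunks[n - 1].source);
}
```

Keeping the numbering in the prompt means the application can still resolve each `[n]` to a source after the answer has finished streaming.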
One endpoint, every model
Swap between open and frontier models with a single line. Streaming, JSON mode and function calling work the same everywhere, so you can pick the best model per task without rewrites.
- Token streaming with ~120ms first token
- Function calling & structured JSON output
- Automatic fallback & load balancing
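Automatic fallback can be pictured as trying models in order of preference until one succeeds. In this minimal sketch the model names and the `CallModel` function are hypothetical placeholders supplied by the caller; it shows ordered fallback only, not load balancing, and it does not describe how zAI routes requests.

```typescript
// Minimal sketch: ordered model fallback. `CallModel` is a hypothetical
// stand-in for whatever function actually sends the request.

type CallModel = (model: string, prompt: string) => Promise<string>;

async function completeWithFallback(
  models: string[],
  prompt: string,
  callModel: CallModel,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const text = await callModel(model, prompt);
      return { model, text }; // first success wins
    } catch (err) {
      // Record the failure and move on to the next model in the list.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed (${errors.join("; ")})`);
}

// Example: a fake caller whose first model is overloaded.
const fakeCall: CallModel = async (model, prompt) => {
  if (model === "primary-model") throw new Error("503 overloaded");
  return `${model} answered: ${prompt}`;
};

completeWithFallback(["primary-model", "backup-model"], "Summarise Q1 churn", fakeCall)
  .then((r) => console.log(r.model)); // backup-model
```

Collecting every failure before giving up means the final error explains why each model in the chain was skipped.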
Retrieval that actually finds it
Embed your data once and search it by meaning forever. The managed vector store handles indexing, filtering and reranking, so retrieval stays accurate as your corpus grows.
- Managed vector store, zero ops
- Chunk, embed & rerank in a few lines
- Metadata filters for scoped search
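Before embedding, long documents are usually split into overlapping chunks so that text near a boundary keeps some of its surrounding context. The sketch below assumes a simple strategy: whitespace-separated words grouped into fixed-size windows with overlap, each tagged with metadata a vector store could later filter on. The window size, overlap and metadata fields are illustrative defaults, not zAI's chunking settings.

```typescript
// Minimal sketch: fixed-size word windows with overlap, each tagged with
// metadata for scoped search. Sizes and fields are illustrative.

type Chunk = { text: string; meta: { docId: string; index: number } };

function chunkWords(docId: string, text: string, size = 200, overlap = 40): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = size - overlap; // how far each window advances
  const chunks: Chunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push({
      text: words.slice(start, start + size).join(" "),
      meta: { docId, index: chunks.length },
    });
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Tiny example: windows of 3 words that share 1 word with the previous one.
const demo = chunkWords("handbook", "a b c d e f g", 3, 1);
console.log(demo.map((c) => c.text).join(" | ")); // a b c | c d e | e f g
```

Word windows are the simplest option; splitting on sentences or headings tends to keep chunks more self-contained, at the cost of uneven sizes.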
An assistant that reads your schema
Connect your zMesh project and the assistant understands your tables, relationships and policies. It writes safe queries, answers in plain language and always cites the rows it used.
- Reads your live schema & relationships
- Respects row-level security & policies
- Answers with citations back to source
SELECT *
FROM customers
ORDER BY mrr DESC LIMIT 3;
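To make "respects row-level security" and "cites the rows it used" concrete, here is an in-memory analogue of a top-3-by-MRR query: a policy predicate removes rows the user may not see before anything is ranked, and the result carries the IDs of the rows it drew on. The `Customer` shape, the `owner` column, the policy and the sample rows are hypothetical. In a real database this filtering would be enforced by the database's own RLS policies rather than by application code.

```typescript
// Minimal sketch: apply a row-level policy before ranking, then return
// the top rows with citation IDs. Schema, policy and data are hypothetical.

type Customer = { id: number; name: string; mrr: number; owner: string };
type Policy = (row: Customer, userId: string) => boolean;

// Hypothetical policy: a user may only see customers they own.
const ownRowsOnly: Policy = (row, userId) => row.owner === userId;

function topByMrr(rows: Customer[], userId: string, policy: Policy, limit = 3) {
  const visible = rows.filter((r) => policy(r, userId)); // enforce policy first
  const top = visible.sort((a, b) => b.mrr - a.mrr).slice(0, limit); // ORDER BY mrr DESC LIMIT n
  return { rows: top, citations: top.map((r) => `customers#${r.id}`) };
}

const rows: Customer[] = [
  { id: 1, name: "Acme", mrr: 500, owner: "u1" },
  { id: 2, name: "Globex", mrr: 900, owner: "u2" },
  { id: 3, name: "Initech", mrr: 700, owner: "u1" },
  { id: 4, name: "Umbrella", mrr: 100, owner: "u1" },
  { id: 5, name: "Hooli", mrr: 300, owner: "u1" },
];

const result = topByMrr(rows, "u1", ownRowsOnly);
console.log(result.citations.join(", ")); // customers#3, customers#1, customers#5
```

Filtering before ranking matters: ranking first and filtering afterwards could return fewer than three rows, or leak the existence of rows the user cannot see.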
A few lines from prompt to production
One typed SDK for inference, embeddings, search and the assistant. Stream tokens, ground answers and ship — without wiring up models, vector stores or GPUs yourself.
- Typed SDKs for TypeScript & Python
- Streaming, tools & JSON mode built in
- Vector store & RAG with no extra infra
import { zai } from "@zyora/ai";
// Request a streamed chat completion from the zai-large model.
const res = await zai.chat.create({
  model: "zai-large",
  stream: true,
  messages: [{ role: "user", content: "Summarise Q1 churn" }],
});
// Write each chunk of text as soon as it arrives.
for await (const chunk of res) process.stdout.write(chunk.text);
AI that feels built for your app
Fast by default
Streaming responses with low time-to-first-token keep your product feeling instant, even on big prompts.
Grounded in your data
Embeddings and the schema-aware assistant keep answers tied to your own content, with citations you can trust.
Private & governed
Your data stays in your project. Guardrails, PII redaction and RLS keep prompts and outputs inside policy.
Part of one platform
zAI plugs straight into zMesh, zAPI and the rest of Zyora Labs — auth, data and gateways already wired in.
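PII redaction is often done, at least as a first pass, with pattern matching on text before it reaches the model or leaves it. The sketch below masks email addresses and phone-number-like digit runs with simple regular expressions. The patterns are deliberately rough: they will miss many real formats and can over-match (a date such as 2024-01-15 looks like a phone number to them). This is an illustration of the idea, not how zAI's guardrails are implemented.

```typescript
// Minimal sketch: regex-based PII masking. The patterns are rough
// illustrations and will miss or over-match many real formats.

const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // 7+ digits, each optionally preceded by a space, dot or dash; optional leading +
  { label: "PHONE", pattern: /\+?\d(?:[\s.-]?\d){6,}/g },
];

function redact(input: string): { text: string; counts: Record<string, number> } {
  const counts: Record<string, number> = {};
  let out = input;
  for (const { label, pattern } of PII_PATTERNS) {
    // Replace every match with a placeholder and count how many were found.
    out = out.replace(pattern, () => {
      counts[label] = (counts[label] ?? 0) + 1;
      return `[${label}]`;
    });
  }
  return { text: out, counts };
}

const redacted = redact("Contact jane.doe@example.com or +1 555 010 0199 by Friday.");
console.log(redacted.text); // Contact [EMAIL] or [PHONE] by Friday.
```

Emails are masked first so that digits inside an address are never mistaken for a phone number; the per-label counts can feed logging or alerting without storing the raw values.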