zAI · AI infrastructure

Ship AI that
knows your data.

Name: zAI
Brand: Zyora Labs

One API for inference, embeddings and a schema-aware assistant. Stream answers, search by meaning and ground every response in your own data — without standing up a single GPU.

Inference API · Embeddings · Schema-aware · Streaming

zAI assistant · schema-aware

“Which customers churned last quarter and why?”

Streaming · No data leaves your project

120ms · Time to first token
1 API for every model
1536d · Embedding dimensions
0 GPUs to manage

An AI stack in one API

Everything you need to build with AI

Inference, embeddings, retrieval and guardrails — wired together and ready to ship, so you skip the glue code and the GPU bills.

Inference API

Call leading open and frontier models through one endpoint, with streaming, function calling and JSON mode built in.

Embeddings & vector search

Turn text, code and docs into vectors and search by meaning over your own data — storage and index included.

Schema-aware assistant

An assistant that reads your zMesh schema, writes safe queries and answers questions grounded in your live data.

RAG pipelines

Chunk, embed, retrieve and rerank in a few lines. Ground every answer with citations back to the source.
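
As a rough illustration of the chunking step named above, here is a minimal sketch that splits text into fixed-size, overlapping chunks. The function name `chunkText` and its parameters are hypothetical and are not taken from the zAI SDK. A managed pipeline may well chunk by tokens or sentences rather than by characters.

```typescript
// Hypothetical helper (not part of any zAI SDK): split text into
// character-based chunks where each chunk overlaps the previous one.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.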

Tools & function calling

Let the model call your APIs and run actions safely, with typed arguments and automatic retries.
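
To make "typed arguments and automatic retries" concrete, the sketch below validates model-produced JSON against an expected shape before acting on it, and retries a flaky action a bounded number of times. The `RefundArgs` tool, `parseRefundArgs` and `withRetries` are all hypothetical names invented for illustration. The retry helper is synchronous to keep the example short, whereas real network calls would normally be asynchronous and use backoff.

```typescript
// Hypothetical tool argument shape; invented for this example.
type RefundArgs = { invoiceId: string; amountCents: number };

// Validate untrusted JSON produced by a model before running the tool.
function parseRefundArgs(raw: string): RefundArgs {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("arguments must be a JSON object");
  }
  const { invoiceId, amountCents } = value as Record<string, unknown>;
  if (typeof invoiceId !== "string" || invoiceId.length === 0) {
    throw new Error("invoiceId must be a non-empty string");
  }
  if (
    typeof amountCents !== "number" ||
    !Number.isInteger(amountCents) ||
    amountCents <= 0
  ) {
    throw new Error("amountCents must be a positive integer");
  }
  return { invoiceId, amountCents };
}

// Run an action up to maxAttempts times, rethrowing the last error.
function withRetries<T>(action: () => T, maxAttempts: number): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Validation runs before the action, so a malformed tool call fails fast instead of being retried.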

Guardrails & moderation

Filter unsafe content, redact PII and keep prompts and outputs inside the policies you define.
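
As one example of what PII redaction can look like, the sketch below masks email addresses and phone-like numbers with regular expressions. It is an assumption-laden illustration only: the patterns and the `redactPii` name are made up here, and a production guardrail would likely use broader detection than two regexes.

```typescript
// Illustrative patterns only; they will miss many real-world formats.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Replace matched emails and phone-like numbers with placeholders.
function redactPii(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}
```

Applied to a prompt before it reaches the model, and again to the output, this keeps the raw values out of logs and responses.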

Search by meaning

Embeddings that map your whole world

zAI turns every document, row and message into a vector and finds the closest matches in milliseconds. No keyword guessing — just ask in plain language and get the most relevant context back.

Embed text, code, PDFs and images
Managed vector store — no infra to run
Rerank & filter for pinpoint retrieval

Vector search · 1536-d · cosine

1. Refund & proration policy · billing
2. Cancel a subscription · billing
3. Failed payment retries · billing
4. Update the card on file · account

4 matches · ranked by meaning
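
"Ranked by meaning" with a cosine metric can be sketched as follows: score each stored vector against the query vector and keep the top k. This is a brute-force illustration with hypothetical names (`cosineSimilarity`, `topK`). A managed vector store would normally use an approximate index rather than scanning every item.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type Indexed = { title: string; vector: number[] };
type Scored = { title: string; score: number };

// Brute-force nearest neighbours: score everything, sort, keep k.
function topK(query: number[], items: Indexed[], k: number): Scored[] {
  return items
    .map((item) => ({
      title: item.title,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Real embeddings here would have 1536 dimensions. The logic is the same for any dimension as long as the query and stored vectors match in length.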

The inference pipeline

From question to grounded answer

Retrieval, reasoning and guardrails run in one call — so every response is fast, relevant and backed by your own data.

Prompt received: User asks a question
Retrieve context: Top-k vectors from your data
Model: Reason over prompt + context
Stream tokens: Answer streams back live
Grounded answer: With citations & guardrails
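
The "retrieve context" and "reason over prompt + context" steps can be sketched as a prompt-assembly function: take the top-k retrieved passages, number them, and ask the model to cite those numbers. Everything here (`Retrieved`, `buildGroundedPrompt`, the instruction wording) is a hypothetical illustration. The page does not describe how zAI assembles prompts internally.

```typescript
// A retrieved passage with its source and similarity score.
type Retrieved = { source: string; text: string; score: number };

// Build a prompt that grounds the model in the k best passages and
// numbers them so the answer can cite them as [1], [2], ...
function buildGroundedPrompt(
  question: string,
  retrieved: Retrieved[],
  k: number,
): string {
  const top = [...retrieved].sort((a, b) => b.score - a.score).slice(0, k);
  const context = top
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  return [
    "Answer using only the context below and cite sources as [n].",
    "Context:",
    context,
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the passages in the prompt is what lets a citation in the answer be mapped back to a source afterwards.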

Inference

One endpoint, every model

Swap between open and frontier models by changing a single line. Streaming, JSON mode and function calling work the same everywhere, so you can pick the best model per task without rewrites.

Token streaming with ~120ms first token
Function calling & structured JSON output
Automatic fallback & load balancing

chat.completions · stream · 120ms ttft

json mode · tools · fallback
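
Automatic fallback can be pictured as trying a list of models in priority order and returning the first success. The sketch below is a simplified, synchronous illustration with hypothetical names (`Provider`, `completeWithFallback`). Real model calls would be asynchronous, and the page does not say how zAI orders or balances its backends.

```typescript
// A stand-in for a model backend; `complete` may throw on failure.
type Provider = { name: string; complete: (prompt: string) => string };

// Try each provider in order and return the first successful result.
function completeWithFallback(
  providers: Provider[],
  prompt: string,
): { model: string; text: string } {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { model: p.name, text: p.complete(prompt) };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${message}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

Because every provider exposes the same `complete` signature, the caller does not change when the serving model does.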

Embeddings

Retrieval that actually finds it

Embed your data once and search it by meaning forever. The managed vector store handles indexing, filtering and reranking, so retrieval stays accurate as your corpus grows.

Managed vector store, zero ops
Chunk, embed & rerank in a few lines
Metadata filters for scoped search

nearest matches · cosine

Refund policy for annual plans · 0.94
Cancellation & proration terms · 0.88
Enterprise SLA addendum · 0.71
Onboarding checklist · 0.42
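
A metadata filter for scoped search can be sketched as a filter applied alongside the similarity score. The titles and scores below are the ones shown above. The `tag` values, the `Match` type and `scopedMatches` are assumptions added purely for illustration.

```typescript
type Match = { title: string; score: number; tag: string };

// Titles and scores come from the example above; tags are made up.
const matches: Match[] = [
  { title: "Refund policy for annual plans", score: 0.94, tag: "billing" },
  { title: "Cancellation & proration terms", score: 0.88, tag: "billing" },
  { title: "Enterprise SLA addendum", score: 0.71, tag: "legal" },
  { title: "Onboarding checklist", score: 0.42, tag: "onboarding" },
];

// Keep only matches with the given tag and at least minScore,
// highest score first.
function scopedMatches(items: Match[], tag: string, minScore: number): Match[] {
  return items
    .filter((m) => m.tag === tag && m.score >= minScore)
    .sort((a, b) => b.score - a.score);
}
```

With these assumed tags, `scopedMatches(matches, "billing", 0.5)` returns the two billing documents and drops the rest.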

Assistant

An assistant that reads your schema

Connect your zMesh project and the assistant understands your tables, relationships and policies. It writes safe queries, answers in plain language and always cites the rows it used.

Reads your live schema & relationships
Respects row-level security & policies
Answers with citations back to source

assistant · schema-aware

“Top 3 customers by revenue this month?”

SELECT name, mrr
FROM customers
ORDER BY mrr DESC LIMIT 3;

reads schema: customers, invoices · RLS enforced
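
One way to think about "safe queries" is a pre-check that only lets a single read-only statement through. The sketch below is a deliberately conservative keyword check with hypothetical names. It is not a SQL parser and not a description of how the zAI assistant works. Row-level security still has to be enforced by the database itself.

```typescript
// Keywords that indicate a write or schema change. Illustrative list only.
const FORBIDDEN =
  /\b(insert|update|delete|drop|alter|truncate|grant|revoke|create)\b/i;

// Accept only a single statement that starts with SELECT and contains
// none of the forbidden keywords. May reject some harmless queries.
function isReadOnlyQuery(sql: string): boolean {
  const trimmed = sql.trim().replace(/;\s*$/, ""); // allow one trailing ";"
  if (trimmed.includes(";")) return false; // reject stacked statements
  if (!/^select\b/i.test(trimmed)) return false;
  return !FORBIDDEN.test(trimmed);
}
```

The example query shown above passes this check, while a statement such as `DELETE FROM customers` does not.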

Developer experience

A few lines from prompt to production

One typed SDK for inference, embeddings, search and the assistant. Stream tokens, ground answers and ship — without wiring up models, vector stores or GPUs yourself.

Typed SDKs for TypeScript & Python
Streaming, tools & JSON mode built in
Vector store & RAG with no extra infra

import { zai } from "@zyora/ai";

// Request a streamed chat completion.
const res = await zai.chat.create({
  model: "zai-large",
  stream: true,
  messages: [{ role: "user", content: "Summarise Q1 churn" }],
});

// Write each chunk to stdout as it arrives.
for await (const chunk of res) process.stdout.write(chunk.text);

Why zAI

AI that feels built for your app

Fast by default

Streaming responses with low time-to-first-token keep your product feeling instant, even on big prompts.

Grounded in your data

Embeddings and the schema-aware assistant keep answers tied to your own content, with citations you can trust.

Private & governed

Your data stays in your project. Guardrails, PII redaction and RLS keep prompts and outputs inside policy.

Part of one platform

zAI plugs straight into zMesh, zAPI and the rest of Zyora Labs — auth, data and gateways already wired in.

zAI · AI infrastructure

Give your product an AI brain

Inference, embeddings and a schema-aware assistant in one API. Start free: no GPUs and no credit card required.
