AI Gateway

The Workjet AI Gateway is the single chokepoint through which every AI request flows. Whether a message comes from interactive chat, a background assistant, or a scheduled engine, it passes through the gateway before reaching any AI provider. This architecture gives IT and compliance teams complete visibility and control over how AI is used across the organization.

How Requests Flow

Every AI interaction follows the same path:

  1. Desktop App sends a chat completion request to the gateway
  2. Gateway authenticates the request (session cookie or API key)
  3. DLP pipeline scans the request for sensitive data (credit cards, SSNs, API keys, etc.)
  4. Rate limiter checks per-user request limits
  5. Model router selects the appropriate AI provider based on tier and priority rules
  6. AI Provider (Anthropic, OpenAI, Google, or Ollama) processes the request
  7. Gateway scans the response for sensitive data
  8. Audit logger records the full interaction (user, model, tokens, cost, DLP result)
  9. Response streams back to the desktop app

Desktop App                    AI Gateway                      AI Provider
+-----------+    request    +----------------+    routed    +-------------+
|  Chat /   | -----------> | Auth           | ----------> | Anthropic   |
| Assistant | <----------- | DLP Scan       | <---------- | OpenAI      |
|  Engine   |   response   | Rate Limit     |   response  | Google      |
+-----------+              | Route          |             | Ollama      |
                           | Audit Log      |             +-------------+
                           +----------------+

Why a Single Chokepoint?

Routing all AI traffic through one gateway provides several critical benefits:

  • Visibility: Every AI interaction is logged with user identity, model, tokens, cost, and DLP results
  • Cost control: Set monthly budgets and track spending by model and user
  • Data protection: Scan every request and response for sensitive data before it reaches external providers
  • Model governance: Control which models are available and how requests are routed
  • Rate limiting: Prevent abuse with per-user request limits
  • Compliance: Immutable audit trail for regulatory requirements
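The visibility and compliance benefits hinge on what each log entry captures. A plausible shape for one audit record, covering the fields listed above (the interface and field names are illustrative, not the gateway's actual schema):

```typescript
// Hypothetical shape of one audit record; field names are illustrative.
interface AuditRecord {
  timestamp: string;   // ISO 8601
  userId: string;      // who made the request
  model: string;       // which model served it
  inputTokens: number;
  outputTokens: number;
  costUsd: number;     // computed from per-model rates
  dlpResult: "clean" | "redacted" | "blocked" | "warned";
}

function toAuditLine(record: AuditRecord): string {
  // One JSON object per line, suitable for append-only storage.
  return JSON.stringify(record);
}
```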

Architecture: The gateway runs on Cloudflare Workers at the edge, providing low-latency request processing. Authentication state is managed via Cloudflare KV, DLP policies are stored in D1, and audit logs are written to R2 for durable storage.
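A Wrangler configuration wiring up those three storage services might look like the sketch below. The binding names and IDs are placeholders for illustration, not the actual deployment's values.

```toml
# Hypothetical Wrangler config; binding names and IDs are placeholders.
name = "ai-gateway"
main = "src/index.ts"

[[kv_namespaces]]        # authentication state
binding = "AUTH_KV"
id = "<kv-namespace-id>"

[[d1_databases]]         # DLP policies
binding = "DLP_DB"
database_id = "<d1-database-id>"

[[r2_buckets]]           # audit logs
binding = "AUDIT_LOGS"
bucket_name = "gateway-audit-logs"
```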

Gateway Features

Feature       | Description                                                 | Learn More
--------------|-------------------------------------------------------------|--------------
Model Routing | Tier-based routing with provider fallback and token limits  | Model Routing
DLP           | Pattern-based scanning with redact, block, and warn actions | DLP Policies
Audit Logging | Immutable record of every AI interaction                    | Audit Logging
Cost Controls | Per-model rates, monthly budgets, usage breakdown           | Cost Controls
Rate Limiting | Per-user limits with configurable windows and thresholds    | Rate Limiting
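Tier-based routing with provider fallback, the first feature in the table, amounts to walking an ordered provider list per tier. A minimal sketch, assuming hypothetical tier names and a health-check callback (neither is part of the documented API):

```typescript
// Sketch of tier-based routing with provider fallback.
// Tier names and priority lists here are hypothetical examples.
type Provider = "anthropic" | "openai" | "google" | "ollama";

const TIER_ROUTES: Record<string, Provider[]> = {
  fast: ["google", "ollama"],
  balanced: ["openai", "anthropic"],
  premium: ["anthropic", "openai"],
};

function routeRequest(
  tier: string,
  isHealthy: (p: Provider) => boolean
): Provider {
  const candidates = TIER_ROUTES[tier] ?? TIER_ROUTES["balanced"];
  // Walk the priority list; fall back to the next provider if one is down.
  for (const provider of candidates) {
    if (isHealthy(provider)) return provider;
  }
  throw new Error(`no healthy provider for tier "${tier}"`);
}
```

With this shape, an outage at the first-choice provider degrades to the next entry in the tier's list rather than failing the request outright.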

Managed vs. Self-Hosted

Workjet offers two deployment options for the gateway:

  • Managed (default): The gateway runs at api.workjet.dev, fully managed by Workjet. No infrastructure to maintain.
  • Self-hosted (Enterprise): Deploy the gateway on your own Cloudflare account for complete data sovereignty. Available on the Enterprise plan.

Next Steps