Cost Controls

The gateway tracks token usage and estimated cost for every AI request. Cost controls let you set per-model rates, define monthly budget caps, and view usage breakdowns by model and user — giving you visibility into AI spend before the bill arrives.

Per-Model Cost Rates

Each model has configurable input and output token rates that are used to estimate cost:

Model Input (per 1M tokens) Output (per 1M tokens)
Claude Opus $15.00 $75.00
Claude Sonnet $3.00 $15.00
Claude Haiku $0.25 $1.25
GPT-4o $2.50 $10.00
GPT-4o Mini $0.15 $0.60
Gemini Pro $1.25 $5.00
Gemini Flash $0.075 $0.30
Ollama (self-hosted) $0.00 $0.00

Note: These rates are configurable in the Portal. Update them to match your organization's negotiated pricing or to account for infrastructure costs of self-hosted models.

Monthly Budget Caps

Set a monthly budget cap per tenant to prevent overspending. When usage approaches or exceeds the budget:

  • 80% threshold: A warning is displayed in the Portal dashboard
  • 100% threshold: Depending on configuration, requests may be downgraded to cheaper models or blocked

Budget caps are configured in the Portal under Settings > Billing.

Setting a Budget

  1. Open the Portal
  2. Navigate to Settings
  3. Enter the monthly budget amount in the Monthly Budget Cap field
  4. Click Save

Usage Breakdown

The cost dashboard provides multiple views of your AI spending:

By Model

See how much each model contributes to your total spend. This helps you identify opportunities to use more cost-effective models for routine tasks.

By User

Track which team members are consuming the most tokens. Useful for identifying power users who might benefit from training on more efficient prompting techniques.

By Time Period

View usage trends over time — daily, weekly, or monthly. Identify spikes, track growth, and forecast future spend.

Billing Period Tracking

Usage resets at the start of each billing period (typically the first of the month). The Portal dashboard shows:

  • Current period spend: Total estimated cost so far this billing period
  • Budget remaining: How much of the monthly cap is left
  • Projected spend: Estimated total for the current period based on usage trends
  • Total requests: Number of AI requests this period
  • Total tokens: Aggregate input and output tokens consumed

Cost Optimization Tips

  • Use the Standard tier for routine tasks — Haiku-class models are 10-60x cheaper than Opus-class
  • Set max token limits to prevent unexpectedly long responses
  • Route engine workflows through cost-effective models when possible
  • Use Ollama for self-hosted models with zero per-token cost
  • Review the By User breakdown to identify training opportunities

Tip: Configure your standard tier with a fast, affordable model like Claude Haiku or GPT-4o Mini. Most everyday questions don't need frontier models, and routing them to standard saves significant cost.

Next Steps