Cost Controls

The gateway tracks token usage and estimated cost for every AI request. Cost controls let you set per-model rates, define monthly budget caps, and break down usage by model, user, and time period, so you can see where your AI spend is going before the bill arrives.

Per-Model Cost Rates

Each model has configurable input and output token rates that are used to estimate cost:

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Opus	$15.00	$75.00
Claude Sonnet	$3.00	$15.00
Claude Haiku	$0.25	$1.25
GPT-4o	$2.50	$10.00
GPT-4o Mini	$0.15	$0.60
Gemini Pro	$1.25	$5.00
Gemini Flash	$0.075	$0.30
Ollama (self-hosted)	$0.00	$0.00

Note: These rates are configurable in the Portal. Update them to match your organization's negotiated pricing or to account for infrastructure costs of self-hosted models.
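An estimate based on these rates multiplies each token count by its per-million rate. The following is a minimal sketch that assumes the gateway prices input and output tokens linearly. The `RATES` dictionary and the `estimate_cost` function are hypothetical illustrations, not the gateway's API, and the values mirror the default table above. The sketch uses `Decimal` so that fractional-cent amounts do not pick up floating-point rounding errors.

```python
from decimal import Decimal

# Hypothetical copy of the default rate table: (input, output) in USD per 1M tokens.
RATES = {
    "Claude Opus": (Decimal("15.00"), Decimal("75.00")),
    "Claude Sonnet": (Decimal("3.00"), Decimal("15.00")),
    "Claude Haiku": (Decimal("0.25"), Decimal("1.25")),
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "GPT-4o Mini": (Decimal("0.15"), Decimal("0.60")),
    "Gemini Pro": (Decimal("1.25"), Decimal("5.00")),
    "Gemini Flash": (Decimal("0.075"), Decimal("0.30")),
    "Ollama (self-hosted)": (Decimal("0.00"), Decimal("0.00")),
}

PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate one request's USD cost, assuming linear per-token pricing."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / PER_MILLION
```

For example, a Claude Sonnet request with 2,000 input tokens and 500 output tokens comes to (2,000 × $3.00 + 500 × $15.00) / 1,000,000 = $0.0135.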

Monthly Budget Caps

Set a monthly budget cap per tenant to prevent overspending. When usage approaches or exceeds the budget:

80% threshold: A warning is displayed in the Portal dashboard
100% threshold: Depending on configuration, requests may be downgraded to cheaper models or blocked

Budget caps are configured in the Portal under Settings > Billing.
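The following sketch shows the threshold behavior. It makes two assumptions: that thresholds are compared against the current period's estimated spend, and that "depending on configuration" means each tenant chooses between downgrading and blocking. `BudgetAction` and `budget_action` are hypothetical names, and the gateway's real enforcement logic may differ.

```python
from decimal import Decimal
from enum import Enum


class BudgetAction(Enum):
    """Hypothetical outcomes of a budget check."""
    ALLOW = "allow"
    WARN = "warn"            # 80% threshold: request proceeds, Portal shows a warning
    DOWNGRADE = "downgrade"  # 100% threshold: route to a cheaper model
    BLOCK = "block"          # 100% threshold: reject the request


WARN_FRACTION = Decimal("0.8")


def budget_action(spend: Decimal, cap: Decimal,
                  over_cap: BudgetAction = BudgetAction.BLOCK) -> BudgetAction:
    """Decide how to treat the next request, given current-period spend and the cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    if over_cap not in (BudgetAction.DOWNGRADE, BudgetAction.BLOCK):
        raise ValueError("over_cap must be DOWNGRADE or BLOCK")
    if spend >= cap:
        return over_cap  # at or over 100%: apply the tenant's configured action
    if spend >= cap * WARN_FRACTION:
        return BudgetAction.WARN
    return BudgetAction.ALLOW
```

With a $1,000 cap, spend of $800 triggers a warning. Spend of $1,000 or more returns whichever over-cap action the tenant has configured.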

Setting a Budget

Open the Portal
Navigate to Settings > Billing
Enter the monthly budget amount in the Monthly Budget Cap field
Click Save

Usage Breakdown

The cost dashboard provides multiple views of your AI spending:

By Model

See how much each model contributes to your total spend. This helps you identify opportunities to use more cost-effective models for routine tasks.

By User

Track which team members consume the most tokens. This helps you identify power users who might benefit from training in more efficient prompting techniques.

By Time Period

View usage trends over time at daily, weekly, or monthly granularity. Identify spikes, track growth, and forecast future spend.
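Each view is a group-and-sum over per-request cost records, where the grouping key is the model, the user, or a time bucket. The sketch below is minimal and assumes a hypothetical `UsageRecord` shape. The Portal's actual data model is not documented here, and the sample records are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-request usage entry."""
    model: str
    user: str
    day: date
    cost: Decimal


def breakdown(records, key):
    """Sum estimated cost per group; `key` maps a record to its group."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for record in records:
        totals[key(record)] += record.cost
    return dict(totals)


# Made-up sample records for illustration only.
records = [
    UsageRecord("Claude Sonnet", "alice", date(2025, 3, 3), Decimal("0.0135")),
    UsageRecord("Claude Haiku", "alice", date(2025, 3, 3), Decimal("0.0010")),
    UsageRecord("Claude Sonnet", "bob", date(2025, 3, 10), Decimal("0.0270")),
]

by_model = breakdown(records, lambda r: r.model)
by_user = breakdown(records, lambda r: r.user)
by_week = breakdown(records, lambda r: tuple(r.day.isocalendar())[:2])  # (ISO year, ISO week)
by_month = breakdown(records, lambda r: (r.day.year, r.day.month))
```

Each time period view just swaps in a different key function, so adding a new grouping does not require a new aggregation routine.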

Billing Period Tracking

Usage resets at the start of each billing period (typically the first of the month). The Portal dashboard shows:

Current period spend: Total estimated cost so far this billing period
Budget remaining: How much of the monthly cap is left
Projected spend: Estimated total for the current period based on usage trends
Total requests: Number of AI requests this period
Total tokens: Aggregate input and output tokens consumed
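This page does not say how the Portal computes projected spend beyond "based on usage trends". The following sketch uses a simple linear run-rate instead: spend so far divided by days elapsed, multiplied by the number of days in the period. It also assumes that billing periods are calendar months starting on the 1st. `period_summary` is a hypothetical helper, not a Portal API.

```python
import calendar
from datetime import date
from decimal import Decimal


def period_summary(spend_to_date: Decimal, cap: Decimal, today: date) -> dict:
    """Summarize a calendar-month billing period (assumed to start on the 1st)."""
    days_in_period = calendar.monthrange(today.year, today.month)[1]
    days_elapsed = today.day  # counts today as a day already in use
    # Linear run-rate projection; the Portal may use a different trend model.
    projected = spend_to_date / days_elapsed * days_in_period
    return {
        "current_spend": spend_to_date,
        "budget_remaining": max(cap - spend_to_date, Decimal(0)),
        "projected_spend": projected,
    }
```

For example, $300 spent by April 10 against a $1,000 cap leaves $700 remaining and projects $300 / 10 × 30 = $900 for the month. A projection above the cap is an early signal that the 100% threshold will be reached before the period resets.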

Cost Optimization Tips

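Use the standard tier for routine tasks: at the default rates, Claude Haiku is 60x cheaper per token than Claude Opus, and GPT-4o Mini and Gemini Flash are roughly 17x cheaper than GPT-4o and Gemini Pro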
Set max token limits to prevent unexpectedly long responses
Route engine workflows through cost-effective models when possible
Use Ollama for self-hosted models, which carry zero per-token cost at the default rates (infrastructure costs still apply)
Review the By User breakdown to identify training opportunities

Tip: Configure your standard tier with a fast, affordable model like Claude Haiku or GPT-4o Mini. Most everyday questions don't need frontier models, and routing them to the standard tier saves significant cost.
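The following worked sketch shows the effect of two of these tips at the default rates. It assumes a hypothetical workload of 10,000 routine requests per month, each with 1,000 input tokens and 300 output tokens. `request_cost` and `worst_case_cost` are illustrative helpers, not gateway functions.

```python
from decimal import Decimal

# Default rates from the table above, in USD per 1M tokens: (input, output).
OPUS = (Decimal("15.00"), Decimal("75.00"))
HAIKU = (Decimal("0.25"), Decimal("1.25"))


def request_cost(rates, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request, assuming linear per-token pricing."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / Decimal(1_000_000)


def worst_case_cost(rates, input_tokens: int, max_tokens: int) -> Decimal:
    """Upper bound on one request's cost when output is capped at max_tokens."""
    return request_cost(rates, input_tokens, max_tokens)


# Hypothetical workload: 10,000 requests, 1,000 input + 300 output tokens each.
monthly_opus = 10_000 * request_cost(OPUS, 1_000, 300)    # $375.00
monthly_haiku = 10_000 * request_cost(HAIKU, 1_000, 300)  # $6.25
```

Because both input and output rates differ by the same factor, routing this workload to Haiku cuts its cost exactly 60-fold. Separately, a 1,000-token output limit caps a single Opus request with 1,000 input tokens at $0.09, however long the model would otherwise have run.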

Next Steps

Configure model routing tiers for cost optimization
View usage analytics in the Portal
Set up audit logging for detailed usage tracking