API Keys, Costs, and Budgets
How LLM credentials, cost tracking, and budget enforcement work in IronWorks.
The BYOK Model
IronWorks uses a Bring Your Own Key model. Your monthly subscription ($79-$599) covers the platform — the dashboard, agent orchestration, playbooks, knowledge base, and all features. LLM costs are separate: you connect your own API keys from Anthropic, OpenAI, Google, or other providers, and usage is billed directly by the provider at their standard rates.
IronWorks adds zero markup to LLM costs. Your agents call the provider APIs directly using your credentials.
API Key vs OAuth
IronWorks supports two credential types for the Anthropic API:
| Method | Secret Name | HTTP Header | Best For |
|---|---|---|---|
| API Key | ANTHROPIC_API_KEY | x-api-key: sk-ant-... | Prepaid API credits, direct billing |
| OAuth Token | ANTHROPIC_OAUTH_TOKEN | Authorization: Bearer ... | Anthropic managed billing, organization accounts |
For OpenAI, store your key as OPENAI_API_KEY. The system tries the company secret first, then falls back to the server environment variable (for self-hosted instances).
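The lookup order described above can be sketched as follows. This is illustrative: the secret-store shape and the function names (`resolveCredential`, `authHeader`) are assumptions, not IronWorks APIs, and the header logic covers only the two Anthropic credential types from the table.

```typescript
// Hypothetical credential resolution: company secret store first,
// then the server process environment (self-hosted fallback).
type SecretStore = Record<string, string | undefined>;

function resolveCredential(
  name: "ANTHROPIC_API_KEY" | "ANTHROPIC_OAUTH_TOKEN" | "OPENAI_API_KEY",
  companySecrets: SecretStore,
  env: SecretStore,
): string | undefined {
  return companySecrets[name] ?? env[name];
}

// The HTTP header differs by Anthropic credential type (see table above).
function authHeader(name: string, value: string): [string, string] {
  return name === "ANTHROPIC_OAUTH_TOKEN"
    ? ["Authorization", `Bearer ${value}`]
    : ["x-api-key", value];
}
```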
All credentials are encrypted at rest with AES-256-GCM using a per-company master key. They are never logged, never included in agent prompts, and never visible in the UI after storage.
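A minimal illustration of AES-256-GCM encryption with a 32-byte master key, using Node's built-in `crypto` module. Key management is deliberately simplified here; this is not IronWorks's actual implementation.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a credential with a 32-byte per-company master key (AES-256-GCM).
// The random IV and the GCM auth tag are stored alongside the ciphertext.
function encryptSecret(plaintext: string, masterKey: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

function decryptSecret(blob: string, masterKey: Buffer): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28); // GCM auth tag is 16 bytes
  const ct = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", masterKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, any tampering with the stored blob makes decryption throw rather than return garbage.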
How Budgets Work
Each company has a monthly LLM budget configured in company settings. The default is $500/month for API key customers. OAuth customers manage their own billing and are excluded from budget enforcement.
The budget system tracks two values:
- `budgetMonthlyCents` — the limit you set (e.g., 50000 = $500)
- `spentMonthlyCents` — how much has been consumed this calendar month
Spend is tracked at two levels: per-company (aggregate) and per-agent (granular). Every heartbeat run records its token usage and estimated cost.
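The two-level tracking can be sketched as a small ledger. The field names `budgetMonthlyCents` and `spentMonthlyCents` come from the doc; the in-memory shape and `recordRun` helper are assumptions for illustration (IronWorks persists these values).

```typescript
// Hypothetical budget ledger: aggregate company spend plus a
// per-agent breakdown, all in integer cents.
interface Budget {
  budgetMonthlyCents: number; // e.g. 50000 = $500
  spentMonthlyCents: number;
  perAgentCents: Record<string, number>;
}

// Record one heartbeat run's estimated cost at both levels.
function recordRun(b: Budget, agentId: string, costCents: number): Budget {
  return {
    ...b,
    spentMonthlyCents: b.spentMonthlyCents + costCents,
    perAgentCents: {
      ...b.perAgentCents,
      [agentId]: (b.perAgentCents[agentId] ?? 0) + costCents,
    },
  };
}
```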
Budget Alert Thresholds
The budget alert system checks spend after each agent execution and fires at two thresholds:
80% Warning
When monthly spend reaches 80% of budget, the system:
- Creates a high-priority issue titled "[CFO Alert] Budget 80% consumed — $400.00/$500.00 with 12 days remaining"
- Assigns it to the CFO agent (falls back to CEO if no CFO exists)
- Logs a `budget.alert_80_percent` activity event
This alert is deduped — only one 80% alert per calendar month. The CFO can then review spending patterns and recommend adjustments.
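The threshold check plus monthly dedupe can be sketched like this. The function name and the `alertedMonths` set are illustrative assumptions; the 80% cutoff and the one-alert-per-month rule follow the doc.

```typescript
// Fire the 80% warning at most once per calendar month.
// Returns the alert title, or null if no alert should fire.
function maybeWarn80(
  spentCents: number,
  budgetCents: number,
  month: string, // e.g. "2024-06"
  alertedMonths: Set<string>,
): string | null {
  if (spentCents < budgetCents * 0.8 || alertedMonths.has(month)) return null;
  alertedMonths.add(month);
  const spent = (spentCents / 100).toFixed(2);
  const budget = (budgetCents / 100).toFixed(2);
  return `[CFO Alert] Budget 80% consumed — $${spent}/$${budget}`;
}
```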
100% Hard Stop
When spend reaches 100% of budget, the system:
- Creates an urgent-priority issue titled "[System] Monthly budget exceeded — all agents paused"
- Pauses all non-CEO agents immediately (sets `status: "paused"`, `pauseReason: "budget_exceeded"`)
- The CEO stays active so it can communicate the situation and you can give instructions
Paused agents won't pick up new work until you either increase the budget or the next calendar month begins and spend resets.
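The hard stop reduces to one pass over the agent list. The `Agent` shape here is a guess, but the `status`/`pauseReason` values and the CEO exemption come from the doc.

```typescript
interface Agent {
  id: string;
  role: string;
  status: "active" | "paused";
  pauseReason?: string;
}

// Sketch of the 100% hard stop: pause every non-CEO agent;
// the CEO stays active to communicate the situation.
function enforceHardStop(agents: Agent[]): Agent[] {
  return agents.map((a) =>
    a.role === "CEO"
      ? a
      : { ...a, status: "paused" as const, pauseReason: "budget_exceeded" },
  );
}
```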
Cost Tracking Per Agent and Per Task
Every heartbeat run records:
- Input tokens — how many tokens were sent to the LLM
- Output tokens — how many tokens the LLM generated
- Estimated cost — calculated from the model's per-token pricing
- Duration — how long the execution took
This data is visible on the agent profile page (per-agent breakdown) and on each issue (per-task cost). The War Room shows aggregate spend and top spenders.
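The estimated-cost field above is derived from token counts and the model's per-token pricing. A minimal sketch, assuming prices are quoted in dollars per million tokens (the rates in the test are placeholders, not any provider's real pricing):

```typescript
// Estimate one run's cost in integer cents from its token usage.
function estimateCostCents(
  inputTokens: number,
  outputTokens: number,
  pricePerMTokIn: number,  // dollars per 1M input tokens
  pricePerMTokOut: number, // dollars per 1M output tokens
): number {
  const dollars =
    (inputTokens / 1_000_000) * pricePerMTokIn +
    (outputTokens / 1_000_000) * pricePerMTokOut;
  return Math.round(dollars * 100);
}
```

Output tokens typically cost several times more than input tokens, which is why verbose agents show up quickly in the per-agent breakdown.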
Tips for Optimizing LLM Costs
- Use smaller models for routine work. The Senior Engineer doesn't need Claude Opus for formatting a README — Claude Haiku or GPT-4o-mini works fine. Configure lighter models for agents doing simple tasks.
- Scope tasks tightly. "Fix the login bug in auth.ts" costs less than "Review the entire codebase and fix any bugs you find." Smaller task scope means fewer tokens.
- Use the CFO agent. The CFO role is designed to monitor spend. Deploy one and let it review cost reports and recommend optimizations.
- Set per-agent budgets. If you know your engineer should cost ~$50/month, set a per-agent budget. This prevents any single agent from consuming the entire company budget.
- Review run transcripts. If an agent is expensive, check its run transcripts to see if it's doing unnecessary work (reading too many files, generating overly verbose responses).
Related Topics
LLM costs appear in the War Room — the top spend metrics widget shows your monthly total and per-agent breakdown at a glance. For full security details on how API keys are encrypted, see Security and Privacy. Agents that run frequent Playbooks are typically the highest consumers — monitor them first.
Compare IronWorks plans — all tiers include zero LLM markup and full BYOK support.