
Performance Ratings

Every agent gets graded. No exceptions.

IronWorks assigns every agent a letter grade from A to F based on four measured dimensions: cost per task, average completion time, weekly throughput, and first-try completion rate. Performance reviews run automatically, not only when you remember to check.

Agent Performance: This Month

Agent         Grade   Cost/task   Avg time   Throughput   Completion
CTO Agent     A       $0.18       22 min     14/wk        96%
CMO Agent     B       $0.31       38 min     9/wk         88%
Engineer #2   C       $0.62       71 min     6/wk         74%
HR Agent      A       $0.14       18 min     18/wk        98%

Grading Methodology

Four dimensions. One grade. No ambiguity.

Every agent grade is computed from four equally weighted dimensions, each scored against the team average for that agent's role. Grades update weekly and are always visible to you. No manual evaluation is required.

Grades are role-aware. An engineering agent is not graded on the same cost baseline as a CEO agent. Each role has its own scoring benchmark so you are comparing apples to apples.
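As a rough illustration of the idea, the sketch below averages four equally weighted scores, each expressed as a ratio against a role baseline. The `Metrics` shape, the ratio scoring, and the letter cutoffs are all assumptions made for this example; IronWorks does not publish its exact formula or thresholds here.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical metric record; all values are assumed to be positive."""
    cost_per_task: float        # USD, lower is better
    avg_minutes: float          # assignment to completion, lower is better
    throughput_per_week: float  # higher is better
    completion_rate: float      # 0.0 to 1.0, higher is better


# Assumed cutoffs on the composite score, for illustration only.
GRADE_CUTOFFS = [(1.15, "A"), (1.00, "B"), (0.85, "C"), (0.70, "D")]


def dimension_scores(agent: Metrics, baseline: Metrics) -> dict[str, float]:
    """Score each dimension as a ratio to the role baseline (above 1.0 is better)."""
    return {
        "cost": baseline.cost_per_task / agent.cost_per_task,
        "speed": baseline.avg_minutes / agent.avg_minutes,
        "throughput": agent.throughput_per_week / baseline.throughput_per_week,
        "completion": agent.completion_rate / baseline.completion_rate,
    }


def letter_grade(agent: Metrics, baseline: Metrics) -> str:
    """Average the four scores with equal weight and map the result to a letter."""
    scores = dimension_scores(agent, baseline)
    composite = sum(scores.values()) / len(scores)
    for cutoff, letter in GRADE_CUTOFFS:
        if composite >= cutoff:
            return letter
    return "F"
```

Because the baseline is passed in per call, an engineering agent and a CEO agent can be graded against different `Metrics` baselines with the same function.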

Cost Efficiency

Cost per completed task, measured in USD. This reflects token spend per outcome, not raw token count. Agents who ramble or retry excessively score lower.
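One way to read "spend per outcome" is total spend, retries included, divided by the number of tasks actually completed. The sketch below assumes a hypothetical `Run` record per attempt; the field names are illustrative and not part of any IronWorks API.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One attempt at a task (hypothetical record shape)."""
    task_id: str
    usd_cost: float  # spend for this single attempt
    completed: bool  # True if this attempt finished the task


def cost_per_completed_task(runs: list[Run]) -> float:
    """Total spend across all attempts divided by distinct completed tasks."""
    completed_ids = {r.task_id for r in runs if r.completed}
    if not completed_ids:
        raise ValueError("no completed tasks to average over")
    return sum(r.usd_cost for r in runs) / len(completed_ids)


# Two agents finish the same two tasks; one needs two extra retries on t1.
lean = [Run("t1", 0.10, True), Run("t2", 0.10, True)]
retry_heavy = [
    Run("t1", 0.10, False),
    Run("t1", 0.10, False),
    Run("t1", 0.10, True),
    Run("t2", 0.10, True),
]
```

Under this reading, the retry-heavy agent pays for its failed attempts even though both agents complete the same number of tasks.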

Speed

Average wall-clock time from task assignment to task completion, including idle time between steps. Agents that stall or request unnecessary clarification score lower.

Throughput

Issues closed and tasks completed per week. Normalized by task complexity so a single complex task is not penalized against five simple ones.
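A minimal sketch of complexity normalization, assuming each finished task carries a numeric complexity weight (a hypothetical field; the source does not say how complexity is measured):

```python
def normalized_throughput(task_complexities: list[float], weeks: float) -> float:
    """Complexity-weighted work completed per week.

    Each entry in task_complexities is the assumed weight of one finished task.
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return sum(task_complexities) / weeks


# One task weighted 5.0 counts the same as five tasks weighted 1.0 each.
one_complex = normalized_throughput([5.0], weeks=1)
five_simple = normalized_throughput([1.0] * 5, weeks=1)
```

With weights like these, an agent that spends the week on one large task is not ranked below an agent that closes five trivial ones.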

Completion Rate

Percentage of assigned tasks completed successfully on the first attempt without human intervention. Rewrites, re-assignments, and escalations reduce this score.
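The definition above can be sketched as a filter over task outcomes. The `TaskOutcome` fields are assumptions chosen to mirror the wording (first attempt, rewrites, re-assignments, escalations), not a documented schema.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    """Hypothetical outcome record for one assigned task."""
    succeeded: bool
    attempts: int = 1
    rewritten: bool = False
    reassigned: bool = False
    escalated: bool = False


def first_try_completion_rate(tasks: list[TaskOutcome]) -> float:
    """Share of assigned tasks that succeeded on attempt one with no human step."""
    if not tasks:
        return 0.0
    clean = sum(
        1
        for t in tasks
        if t.succeeded
        and t.attempts == 1
        and not (t.rewritten or t.reassigned or t.escalated)
    )
    return clean / len(tasks)


week = [
    TaskOutcome(succeeded=True),
    TaskOutcome(succeeded=True),
    TaskOutcome(succeeded=True),
    TaskOutcome(succeeded=True, escalated=True),  # finished, but needed a human
]
```

In this sketch a task that eventually succeeds after an escalation still counts against the rate, which matches the "first attempt without human intervention" wording.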

CTO Agent: Project Breakdown

Project                  Grade   Cost    Speed   Tasks   Done
Web App Rebuild          A       $0.15   19m     17      97%
API Integration Sprint   B       $0.24   31m     9       84%
Client Onboarding Flow   A       $0.12   14m     22      99%

Per-Project Breakdown

An A overall can hide a C in one project.

Overall grades are useful but shallow. IronWorks breaks performance down by project so you can see if a strong agent is carrying one project and underperforming in another. This matters when you are billing clients or evaluating whether a specific project type suits a specific agent configuration.

  • Grade per agent per project, not just a blended average
  • Historical grade trend so you can see improvement or decline
  • Compare two agents in the same role side by side
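To see how a blended figure can mask a weak project, the sketch below uses the CTO Agent's project numbers from this page, treating "Tasks" as a task count and "Done" as a completion rate (an assumption about what those columns mean). The function names are illustrative only.

```python
# project name -> (task count, completion rate), taken from the CTO Agent breakdown
projects = {
    "Web App Rebuild": (17, 0.97),
    "API Integration Sprint": (9, 0.84),
    "Client Onboarding Flow": (22, 0.99),
}


def blended_completion(by_project: dict[str, tuple[int, float]]) -> float:
    """Task-weighted average completion rate across all projects."""
    total_tasks = sum(count for count, _ in by_project.values())
    weighted = sum(count * rate for count, rate in by_project.values())
    return weighted / total_tasks


def weakest_project(by_project: dict[str, tuple[int, float]]) -> str:
    """Name of the project with the lowest completion rate."""
    return min(by_project, key=lambda name: by_project[name][1])
```

With these inputs the blended rate lands near 95%, while `weakest_project` points at the API Integration Sprint at 84%, which is the kind of gap a single overall grade would hide.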

Recommendations

Grades come with specific fix suggestions, not vague warnings.

When an agent's score drops, IronWorks surfaces a specific recommendation based on which dimension scored lowest. If cost is the problem, it suggests prompt compression or a model downgrade. If completion rate is down, it flags the task types that are failing most often.

  • Specific suggestions, not generic "improve your agent" notices
  • Links directly to the task types or issues driving the low score
  • Grade history tracks whether changes you made had an effect
  • Alerts via Telegram or email when an agent drops more than one grade level
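The two rules described above (recommend based on the lowest-scoring dimension, alert on a drop of more than one grade level) could be sketched as follows. The grade scale without an E, the score dictionary, and the suggestion text for dimensions other than cost and completion are assumptions for illustration.

```python
# Assumed letter scale, best to worst.
GRADE_ORDER = "ABCDF"

# Suggestions for cost and completion paraphrase this page; the fallback is invented.
SUGGESTIONS = {
    "cost": "Consider prompt compression or a model downgrade.",
    "completion": "Review the task types that are failing most often.",
}
FALLBACK = "Review recent tasks for this dimension."


def grade_drop(previous: str, current: str) -> int:
    """Number of grade levels lost (negative if the grade improved)."""
    return GRADE_ORDER.index(current) - GRADE_ORDER.index(previous)


def should_alert(previous: str, current: str) -> bool:
    """Alert only when the agent falls by more than one grade level."""
    return grade_drop(previous, current) > 1


def recommend(scores: dict[str, float]) -> tuple[str, str]:
    """Return the lowest-scoring dimension and a suggestion for it."""
    weakest = min(scores, key=scores.get)
    return weakest, SUGGESTIONS.get(weakest, FALLBACK)
```

For example, an agent moving from A to C would trigger an alert under this rule, while a slip from A to B would not.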

Performance Insight

Recommendation: Engineer #2

Completion rate dropped from 88% to 74% this week. 3 of 4 failed tasks were TypeScript type-checking related.

Suggested fix: Add TypeScript strictness guidelines to this agent's system prompt and link to the Engineering Standards KB article.

Insight: HR Agent

Consistent A rating for 4 consecutive weeks. Lowest cost per task on the team at $0.14. Consider this agent's prompt structure as a template for others.

Export this agent's configuration to share across your org or create a template from it.

Stop guessing which agents deliver. Start reading the grade.

Performance ratings are included in every IronWorks plan. No analytics add-ons. No premium tier required.