Agent Performance Report — 2026-02-24 #18189

@github-actions

Performance Summary

  • Analysis period: 2026-02-17 → 2026-02-24 (7-day window, 27 runs sampled)
  • Agent quality score: 91/100 (↓ 1 from 92 — AI Moderator regression)
  • Agent effectiveness score: 87/100 (↓ 1 from 88)
  • Non-IM (non-Issue-Monster) success rate: 95% (20/21), down from 100%; 1 AI Moderator failure
  • Overall success rate: 78% (21/27) — 5 failures (4× Issue Monster, 1× AI Moderator)
  • Total tokens (7d): ~17.7M | Cost: ~$6.39 | Turns: 138
  • Critical agent issues: 1 new ⚠️ + 1 ongoing ❌

Critical Findings

❌ P1 Ongoing — Issue Monster (22nd+ consecutive failure period)

Issue Monster failed all 4 runs in the current window, with error_count: 1 per run (avg 1.8m fast-fail). The root cause is unchanged: lockdown: true combined with a missing GH_AW_GITHUB_TOKEN. The fix is in #17807 (remove lockdown: true) but has not been applied, and issue #17414 was closed as "not_planned". This is infrastructure noise, not agent quality degradation, but the ~50 failures/day represent significant CI resource waste.

Escalation recommended: #17807 fix should be merged to end this multi-week streak.

⚠️ New Regression — AI Moderator (1 failure + 3 missing-tool reports)

Yesterday AI Moderator scored 94/100 with 2/2 successes. Today it shows 1 outright failure and 3 missing-tool reports in 6 runs (3/6, or 50%, degraded in addition to the failure). Root cause: the GitHub MCP server running in github.mode: local (Docker) is intermittently unavailable in the CI runner environment.

  • Run 22361284967: failure (err_count: 1)
  • Runs 22361207226, 22359803227, 22358411348: succeeded despite missing GitHub MCP (conservative noop behavior) but reported missing_tool

The AI Moderator's conservative design (completing as success/noop when data is unavailable) masks the issue at the conclusion level. 3/6 runs had no GitHub content to analyze — meaning moderation was skipped silently.
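The masking effect described above can be made visible by classifying runs on two signals instead of one. The following is a minimal sketch, not the analyzer's actual logic: the `Run` type, the `classify` helper, the tool name `"github"`, and the two placeholder run IDs are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Run:
    run_id: str
    conclusion: str  # "success" or "failure"
    missing_tools: list = field(default_factory=list)


def classify(run: Run) -> str:
    """Separate runs that really worked from runs that only look successful."""
    if run.conclusion != "success":
        return "failed"
    if run.missing_tools:
        return "degraded"  # concluded as success, but had nothing to moderate
    return "healthy"


runs = [
    Run("22361284967", "failure"),
    Run("22361207226", "success", ["github"]),  # tool name is an assumption
    Run("22359803227", "success", ["github"]),
    Run("22358411348", "success", ["github"]),
    Run("healthy-run-a", "success"),  # placeholder ID, not from the report
    Run("healthy-run-b", "success"),  # placeholder ID, not from the report
]

summary = Counter(classify(r) for r in runs)
print(dict(summary))
```

Judged by conclusion alone, five of these six runs pass; with the extra signal, only two count as healthy.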

View Full Agent Performance Rankings

Top Performing Agents 🏆

| Rank | Agent | Engine | Score | Success | Avg Duration | Notes |
|------|-------|--------|-------|---------|--------------|-------|
| 1 | The Great Escapi | copilot | 95/100 | 1/1 | 6.1m | Security posture maintained, clean run |
| 2 | CI Failure Doctor | copilot | 93/100 | 3/3 | 7.4m | Consistent, reactive, 3 runs |
| 3 | Daily Safe Outputs Conformance Checker | claude | 92/100 | 1/1 | 5.7m | Good depth, efficient |
| 4 | Lockfile Statistics Analysis Agent | claude | 92/100 | 1/1 | 10.1m | Complex analysis, appropriate time |
| 5 | Chroma Issue Indexer | copilot | 90/100 | 1/1 | 6.1m | Clean indexing run |
| 6 | The Daily Repository Chronicle | copilot | 90/100 | 1/1 | 6.7m | Consistent daily narrative |
| 7 | DeepReport - Intelligence Gathering Agent | - | 91/100 | 1/1 | 9.0m | Good depth |
| 8 | Daily Team Evolution Insights | claude | 91/100 | 1/1 | 7.2m | Good cadence |
| 9 | Semantic Function Refactoring | claude | 89/100 | 1/1 | 11.0m | Complex task, created actionable issue |
| 10 | Slide Deck Maintainer | copilot | 88/100 | 1/1 | 5.5m | Efficient |
| 11 | Contribution Check | copilot | 87/100 | 1/1 | 9.9m | Within expected range |
| 12 | Daily Copilot PR Merged Report | copilot | 82/100 | 1/1 | 11.9m | Slow; approaching monitor threshold |
| 13 | Daily Safe Output Tool Optimizer | - | 78/100 | 1/1 | 14.7m | ⚠️ Slowest non-meta workflow |

Agents Needing Improvement 📉

| Agent | Engine | Score | Success Rate | Issue |
|-------|--------|-------|--------------|-------|
| AI Moderator | codex | 72/100 | 83% (5/6) | GitHub MCP local Docker intermittent |
| Issue Monster | copilot | 0/100* | 0% (0/4) | P1 lockdown token (infrastructure) |

*Score reflects infrastructure failure, not agent quality

View Behavioral Pattern Analysis

Productive Patterns ✅

  • CI Failure Doctor reactive cadence: 3 runs in ~24h indicates active CI failures being correctly caught and investigated. High-value reactive agent.
  • AI Moderator conservative design: When GitHub MCP is unavailable, the agent correctly files a noop rather than halting, which reduces the blast radius of infrastructure issues.
  • Claude agents (conformance, lockfile, refactoring): Consistently higher-quality deep analysis outputs with appropriate turn counts (17-25 turns).
  • The Great Escapi: Security boundary maintained — 0 injections found. Clean 6.1m execution.

Problematic Patterns ⚠️

  • AI Moderator silent degradation: 3 runs completing as "success" while missing critical GitHub MCP tools. Moderation skipped silently for ~50% of triggers today. No visibility until this analysis.
  • Daily Safe Output Tool Optimizer at 14.7m: Outlier in run duration; all other daily agents run 5.5-11.9m. Investigate if the trend continues (it was 9.1m on 2026-02-21, based on memory data).
  • Issue Monster infrastructure waste: 4× 1.8m failure runs = 7.2m wasted CI time today alone (~50 failures/day × 1.8m = 90m, or 1.5 CI hours/day).
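The waste estimate for Issue Monster is a simple product of failure count and average fast-fail duration. A small sketch of that arithmetic follows; the function name `wasted_minutes` is hypothetical, and the inputs are the figures quoted in the report.

```python
def wasted_minutes(failures: int, avg_minutes: float) -> float:
    """CI minutes consumed by runs that fail fast without doing useful work."""
    return failures * avg_minutes


# Figures quoted in the report: 4 sampled failures, ~50 failures/day, 1.8m each.
sampled_waste_min = wasted_minutes(4, 1.8)
daily_waste_hours = wasted_minutes(50, 1.8) / 60

print(f"sampled: {sampled_waste_min:.1f} min, daily: {daily_waste_hours:.1f} h")
```

The daily figure depends entirely on the ~50 failures/day estimate, so it is best read as an order-of-magnitude number.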

Collaboration Patterns

  • Meta-orchestrator coordination working: Workflow Health Manager correctly flagged P1 lockdown (now 4 workflows). Agent Performance detects AI Moderator regression independently.
  • No agent conflicts detected in this window.

View Effectiveness & Coverage Metrics

Task Completion Rates (non-infrastructure failures)

  • High completion (>85%): CI Failure Doctor (100%), The Great Escapi (100%), Conformance Checker (100%), Lockfile Stats (100%), Chronicle (100%), Team Evolution (100%), Daily Copilot PR Report (100%, but slow), Daily Safe Output Optimizer (100%, but very slow)
  • Medium completion (70-85%): AI Moderator (83%)
  • Failing (infrastructure): Issue Monster (0%)

Run Duration Distribution

  • Fast (<7m): Slide Deck Maintainer (5.5m), Conformance Checker (5.7m), Great Escapi (6.1m), Chroma Indexer (6.1m), Daily Chronicle (6.7m)
  • Medium (7-10m): Team Evolution (7.2m), CI Failure Doctor (7.4m), AI Moderator (7.6m avg), Contribution Check (9.9m)
  • Slow (10-12m): Lockfile Stats (10.1m), Semantic Refactoring (11m), Copilot PR Report (11.9m)
  • Very Slow (>12m): Daily Safe Output Tool Optimizer (14.7m)
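The duration bands above can be written as one threshold function, which makes the bucket assignment mechanical. This is a sketch only: `duration_bucket` is a hypothetical helper, and how to treat values that fall exactly on 10m or 12m is an assumption, since the report does not say.

```python
def duration_bucket(minutes: float) -> str:
    """Map an average run duration (in minutes) to the report's bands."""
    if minutes < 7:
        return "Fast"
    if minutes < 10:
        return "Medium"
    if minutes <= 12:  # assumption: exactly 12m still counts as Slow
        return "Slow"
    return "Very Slow"


# Durations taken from the rankings table.
samples = {
    "Slide Deck Maintainer": 5.5,
    "CI Failure Doctor": 7.4,
    "Semantic Function Refactoring": 11.0,
    "Daily Safe Output Tool Optimizer": 14.7,
}

for agent, minutes in samples.items():
    print(f"{agent}: {duration_bucket(minutes)}")
```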

Coverage Gaps

  • No PR quality agents running today (PR Triage Agent: P1 lockdown failure)
  • No issue triage agents running (Issue Monster: P1 lockdown failure, Issue Arborist: not triggered)
  • Security monitoring: active (Great Escapi)
  • Code quality: active (Semantic Refactoring, Conformance Checker)

Recommendations

High Priority

  1. Apply the #17807 fix, "[q] fix(workflows): remove explicit lockdown:true to stop recurring failures" (Issue Monster / PR Triage / Daily Issues / Org Health lockdown removal)

    • 4 workflows failing, ~50+ failures/day, 1.5h+ CI waste/day
    • Fix is ready — merge unblocks all 4 workflows immediately
    • Status: Fix available, not applied (2+ weeks)
  2. Investigate AI Moderator GitHub MCP intermittency

    • 3/6 runs missing GitHub MCP tools (Docker local mode unreliable)
    • Consider adding mode: remote fallback or switching to remote mode
    • Add monitoring: if missing_tool rate >30% in rolling window, alert
    • Impact: ~50% of moderation runs currently doing nothing
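The proposed monitoring rule (alert when the missing_tool rate exceeds 30% in a rolling window) could look roughly like the sketch below. The class name `MissingToolMonitor`, the window size of 6, and the ordering of the observations are assumptions for illustration; only the 30% threshold and the 3-in-6 count come from the report.

```python
from collections import deque


class MissingToolMonitor:
    """Track the share of recent runs that reported a missing tool."""

    def __init__(self, window: int = 6, threshold: float = 0.30):
        self._recent = deque(maxlen=window)  # oldest entries drop off automatically
        self.threshold = threshold

    def record(self, reported_missing_tool: bool) -> None:
        self._recent.append(reported_missing_tool)

    def rate(self) -> float:
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def should_alert(self) -> bool:
        return self.rate() > self.threshold


monitor = MissingToolMonitor()
# 3 of 6 runs reported missing_tool; the ordering here is hypothetical.
for observed in [False, True, True, True, False, False]:
    monitor.record(observed)

print(monitor.rate(), monitor.should_alert())
```

With the figures from this window the rate is 0.5, well above the 0.30 threshold, so such a rule would have fired before this analysis ran.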

Medium Priority

  1. Daily Safe Output Tool Optimizer at 14.7m — Monitor for 3 runs; if consistently >12m, investigate for optimization or timeout issues

  2. Add AI Moderator missing-tool alerting — Current silent success when GitHub MCP unavailable creates false confidence in moderation coverage
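The "monitor for 3 runs" rule for the Daily Safe Output Tool Optimizer can be expressed as a check over the most recent durations. In this sketch, `needs_investigation` is a hypothetical helper, and the second list of durations is invented to show the triggering case; only 9.1m and 14.7m are values from the report.

```python
def needs_investigation(durations_min, threshold=12.0, runs=3):
    """True only when the last `runs` durations all exceed the threshold."""
    recent = durations_min[-runs:]
    return len(recent) == runs and all(d > threshold for d in recent)


# Known data points from the report: 9.1m (2026-02-21) and 14.7m (today).
print(needs_investigation([9.1, 14.7]))

# Hypothetical follow-up runs, for illustration only.
print(needs_investigation([14.7, 13.2, 12.4]))
```

Requiring a full window before flagging avoids reacting to a single slow run, which matches the report's wording of "consistently >12m".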

Trends (vs 2026-02-23 report)

| Metric | 2026-02-23 | 2026-02-24 | Change |
|--------|------------|------------|--------|
| Agent Quality | 92/100 | 91/100 | ↓ 1 |
| Agent Effectiveness | 88/100 | 87/100 | ↓ 1 |
| Non-IM Success Rate | 100% (18/18) | 95% (20/21) | ↓ 5 pp |
| Critical Issues | 0 | 1 new ⚠️ | ↑ 1 |
| AI Moderator Score | 94/100 | 72/100 | ↓ 22 |
| CI Failure Doctor runs | 1 | 3 | ↑ 2 (more CI activity) |
| Total Cost (window) | ~$6.85 (12h) | ~$6.39 (7d sample) | Not comparable (different windows) |

Actions Taken This Run

  • Detected AI Moderator GitHub MCP regression (new finding, not in previous report)
  • Confirmed Issue Monster P1 ongoing (22nd+ period)
  • Created this performance discussion
  • Updated agent-performance-latest.md shared memory

Analysis period: 2026-02-17 → 2026-02-24 | Next report: 2026-02-25
References: §22362703459 | AI Moderator failure | #17807 fix


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by Agent Performance Analyzer - Meta-Orchestrator

  • expires on Feb 25, 2026, 5:47 PM UTC
