Performance Summary
- Analysis period: 2026-02-17 → 2026-02-24 (7-day window, 27 runs sampled)
- Agent quality score: 91/100 (↓ 1 from 92 — AI Moderator regression)
- Agent effectiveness score: 87/100 (↓ 1 from 88)
- Non-IM success rate: 95% (20/21) ↓ from 100% — 1 AI Moderator failure
- Overall success rate: 78% (21/27) — 5 failures (4× Issue Monster, 1× AI Moderator)
- Total tokens (7d): ~17.7M | Cost: ~$6.39 | Turns: 138
- Critical agent issues: 1 new ⚠️ + 1 ongoing ❌
Critical Findings
❌ P1 Ongoing — Issue Monster (22nd+ consecutive failure period)
Issue Monster failed in all 4 runs in the current window, each with error_count: 1 (avg 1.8m fast-fail). The root cause is unchanged: lockdown: true combined with a missing GH_AW_GITHUB_TOKEN. The fix exists in #17807 (remove lockdown: true) but has not been applied, and issue #17414 was closed as "not_planned". This is infrastructure noise, not agent quality degradation — but the ~50 failures/day represent significant CI resource waste.
Escalation recommended: #17807 fix should be merged to end this multi-week streak.
⚠️ New Regression — AI Moderator (1 failure + 3 missing-tool reports)
Yesterday AI Moderator scored 94/100 with 2/2 successes. Today it shows 1 outright failure and 3 missing-tool reports across 6 runs (50% degraded). Root cause: the `github.mode: local` (Docker) GitHub MCP server is intermittently unavailable in the CI runner environment.
- Run 22361284967: failure (err_count: 1)
- Runs 22361207226, 22359803227, 22358411348: succeeded despite missing GitHub MCP (conservative noop behavior) but reported missing_tool
The AI Moderator's conservative design (completing as success/noop when data is unavailable) masks the issue at the conclusion level. 3/6 runs had no GitHub content to analyze — meaning moderation was skipped silently.
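The conservative design described above can be sketched as follows. This is an illustration only — the function, class, and field names are hypothetical, not the agent's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class RunResult:
    status: str                                    # "success" or "failure"
    missing_tools: list = field(default_factory=list)

def moderate(github_mcp_available: bool) -> RunResult:
    """Hypothetical sketch of the AI Moderator's conservative design:
    when the GitHub MCP server is unreachable, record a missing_tool
    report and finish as a successful noop instead of halting the run."""
    if not github_mcp_available:
        # Nothing to moderate; complete as success but flag the gap.
        return RunResult(status="success", missing_tools=["github_mcp"])
    # ... fetch GitHub content via MCP and moderate it ...
    return RunResult(status="success")
```

The trade-off this models is exactly the one flagged above: the run concludes "success" even when moderation was skipped, so the gap is only visible in the missing_tool reports, not at the conclusion level.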
View Full Agent Performance Rankings
Top Performing Agents 🏆
| Rank | Agent | Engine | Score | Success | Avg Duration | Notes |
|---|---|---|---|---|---|---|
| 1 | The Great Escapi | copilot | 95/100 | 1/1 | 6.1m | Security posture maintained, clean run |
| 2 | CI Failure Doctor | copilot | 93/100 | 3/3 | 7.4m | Consistent, reactive, 3 runs |
| 3 | Daily Safe Outputs Conformance Checker | claude | 92/100 | 1/1 | 5.7m | Good depth, efficient |
| 4 | Lockfile Statistics Analysis Agent | claude | 92/100 | 1/1 | 10.1m | Complex analysis, appropriate time |
| 5 | DeepReport - Intelligence Gathering Agent | - | 91/100 | 1/1 | 9.0m | Good depth |
| 6 | Daily Team Evolution Insights | claude | 91/100 | 1/1 | 7.2m | Good cadence |
| 7 | Chroma Issue Indexer | copilot | 90/100 | 1/1 | 6.1m | Clean indexing run |
| 8 | The Daily Repository Chronicle | copilot | 90/100 | 1/1 | 6.7m | Consistent daily narrative |
| 9 | Semantic Function Refactoring | claude | 89/100 | 1/1 | 11.0m | Complex task, created actionable issue |
| 10 | Slide Deck Maintainer | copilot | 88/100 | 1/1 | 5.5m | Efficient |
| 11 | Contribution Check | copilot | 87/100 | 1/1 | 9.9m | Within expected range |
| 12 | Daily Copilot PR Merged Report | copilot | 82/100 | 1/1 | 11.9m | Slow — approaching monitor threshold |
| 13 | Daily Safe Output Tool Optimizer | - | 78/100 | 1/1 | 14.7m | Very slow — monitor |
Agents Needing Improvement 📉
| Agent | Engine | Score | Success Rate | Issue |
|---|---|---|---|---|
| AI Moderator | codex | 72/100 | 83% (5/6) | GitHub MCP local Docker intermittent |
| Issue Monster | copilot | 0/100* | 0% (0/4) | P1 lockdown token — infrastructure |
*Score reflects infrastructure failure, not agent quality
View Behavioral Pattern Analysis
Productive Patterns ✅
- CI Failure Doctor reactive cadence: 3 runs in ~24h indicates active CI failures being correctly caught and investigated. High-value reactive agent.
- AI Moderator conservative design: When GitHub MCP unavailable, agent correctly files noop rather than halting. Reduces blast radius of infrastructure issues.
- Claude agents (conformance, lockfile, refactoring): Consistently higher-quality deep analysis outputs with appropriate turn counts (17-25 turns).
- The Great Escapi: Security boundary maintained — 0 injections found. Clean 6.1m execution.
Problematic Patterns ⚠️
- AI Moderator silent degradation: 3 runs completing as "success" while missing critical GitHub MCP tools. Moderation skipped silently for ~50% of triggers today. No visibility until this analysis.
- Daily Safe Output Tool Optimizer at 14.7m: Outlier in run duration. All other daily agents run 5.5-11m. Should be investigated if trend continues (was 9.1m on 2026-02-21 based on memory data).
- Issue Monster infrastructure waste: 4× 1.8m failure runs = 7.2m wasted CI time today alone (~50 failures/day × 1.8m = 1.5h CI hours/day).
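The CI-waste arithmetic above is easy to verify; a minimal sketch using the report's own figures:

```python
# Estimate CI time wasted by Issue Monster's lockdown fast-fails,
# using the figures from this report (4 failures today, ~50/day,
# ~1.8 minutes per fast-fail run).
FAIL_DURATION_MIN = 1.8  # average fast-fail run length, in minutes

def wasted_minutes(failures: int, per_fail_min: float = FAIL_DURATION_MIN) -> float:
    """Total CI minutes burned by a given number of fast-fail runs."""
    return failures * per_fail_min

today_min = wasted_minutes(4)         # 4 failures in this window -> 7.2 minutes
daily_hours = wasted_minutes(50) / 60  # ~50 failures/day -> 1.5 CI hours/day
```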
Collaboration Patterns
- Meta-orchestrator coordination working: Workflow Health Manager correctly flagged the P1 lockdown issue (now 4 workflows affected); the Agent Performance Analyzer independently detected the AI Moderator regression.
- No agent conflicts detected in this window.
View Effectiveness & Coverage Metrics
Task Completion Rates (non-infrastructure failures)
- High completion (>85%): CI Failure Doctor (100%), The Great Escapi (100%), Conformance Checker (100%), Lockfile Stats (100%), Chronicle (100%), Team Evolution (100%)
- Medium completion (70-85%): AI Moderator (83%), Daily Copilot PR Report (100%* — but slow), Daily Safe Output Optimizer (100%* — but very slow)
- Failing (infrastructure): Issue Monster (0%)
Run Duration Distribution
- Fast (<7m): Slide Deck Maintainer (5.5m), Conformance Checker (5.7m), Great Escapi (6.1m), Chroma Indexer (6.1m), Daily Chronicle (6.7m)
- Medium (7-10m): Team Evolution (7.2m), CI Failure Doctor (7.4m), AI Moderator (7.6m avg), Contribution Check (9.9m)
- Slow (10-12m): Lockfile Stats (10.1m), Semantic Refactoring (11.0m), Copilot PR Report (11.9m)
- Very Slow (>12m): Daily Safe Output Tool Optimizer (14.7m)
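The buckets above follow a simple threshold scheme, which could be expressed as (a sketch using the report's thresholds; not an existing tool):

```python
def duration_bucket(minutes: float) -> str:
    """Classify an agent run duration into this report's buckets."""
    if minutes < 7:
        return "Fast (<7m)"
    if minutes < 10:
        return "Medium (7-10m)"
    if minutes <= 12:
        return "Slow (10-12m)"
    return "Very Slow (>12m)"
```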
Coverage Gaps
- No PR quality agents running today (PR Triage Agent: P1 lockdown failure)
- No issue triage agents running (Issue Monster: P1 lockdown failure, Issue Arborist: not triggered)
- Security monitoring: active (Great Escapi)
- Code quality: active (Semantic Refactoring, Conformance Checker)
Recommendations
High Priority
- Apply the #17807 fix ("fix(workflows): remove explicit lockdown: true to stop recurring failures") — removes lockdown from Issue Monster / PR Triage / Daily Issues / Org Health
  - 4 workflows failing, ~50+ failures/day, 1.5h+ of CI waste/day
  - Fix is ready — merging it unblocks all 4 workflows immediately
  - Status: fix available, not applied (2+ weeks)
- Investigate AI Moderator GitHub MCP intermittency
  - 3/6 runs missing GitHub MCP tools (Docker `local` mode unreliable)
  - Consider adding a `mode: remote` fallback or switching to `remote` mode
  - Add monitoring: alert if the missing_tool rate exceeds 30% in a rolling window
  - Impact: ~50% of moderation runs currently doing nothing
Medium Priority
- Daily Safe Output Tool Optimizer at 14.7m — monitor for 3 runs; if consistently >12m, investigate for optimization or timeout issues
- Add AI Moderator missing-tool alerting — current silent success when GitHub MCP is unavailable creates false confidence in moderation coverage
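The rolling-window alert recommended above could look roughly like this; class and method names are hypothetical illustrations, not an existing gh-aw API:

```python
from collections import deque

class MissingToolAlert:
    """Sketch of the proposed alerting rule: track the last `window`
    runs and fire when the share of runs reporting missing_tool
    exceeds `threshold` (30% per the recommendation above)."""

    def __init__(self, window: int = 10, threshold: float = 0.30):
        self.runs = deque(maxlen=window)  # True = run reported missing_tool
        self.threshold = threshold

    def record(self, missing_tool_reported: bool) -> bool:
        """Record one run; return True if the alert should fire."""
        self.runs.append(missing_tool_reported)
        rate = sum(self.runs) / len(self.runs)
        return rate > self.threshold

alert = MissingToolAlert()
# Today's AI Moderator window: 3 of 6 runs reported missing_tool (50% > 30%),
# so the alert fires by the end of the window.
fired = [alert.record(r) for r in [False, True, False, True, False, True]]
```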
Trends (vs 2026-02-23 report)
| Metric | 2026-02-23 | 2026-02-24 | Change |
|---|---|---|---|
| Agent Quality | 92/100 | 91/100 | ↓ 1 |
| Agent Effectiveness | 88/100 | 87/100 | ↓ 1 |
| Non-IM Success Rate | 100% (18/18) | 95% (20/21) | ↓ 5% |
| Critical Issues | 0 | 1 new | ↑ 1 |
| AI Moderator Score | 94/100 | 72/100 | ↓ 22 |
| CI Failure Doctor runs | 1 | 3 | ↑ 2 (more CI activity) |
| Total Cost (window) | ~$6.85 (12h) | ~$6.39 (7d sample) | (different windows) |
Actions Taken This Run
- Detected AI Moderator GitHub MCP regression (new finding, not in previous report)
- Confirmed Issue Monster P1 ongoing (22nd+ period)
- Created this performance discussion
- Updated `agent-performance-latest.md` shared memory
Analysis period: 2026-02-17 → 2026-02-24 | Next report: 2026-02-25
References: Run 22362703459 | AI Moderator failure | #17807 fix
Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.
Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.
Generated by Agent Performance Analyzer - Meta-Orchestrator (expires Feb 25, 2026, 5:47 PM UTC)