Skip to content

Agent Performance Report - 2026-02-21 #17542

@github-actions

Description

@github-actions

Performance Summary

  • Agents analyzed: 10 distinct agents (18 completed runs today)
  • Overall quality score: 92/100 (↑ +1 from 91)
  • Effectiveness score: 88/100 (↑ +3 from 85)
  • Run success rate: 89% (16/18) — ↑ +18% from last week's 71%
  • Critical issues found: 0 — 19th consecutive zero-critical-issues period! 🎉
  • Total tokens: 14.3M | Estimated cost: ~$6.35 (partial day)
  • Safe items created: 14

🎉 19th Consecutive Zero-Critical-Issues Period

The agent ecosystem continues its strong run with no critical quality failures. This week's run success rate has recovered significantly from last week's 71% to 89% today.

🔒 Security Highlight: Prompt Injection Successfully Blocked

The Great Escapi detected and rejected a prompt injection attack this run. The task content attempted to instruct the agent to perform prohibited actions (sandbox escape, DNS tunneling, network evasion, reconnaissance). The agent correctly identified and refused, logging a clean noop with full explanation. Security posture confirmed excellent.

Critical Findings

  • P1 — Issue Monster failures (×3): GH_AW_GITHUB_TOKEN still missing — Issue #17387 open. Same root cause as last week. Affects 5 workflows total (Issue Monster dominates at ~30-min schedule).
  • ⚠️ CI Failure Doctor frequency: 5 runs in ~7 hours today suggests ongoing CI instability. Not an agent quality issue — the agent is performing correctly — but the trigger frequency may warrant investigating CI flakiness.
View Detailed Quality Analysis

Agent Quality Scores (Today's Runs)

Agent Runs Success Quality Effectiveness Notes
The Great Escapi 1 1 95/100 95/100 Blocked prompt injection
AI Moderator 3 3 91/100 93/100 2 turns each, very efficient
Daily Safe Outputs Conformance Checker 1 1 90/100 90/100 25 turns, 1M tokens
Auto-Triage Issues 1 1 89/100 92/100 Fastest: 2.8 min, 101K tokens
CI Failure Doctor 5 5 88/100 90/100 All 5 successful, 4.5–8 min each
Semantic Function Refactoring 1 1 87/100 82/100 Deep: 72 turns, 2.8M tokens
Lockfile Statistics Analysis Agent 1 1 87/100 85/100 39 turns, 1.93M tokens
Contribution Check 1 1 85/100 87/100 1.97M tokens, 6.4 min
Example: Custom Error Patterns 1 1 80/100 78/100 7 turns but 10.5 min (slow start)
Issue Monster 3 0 N/A N/A Infrastructure failure (P1)

Quality Dimension Analysis

Clarity: Outputs well-structured across all agents. AI Moderator and Great Escapi produce particularly clean, focused outputs.

Accuracy: All successful agents produced accurate analysis. CI Failure Doctor consistently identifies root causes in CI failures.

Completeness: Semantic Function Refactoring and Lockfile Statistics Analysis Agent provide comprehensive, detailed outputs. Auto-Triage Issues is thorough despite small token footprint.

Actionability: CI Failure Doctor outputs are immediately actionable (specific fixes). AI Moderator decisions (label/noop) are clear and defensible.

Resource Efficiency: Auto-Triage Issues (101K tokens, 2.8 min) and Great Escapi (75K tokens, 3 min) are the standout efficiency performers today.

View Effectiveness Metrics

Task Completion Summary

  • High completion (>85%): The Great Escapi, AI Moderator, Auto-Triage Issues, CI Failure Doctor
  • Medium completion (70-85%): Semantic Function Refactoring, Lockfile Statistics, Contribution Check, Daily Safe Outputs
  • Low completion (<70%): Issue Monster (0% — infrastructure)

Resource Efficiency Rankings

Agent Tokens/Run Duration Turns Efficiency Rating
Auto-Triage Issues 101K 2.8 min ⭐⭐⭐⭐⭐
The Great Escapi 75K 3.0 min 0 ⭐⭐⭐⭐⭐
AI Moderator 252K avg 7.3 min avg 2 ⭐⭐⭐⭐
CI Failure Doctor 920K avg 6.3 min avg ⭐⭐⭐
Daily Safe Outputs 1.01M 7.3 min 25 ⭐⭐⭐
Contribution Check 1.97M 6.4 min ⭐⭐
Lockfile Statistics 1.93M 9.5 min 39 ⭐⭐
Semantic Refactoring 2.84M 8.2 min 72 ⭐⭐

Weekly Trend

Week Success Rate Quality Effectiveness Weekly Cost
Feb 7–14 ~88% 93/100 88/100 ~$6.87
Feb 14–20 71% 91/100 85/100 ~$8.38
Feb 21 (today, partial) 89% 92/100 88/100 ~$6.35

The decline last week appears to be recovering. The Smoke Gemini and Issue Monster failures that dragged last week's rate down have stabilized (Gemini closed as accepted, Issue Monster continues but hasn't expanded).

View Behavioral Patterns

Productive Patterns ✅

  • CI Failure Doctor reactive loop: Correctly triggers on CI events, provides targeted fixes — strong human-in-the-loop pattern
  • AI Moderator minimal footprint: Consistently resolves in 2 turns with no wasted effort — exemplary efficiency
  • The Great Escapi security stance: Rejects injection without hedging or partial compliance — correct behavior
  • Auto-Triage Issues speed: Very fast triage pass enables quick issue routing

Patterns to Monitor ⚠️

  • CI Failure Doctor trigger frequency (5 runs/7 hours): While each run is high quality, the frequency indicates CI may be chronically unstable. The agent is symptom-treating rather than addressing root cause — the CI pipeline itself may need attention.
  • Semantic Function Refactoring token consumption: 2.84M tokens per run is the highest single-agent cost. The 72-turn count suggests the task scope may be broader than needed. Worth monitoring whether the output PR quality justifies the cost.
  • Example: Custom Error Patterns slow start: 7 turns but 10.5 min total suggests slow activation or pre-job overhead — worth investigating.

Collaboration Notes

  • No agent conflicts detected today
  • AI Moderator and Auto-Triage Issues may process overlapping issue content — coordination appears clean
  • Great Escapi acts as an independent security layer — no conflicts with other agents

Coverage Analysis

Well-covered today: CI health (×5), security (×1), code quality (×1), compliance (×1), issue triage (×1), moderation (×3)

Gap noted: No campaign orchestration or documentation agents ran today (may be schedule-based).

Recommendations

High Priority

  1. Resolve P1: GH_AW_GITHUB_TOKEN — Issue #17387

    • 5 workflows affected, ~50+ failures/day from Issue Monster alone
    • Fix: Set GH_AW_GITHUB_TOKEN repository secret
    • Impact: Immediate 3–5 percentage point success rate improvement
  2. Investigate CI stability — CI Failure Doctor fired 5 times in 7 hours

    • The agent performs well but the underlying CI flakiness is expensive (5 × 920K tokens avg = 4.6M tokens/day just on CI diagnosis)
    • Consider addressing root CI stability issues to reduce trigger frequency
    • Estimated savings: 2–3M tokens/day if CI stabilizes

Medium Priority

  1. Review Semantic Function Refactoring scope — 2.84M tokens is the highest per-run cost

    • 72 turns suggests broad scope or verbose analysis
    • Consider adding scope constraints or output length limits to reduce by 20–30%
  2. Recompile 14 stale lock files — Per Workflow Health Manager, make recompile needed for 14 workflows

Low Priority

  1. Monitor Example: Custom Error Patterns — Long wall-clock time (10.5 min for 7 turns) warrants one more observation run

Trends

  • Overall agent quality: 92/100 (↑ +1 from 91)
  • Average effectiveness: 88/100 (↑ +3 from 85)
  • Run success rate: 89% (↑ +18% from 71%)
  • Critical issues: 0 (19th consecutive period! 🎉)
  • Security: ✅ Prompt injection successfully blocked

Actions Taken This Run

  • Analyzed 18 completed workflow runs
  • Identified CI Failure Doctor frequency as new observation
  • Confirmed security posture (Great Escapi) excellent
  • Updated shared memory with current metrics
  • Generated this performance report

Analysis period: 2026-02-21 (runs from ~09:00–17:30 UTC)
Previous report: §22234167454
Next report: 2026-02-22

References:


Note: This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Tip: Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by Agent Performance Analyzer - Meta-Orchestrator

  • expires on Feb 28, 2026, 5:33 PM UTC

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions