
chore: reduce CI flakiness across webkit/macOS, ubuntu, windows #1914

Merged: yury-s merged 1 commit into microsoft:main from yury-s:fix-ci-flakiness on May 1, 2026

yury-s (Member) commented on May 1, 2026

Summary

Three independent fixes for the most common failure modes of the Build & Test workflow on main:

  • .github/workflows/test.yml: the webkit matrix now runs on macos-15-xlarge (paid M1 Pro, 6 vCPU / 14 GB) instead of macos-latest (free M1, 3 vCPU / 7 GB). Every recent push run failed the dev (macos-latest, webkit) job deterministically: in each run, the first failure is a popup-related test (window.open(...), waitForPopup) that crashes the WebKit process, and the resulting TargetClosedError cascades into the rest of the test class. Upstream's webkit matrix has always used xlarge runners (tests_secondary.yml); commit microsoft/playwright@6fc20c3c1 ("devops: bump macos bots") only bumped the macOS version, never the runner size. Chromium and firefox keep macos-latest.
  • TestRouteWebSocket.setupWS: drop the 'error' event listener. On non-normal closures (e.g. code 1008), WebKit fires a spurious 'error' event before 'close', which intermittently failed shouldWorkWithTextMessage on dev (ubuntu-latest, webkit) with [open, message, error, close] instead of the expected [open, message, close]. The Java port has no tests that assert 'error' appears in the log.
  • scripts/download_driver.sh: pass --retry 5 --retry-delay 2 -fL to curl. Recent dev (windows-latest, firefox) runs failed at the driver-download step with curl: (56) schannel: server closed abruptly (missing close_notify). wget already retries by default (20 tries), so the wget branch is unchanged.
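For illustration, a minimal sketch of what the curl branch of the download step might look like with the new flags. Only the four flags come from this PR; the function name download_with_retry and its two arguments are hypothetical, since the actual structure of scripts/download_driver.sh is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the real download_driver.sh may be structured differently.
# Only the curl flags (--retry 5 --retry-delay 2 -fL) are taken from the PR.

# download_with_retry URL OUTPUT
download_with_retry() {
  local url="$1"
  local output="$2"
  # --retry 5        retry up to 5 times on failures curl considers transient
  # --retry-delay 2  sleep a fixed 2 seconds before each retry
  #                  (replaces curl's default exponential backoff)
  # -f               exit non-zero on HTTP status >= 400 instead of saving the error body
  # -L               follow redirects
  curl --retry 5 --retry-delay 2 -fL -o "$output" "$url"
}
```

In the script, a call would look like download_with_retry "$url" "$zip_path", where both variable names are placeholders. Keeping the flags in one helper means a single place to adjust them if the transient errors seen on Windows need different retry settings.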

Notes

  • The CI job name changes from dev (macos-latest, webkit) to dev (macos-15-xlarge, webkit). Any branch protection rules or required status checks that reference the old name need updating.
  • The driver bundle (~40 MB on Windows) is small enough, and the curl shipped on hosted Windows runners (8.x) is recent enough, for --retry 5 to work without issue.

Three independent fixes for recurring CI failures on main:

- test.yml: run the webkit matrix on macos-15-xlarge instead of
  macos-latest. macos-latest is the free 3 vCPU / 7 GB M1 runner; the
  webkit suite has been crashing window.open() / popup tests on it
  every push run. Upstream's webkit matrix has always used xlarge
  (tests_secondary.yml). Other browsers stay on macos-latest.
- TestRouteWebSocket: drop the 'error' event listener in setupWS.
  WebKit fires a spurious 'error' before 'close' on non-normal
  closures (e.g. 1008), which intermittently failed
  shouldWorkWithTextMessage with [open, message, error, close]
  vs the expected [open, message, close]. The Java port has no
  tests that assert on 'error'.
- download_driver.sh: pass --retry 5 --retry-delay 2 -fL to curl,
  fixing the Windows driver download that occasionally errors with
  'curl: (56) schannel: server closed abruptly'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yury-s merged commit 5b33729 into microsoft:main on May 1, 2026
20 checks passed