macOS

macOS desktop GUI agent benchmark.

An image-grounded macOS desktop environment, the counterpart to Windows. It exercises click-driven workflows over native applications under both pop-up and screenshot-borne injections.

Environments

The macOS domain ships one sandboxed environment.

Benchmark

See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the macOS domain across all supported agent frameworks and models.