Windows
Windows desktop GUI agent benchmark.
Image-grounded Windows desktop environment that targets full-OS agentic behavior: launching apps, manipulating windows, and clicking through dialogs. Scenarios include image-based prompt injection.
Environments
The Windows domain ships 1 sandboxed environment:
Benchmark
See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Windows domain across all supported agent frameworks and models.