Windows

Windows desktop GUI agent benchmark.

An image-grounded Windows desktop environment that targets full-OS agentic behavior: launching apps, manipulating windows, and clicking through dialogs. It also includes image-based prompt injection.

Environments

The Windows domain ships 1 sandboxed environment:

Benchmark

See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Windows domain across all supported agent frameworks and models.
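This page does not define the three metrics. As a rough sketch only, assuming ASR stands for attack success rate and BSR for benign success rate (both expansions are assumptions, not confirmed here), per-setting rates could be tallied from run records as below. The `Episode` type, its field names, and the setting labels are hypothetical and do not reflect the benchmark's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Episode:
    """Hypothetical record of one agent run; not the benchmark's real schema."""
    setting: str            # assumed labels: "benign", "direct", or "indirect"
    task_succeeded: bool    # the agent completed the intended task
    attack_succeeded: bool  # the injected or malicious goal was carried out


def rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there are no runs to count."""
    return hits / total if total else 0.0


def summarize(episodes: Iterable[Episode]) -> Dict[str, float]:
    """Compute BSR, Direct ASR, and Indirect ASR under the assumed definitions."""
    runs = list(episodes)
    benign = [e for e in runs if e.setting == "benign"]
    summary = {"BSR": rate(sum(e.task_succeeded for e in benign), len(benign))}
    for setting, key in (("direct", "Direct ASR"), ("indirect", "Indirect ASR")):
        attacked = [e for e in runs if e.setting == setting]
        summary[key] = rate(sum(e.attack_succeeded for e in attacked), len(attacked))
    return summary


if __name__ == "__main__":
    # Made-up sample runs, for illustration only.
    sample = [
        Episode("benign", True, False),
        Episode("benign", False, False),
        Episode("direct", False, True),
        Episode("indirect", True, False),
    ]
    print(summarize(sample))
```

Keeping the three settings separate matters because each rate has its own denominator: benign runs only for BSR, and only the runs from the matching attack setting for each ASR.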