Coding
GitHub, GitLab, and terminal-driven engineering tasks.
Source-control and terminal environments where agents review code, manage branches, and run commands, while adversaries hide instructions inside diffs, issues, and CI output.
Environments
The Coding domain ships three sandboxed environments.
Benchmark
See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Coding domain across all supported agent frameworks and models.