
Coding

GitHub, GitLab and terminal-driven engineering tasks.

Source-control and terminal environments where agents review code, manage branches, and run commands, while adversaries hide instructions inside diffs, issues, and CI output.

Environments

The Coding domain ships 3 sandboxed environments:

Benchmark

See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Coding domain across all supported agent frameworks and models.