Research
arXiv-driven literature research and exfil tasks.
Research workflows over an arXiv-style literature corpus, evaluating whether agents stay aligned with the user goal under prompt injections planted in abstracts, comments, and citations.
Environments
The Research domain ships 1 sandboxed environment:
Benchmark
See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Research domain across all supported agent frameworks and models.