Research

arXiv-driven literature research and exfiltration tasks.

This domain covers research workflows over an arXiv-style literature corpus. It evaluates whether agents stay aligned with the user's goal when prompt injections are planted in abstracts, comments, and citations.

Environments

The Research domain ships one sandboxed environment:

Benchmark

See the leaderboard for live Indirect ASR, Direct ASR, and BSR results on the Research domain across all supported agent frameworks and models.