Domains

AgentSuite-Red covers 14 high-stakes domains spanning enterprise software, operating systems, finance, healthcare, and more. Each domain ships with policy-aligned benign and malicious tasks, sandboxed environments, and automated judges.

| Domain | Summary | Environments |
| --- | --- | --- |
| Workflow | Productivity, communication and finance workflow apps. | 20 |
| CRM | Salesforce-style customer relationship management. | 1 |
| Customer Service | ServiceNow-style customer-support case workflows. | 1 |
| Travel | Hotel, flight and rental booking flows. | 5 |
| Coding | GitHub, GitLab and terminal-driven engineering tasks. | 3 |
| Browser | E-commerce browsing, search and checkout. | 1 |
| Research | arXiv-driven literature research and exfil tasks. | 1 |
| OS-Filesystem | Shell-driven file-system operations. | 1 |
| Windows | Windows desktop GUI agent benchmark. | 1 |
| macOS | macOS desktop GUI agent benchmark. | 1 |
| Finance | Yahoo Finance, Chase, Robinhood agent flows. | 3 |
| Legal | Harvey-style legal review and document drafting. | 0 |
| Telecom | Telecom customer-account workflows. | 1 |
| Medical | Hospital client medical-service workflows. | 1 |