Action Guard

Technical Overview

Action Guard is a real-time guardrail that stops AI agents from executing unsafe, malicious, or policy-violating actions. It inspects every outgoing tool call (including MCP tool invocations, function calls, and API requests) and blocks those that violate configured policies or exhibit malicious intent.

Action Guard uses our purpose-built models that understand the full context of each tool call — the prior conversation, the agent's plan, and the action's arguments — enforcing standard policy frameworks (e.g., EU AI Act, OWASP LLM Top 10) as well as fully customized organizational policies.

Low latency — adds as little as 100 ms per action, so it can sit on the hot path of every tool call without degrading agent performance.
Context-aware action analysis — purpose-built for distinguishing legitimate task-driven actions from unauthorized, malicious, or policy-violating tool calls.
Flexible policy enforcement — Action Guard accepts customer-defined policies at runtime without retraining the model, enabling instant adjustment of guardrail behavior and policy-based security enforcement.
Low false positives and tunable thresholds — the model is optimized to minimize false positives, and tunable thresholds let teams balance false-positive rate against coverage for their specific risk profile.
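
As a rough illustration of runtime policies and tunable thresholds, the sketch below models a customer-defined policy as plain data and shows how changing its threshold changes the outcome without touching the model. The `Policy` class, its fields, the `decide` function, and the 0.0 to 1.0 score range are all hypothetical and are not Action Guard's actual API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Policy:
    """Hypothetical runtime policy; field names are assumptions."""
    name: str
    rule: str         # natural-language rule text supplied by the customer
    threshold: float  # block when violation confidence >= threshold


def decide(policy: Policy, confidence: float) -> str:
    """Return "block" or "allow" for an assumed confidence score in [0.0, 1.0]."""
    if not 0.0 <= policy.threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return "block" if confidence >= policy.threshold else "allow"


policy = Policy(
    name="no-bulk-export",
    rule="Never export more than 100 customer records in one call.",
    threshold=0.9,
)
score = 0.72  # stand-in for a model-produced violation confidence

lenient = decide(policy, score)

# Tightening the policy is a data change only; nothing is retrained.
stricter_policy = replace(policy, threshold=0.5)
strict = decide(stricter_policy, score)
```

With the same score, the original policy allows the action and the stricter copy blocks it, which is the trade-off a team would tune against its own risk profile.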

Key Features

Accurate action analysis — semantic understanding of the agent's intent and the action's arguments reduces false positives compared to static rules, while still catching subtle policy violations that argument-level filters cannot detect.
Multi-source coverage — protects against unsafe tool calls driven by user requests, indirect injections embedded in tool outputs or retrieved documents, and compromised intermediate reasoning steps.
Standard and custom policies — out-of-the-box enforcement for EU AI Act, OWASP LLM Top 10, and other regulatory frameworks, plus support for user-defined organizational policies.
Seamless integration — when deployed through agent hooks or a gateway, Action Guard is applied automatically to every tool call; standalone use is also supported via direct API calls.
Multi-turn detection — monitors an agent's conversational flow and decision-making to ensure the agent doesn't perform unauthorized actions across multi-turn interactions.
Long-context support — handles execution traces with hundreds to thousands of tool calls, so coverage scales to long-running agent sessions.
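
To show what hook-style integration could look like, here is a minimal sketch that wraps a tool function so that a guard check runs before the tool executes. The `guarded` decorator, `ActionBlocked` exception, and `demo_check` function are hypothetical names for illustration. `demo_check` is a trivial local stand-in for a call to a guard service, not a representation of how Action Guard analyzes actions.

```python
import functools
from typing import Any, Callable, Dict

# Hypothetical signature for a guard check: returns True when the call is allowed.
GuardCheck = Callable[[str, Dict[str, Any]], bool]


class ActionBlocked(Exception):
    """Raised when the guard rejects a tool call (illustrative only)."""


def guarded(tool_name: str, check: GuardCheck):
    """Wrap a tool function so the guard check runs before the tool executes."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:
            if not check(tool_name, kwargs):
                raise ActionBlocked(f"{tool_name} blocked by guard")
            return tool(**kwargs)
        return wrapper
    return decorator


def demo_check(tool_name: str, args: Dict[str, Any]) -> bool:
    """Stand-in for a real guard call: rejects deletes outside /tmp/."""
    if tool_name == "delete_file":
        return str(args.get("path", "")).startswith("/tmp/")
    return True


@guarded("delete_file", demo_check)
def delete_file(path: str) -> str:
    # A real tool would delete the file; this demo only reports the intent.
    return f"deleted {path}"


allowed_result = delete_file(path="/tmp/scratch.txt")
```

Because the check sits inside the wrapper, a rejected call raises before the tool body runs, so the action is stopped and not merely logged after the fact.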

Risk Categories

Action Guard detects and blocks tool calls across the following categories:

Unauthorized actions — tool calls that exceed the agent's intended permissions or perform operations outside its sanctioned scope.
Privilege violation and escalation — tool calls that access resources or perform actions beyond the permissions granted to the agent.
Destructive operations — irreversible or high-impact actions (e.g., data deletion, financial transactions, infrastructure changes) that violate configured guardrails.
Hijacked tool use — tool calls driven by injected instructions from compromised context (tool outputs, retrieved documents, MCP responses) rather than the user's original request.
Sensitive and private data exfiltration — actions that would transmit PII, credentials, financial data, or other sensitive information to unauthorized destinations.
Policy violations — actions that violate configured policies, including standard frameworks (EU AI Act, OWASP LLM Top 10) and custom organizational rules.

For each flagged action, Action Guard returns the violated policy, the reason for the decision, and a confidence score, giving security teams the visibility needed to audit decisions and tune thresholds over time.
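
One way a security team might use these fields to tune thresholds is sketched below: logged decisions are labeled by a reviewer, then the false-positive rate and coverage are computed at candidate thresholds. The `LoggedDecision` record, its field names, the `rates_at` helper, and the sample records are invented for illustration and do not reflect Action Guard's real response schema or real results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LoggedDecision:
    """Hypothetical audit record; field names are assumptions."""
    policy: str
    reason: str
    confidence: float      # assumed score in [0.0, 1.0]
    truly_violating: bool  # label added later by a human reviewer


def rates_at(decisions: List[LoggedDecision], threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, coverage) if blocking at `threshold`."""
    benign = [d for d in decisions if not d.truly_violating]
    harmful = [d for d in decisions if d.truly_violating]
    false_pos = sum(d.confidence >= threshold for d in benign)
    caught = sum(d.confidence >= threshold for d in harmful)
    fpr = false_pos / len(benign) if benign else 0.0
    coverage = caught / len(harmful) if harmful else 0.0
    return fpr, coverage


# Made-up sample log, for illustration only.
log = [
    LoggedDecision("data-exfiltration", "sends credentials to unknown host", 0.95, True),
    LoggedDecision("destructive-ops", "drops production table", 0.80, True),
    LoggedDecision("destructive-ops", "deletes temp build directory", 0.60, False),
    LoggedDecision("unauthorized-action", "reads public docs page", 0.20, False),
]

low = rates_at(log, threshold=0.5)
high = rates_at(log, threshold=0.7)
```

In this toy log, raising the threshold from 0.5 to 0.7 removes the one false positive while still catching both violating actions. Real logs will rarely separate this cleanly.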
