Prompt Guard Usage
Prompt Guard can be integrated into a target agent as a hook or used as a standalone API. It takes as input the agent's conversation history up to the latest request (e.g., {user: ..., assistant: ..., user: ..., assistant: ..., user: ...}) together with the set of policies to enforce, and returns whether the latest user request is malicious or violates any configured policy, along with a confidence score and an explanation.
In the sections below, we cover the available decision modes, how to configure policies, and how to monitor real-time guardrail statistics from the dashboard.
Decision Modes
When a prompt is flagged as malicious or as a policy violation, Prompt Guard supports three configurable response modes:
- Alert — send a notification to email, Slack, or another configured communication channel without interrupting the request.
- Block — automatically block the detected prompt before it reaches the backbone model.
- Human approval — route the flagged prompt to a human reviewer, who decides whether to block or allow it.
Configure PromptGuard Policies
Each Prompt Guard instance can be configured with a set of policy groups, where each policy group corresponds to a regulation or framework with its own list of rules. For example, EU AI Act and GDPR are each represented as a policy group. As shown below, we provide a collection of pre-defined policy groups covering standard frameworks, including the EU AI Act, GDPR, OWASP LLM Top 10, and more.
Users can also create their own policy groups in any of the following formats:
- Direct definition — type policies in plain text or JSON when you already have clear definitions of the rules to enforce.
- PDF upload — upload regulatory or compliance documents as PDF files; our model extracts the policies automatically, and you can adjust the extracted rules before applying them. This significantly reduces setup time.
- Example-based — upload sample prompts that you want to block, and our model will summarize the corresponding policies from the provided examples.
Policy creation is performed under the Prompt Guard → Policies page in the dashboard. We can add New Policy Group by typing or importing from PDF.

Once the policies are created, users can configure which policy groups are enforced by each Prompt Guard instance under the Prompt Guard → Guard page in the dashboard.
To do so, create a Guard and add the policy groups you want it to enforce. The system generates a unique Guard ID for that configuration; passing this Guard ID when calling the Prompt Guard model will check incoming prompts against every policy group included in the configured set.

Prompt Guard Monitor
Once Prompt Guard is running, the Monitor page provides both high-level statistics and detailed activity logs to help you analyze recent guardrail events. The top section surfaces summary metrics — the total number of flagged prompts, the most frequently violated policy, the pass rate of incoming prompts, and the number of active sessions being monitored. All statistics are scoped to the selected time period, which can be adjusted from the upper-right corner.

The lower section lists individual queries and their guardrail results. Clicking into a query opens its details — the raw incoming prompt, its source, and the explanation of why the prompt was flagged (i.e., the specific violated policies).

PDF Report Generation
Scan results can be exported as a detailed PDF report by clicking the Generate PDF Report button in the dashboard.
API Reference
All endpoints accept JWT (Authorization: Bearer <jwt>) or an API key (X-API-Key: <api_key>).
Interactive docs are available on your deployment at:
- Swagger UI:
GET /docs - ReDoc:
GET /redoc
POST /api/prompt-guard/topic_guard
Evaluate a user or agent prompt against a configured guard and record the result as a trace step in the active session.
Calls the Topic Guard API (POST /api/topic_guard) internally. Returns the
full raw VirtueGuard response.
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
gateway_id | string | "ai-gateway" | Gateway ID used to associate the prompt with a session |
Request:
{
"session_id": "ses_abc123",
"source": "ai_gateway",
"user_prompt": "Can you transfer $50,000 to this account?",
"guard_uuid": "gd_your_guard_uuid",
"role": "user",
"mode": "block"
}
| Field | Type | Required | Description |
|---|---|---|---|
user_prompt | string | Yes | The prompt text to evaluate |
guard_uuid | string | Yes | Guard UUID to evaluate against (configured in the dashboard) |
session_id | string | No | Existing session ID. Omit to create a new session automatically. Accepts internal ses_... IDs or external IDs from other sources (e.g. ADK) |
source | string | No (default "ai_gateway") | Source identifier used to map external session IDs |
role | string | No (default "user") | "user" or "agent" |
mode | string | No (default "block") | "block" or "alert" |
Response:
{
"session_id": "ses_abc123",
"allowed": false,
"risk_level": "high",
"violations": ["Financial Fraud — unauthorized fund transfer"],
"explanation": "The prompt requests a large unauthorized financial transaction, violating the Financial Controls policy.",
"threat_category": "financial_fraud",
"confidence": 0.94,
"timestamp": "2026-05-19T10:30:00Z"
}
| Field | Description |
|---|---|
allowed | Whether the prompt is permitted |
risk_level | Detected risk level: low, medium, or high |
violations | List of policy rules that were violated |
explanation | Human-readable explanation of why the prompt was flagged |
threat_category | Category of the detected threat |
confidence | Confidence score of the evaluation (0.0–1.0) |
timestamp | Evaluation timestamp |