VirtueGuard-Text English Setup Guide
Introduction
VirtueGuard-Text is a text guardrail model that identifies potentially harmful content while maintaining high performance. It outperforms OpenAI's API, Perspective API, and open-source models (Llama-Guard, ShieldGemma, NeMo43B-Defensive) on key metrics such as latency and false positive rate, evaluated on the OpenAI Moderation Dataset, the Toxic Chat Dataset, and others. Read our blog for more details.
Risk Categories
VirtueGuard-Text provides comprehensive content guardrails for text, detecting various categories of potentially harmful content:
| Category | Description |
|---|---|
| S1 (Violent Crimes) | Content related to violent criminal activities |
| S2 (Non-Violent Crimes) | Content related to non-violent criminal activities |
| S3 (Sex-Related Crimes) | Content involving sexual crimes or exploitation |
| S4 (Child Sexual Exploitation) | Content related to exploitation of minors |
| S5 (Specialized Advice) | Potentially harmful specialized guidance or instructions |
| S6 (Privacy) | Content that may compromise personal privacy |
| S7 (Intellectual Property) | Content that violates intellectual property rights |
| S8 (Indiscriminate Weapons) | Content related to weapons of mass destruction |
| S9 (Hate) | Hate speech, discrimination, or extremist content |
| S10 (Suicide & Self-Harm) | Content promoting self-injury or suicide |
| S11 (Sexual Content) | Inappropriate sexual content or explicit material |
| S12 (Jailbreak / Prompt Injections) | Attempts to bypass AI safety measures |
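When handling moderation results programmatically, it can help to keep these category codes in a lookup table. The dictionary below is a minimal convenience sketch derived from the table above; the name `RISK_CATEGORIES` is illustrative and not part of the API.

```python
# Hypothetical convenience mapping of VirtueGuard-Text category codes
# to human-readable names, taken directly from the table above.
RISK_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
    "S12": "Jailbreak / Prompt Injections",
}
```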
API Integration
Authentication and Endpoint
All API requests require an API key included in the request headers, as shown below. Use the following endpoint for making requests:
Endpoint: https://api.virtueai.io/api/textmoderation
Method: POST
Headers:
Content-Type: application/json
API-KEY: your_api_key_here
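Rather than hard-coding the key in source, you may prefer to read it from the environment. A minimal Python sketch, assuming the key is exported under the hypothetical variable name VIRTUE_API_KEY:

```python
import os
import requests

# Hypothetical: VIRTUE_API_KEY is an illustrative environment variable
# name, not something the API itself defines.
api_key = os.environ["VIRTUE_API_KEY"]

response = requests.post(
    "https://api.virtueai.io/api/textmoderation",
    headers={"Content-Type": "application/json", "API-KEY": api_key},
    json={"prompt": "Your text content here"},
)
print(response.json())
```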
API Request Example
Python

```python
import requests

url = "https://api.virtueai.io/api/textmoderation"

headers = {
    "Content-Type": "application/json",
    "API-KEY": "your_api_key_here",
}

payload = {
    "prompt": "Your text content here"
}

# Send the text to the moderation endpoint and print the JSON verdict
response = requests.post(url, headers=headers, json=payload)
print(response.json())
```
TypeScript

```typescript
import axios from 'axios';

async function moderateText(text: string): Promise<any> {
  try {
    const url = 'https://api.virtueai.io/api/textmoderation';
    const headers = {
      'Content-Type': 'application/json',
      'API-KEY': 'your_api_key_here'
    };
    const payload = {
      prompt: text
    };
    // Send the text to the moderation endpoint and return the JSON verdict
    const response = await axios.post(url, payload, { headers });
    return response.data;
  } catch (error) {
    if (error instanceof Error) {
      throw new Error(`Text moderation failed: ${error.message}`);
    }
    throw error;
  }
}

// Example usage
async function example() {
  try {
    const result = await moderateText('Sample text here');
    console.log('Result:', result);
  } catch (error) {
    console.error('Error:', error);
  }
}

export { moderateText };
```
Output Format
The API returns a JSON response with the following structure:
```json
{
  "result": "Unsafe\nS5"
}
```
The response includes:
result (str)
A string field indicating the moderation decision:
- "Safe": No risks detected in the content
- "Unsafe": Risks detected, followed by the specific category code (e.g., "C5")
When unsafe content is detected, the response includes the category code(s) corresponding to the detected risks from the categories listed above.
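Because the result field packs the decision and any category code into a single newline-separated string, callers typically need to split it. The helper below is a minimal sketch, not part of any official SDK; `parse_result` is an illustrative name.

```python
# Hypothetical helper (not part of any official SDK) that splits the
# newline-separated verdict string into a decision and category codes.
def parse_result(result: str) -> tuple[str, list[str]]:
    decision, *codes = result.split("\n")
    return decision, codes

# "Safe" yields ("Safe", []); "Unsafe\nS5" yields ("Unsafe", ["S5"]).
decision, codes = parse_result("Unsafe\nS5")
print(decision, codes)  # Unsafe ['S5']
```

The category codes returned this way can then be mapped to readable names, for example with the RISK_CATEGORIES lookup sketched earlier.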