VirtueGuard-Text English Setup Guide

Introduction

VirtueGuard-Text is a cutting-edge text guardrail model that identifies potentially harmful content while maintaining high performance. It outperforms industry leaders such as OpenAI's Moderation API, Perspective API, and open-source models (Llama-Guard, ShieldGemma, NeMo43B-Defensive) on key metrics such as latency and false positive rate, evaluated on datasets including the OpenAI Moderation Dataset and the Toxic Chat Dataset, among others. Read our blog for more details.


Risk Categories

VirtueGuard-Text provides comprehensive content guardrails for text, detecting various categories of potentially harmful content:

Category | Description
S1 (Violent Crimes) | Content related to violent criminal activities
S2 (Non-Violent Crimes) | Content related to non-violent criminal activities
S3 (Sex-Related Crimes) | Content involving sexual crimes or exploitation
S4 (Child Sexual Exploitation) | Content related to exploitation of minors
S5 (Specialized Advice) | Potentially harmful specialized guidance or instructions
S6 (Privacy) | Content that may compromise personal privacy
S7 (Intellectual Property) | Content that violates intellectual property rights
S8 (Indiscriminate Weapons) | Content related to weapons of mass destruction
S9 (Hate) | Hate speech, discrimination, or extremist content
S10 (Suicide & Self-Harm) | Content promoting self-injury or suicide
S11 (Sexual Content) | Inappropriate sexual content or explicit material
S12 (Jailbreak / Prompt Injections) | Attempts to bypass AI safety measures
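
For client code that needs to display results, it can be convenient to map category codes to their names. The dictionary below is an illustrative convenience for your own code, mirroring the table above; it is not returned by the API:

# Illustrative mapping from VirtueGuard-Text risk category codes to
# human-readable names, mirroring the table above. This dictionary is a
# client-side convenience, not something returned by the API.
RISK_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
    "S12": "Jailbreak / Prompt Injections",
}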

API Integration

Authentication and Endpoint

All API requests require an API key in the request headers, as shown below. Send requests to the following endpoint:

Endpoint: https://api.virtueai.io/api/textmoderation

Method: POST

Headers:

  • Content-Type: application/json
  • API-KEY: your_api_key_here

API Request Example

import requests

url = "https://api.virtueai.io/api/textmoderation"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "your_api_key_here",
}

# The text to be moderated is sent in the "prompt" field.
payload = {
    "prompt": "Your text content here"
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
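
For production use, you may want basic error handling around the call. The sketch below builds on the variables from the example above; the timeout value and exception handling are illustrative choices on our part, not requirements of the API:

# A minimal sketch with basic error handling (an assumption, not required
# by the API): set a timeout and surface HTTP errors before parsing JSON.
try:
    response = requests.post(url, headers=headers, json=payload, timeout=10)
    response.raise_for_status()
    print(response.json()["result"])
except requests.RequestException as exc:
    print(f"Moderation request failed: {exc}")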

Output Format

The API returns a JSON response with the following structure:

{
    "result": "Unsafe\nS5"
}

The response includes:

result (str)

A string field indicating the moderation decision:

  • "Safe": No risks detected in the content
  • "Unsafe": Risks detected, followed by the specific category code (e.g., "C5")

When unsafe content is detected, the response includes the category code(s) corresponding to the detected risks from the categories listed above.
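
Since the decision and any category codes arrive in a single newline-separated string, client code typically splits them apart. Below is a minimal parsing sketch; the parse_result helper is our own illustration, not part of the API:

# Hypothetical helper (not part of the API) that splits the newline-
# separated "result" string into a decision and a list of category codes.
def parse_result(result: str):
    lines = result.strip().split("\n")
    decision = lines[0]     # "Safe" or "Unsafe"
    categories = lines[1:]  # e.g., ["S5"] when unsafe, empty when safe
    return decision, categories

decision, categories = parse_result("Unsafe\nS5")
print(decision)    # Unsafe
print(categories)  # ['S5']

The returned codes can then be looked up in a mapping like the one shown in the Risk Categories section above.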