VirtueGuard-Text English Setup Guide

Introduction

VirtueGuard-Text is a cutting-edge text guardrail model that identifies potentially harmful content while maintaining high performance. It outperforms industry leaders such as OpenAI's Moderation API, Perspective API, and open-source models (Llama-Guard, ShieldGemma, NeMo43B-Defensive) on key metrics such as latency and false positive rate, evaluated on datasets including the OpenAI Moderation Dataset and the Toxic Chat Dataset, among others. Read our blog for more details.


Risk Categories

VirtueGuard-Text provides comprehensive content guardrails for text, detecting various categories of potentially harmful content:

Category | Description
S1 (Violent Crimes) | Content related to violent criminal activities
S2 (Non-Violent Crimes) | Content related to non-violent criminal activities
S3 (Sex-Related Crimes) | Content involving sexual crimes or exploitation
S4 (Child Sexual Exploitation) | Content related to exploitation of minors
S5 (Specialized Advice) | Potentially harmful specialized guidance or instructions
S6 (Privacy) | Content that may compromise personal privacy
S7 (Intellectual Property) | Content that violates intellectual property rights
S8 (Indiscriminate Weapons) | Content related to weapons of mass destruction
S9 (Hate) | Hate speech, discrimination, or extremist content
S10 (Suicide & Self-Harm) | Content promoting self-injury or suicide
S11 (Sexual Content) | Inappropriate sexual content or explicit material
S12 (Jailbreak / Prompt Injections) | Attempts to bypass AI safety measures
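
For client code that needs to display results, it can be convenient to map category codes to their names. The dictionary below is an illustrative convenience for your own code, mirroring the table above; it is not returned by the API:

# Illustrative mapping from VirtueGuard-Text risk category codes to
# human-readable names, mirroring the table above. This dictionary is a
# client-side convenience, not something returned by the API.
RISK_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
    "S12": "Jailbreak / Prompt Injections",
}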

API Integration

Authentication and Endpoint

All API requests require an API key in the request headers, as shown below. Send requests to the following endpoint:

Endpoint: https://api.virtueai.io/api/textmoderation

Method: POST

Headers:

  • Content-Type: application/json
  • API-KEY: your_api_key_here

API Request Example

import requests

url = "https://api.virtueai.io/api/textmoderation"
headers = {
    "Content-Type": "application/json",
    "API-KEY": "your_api_key_here",
}

# The text to be moderated is sent in the "prompt" field.
payload = {
    "prompt": "Your text content here"
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
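
For production use, you may want basic error handling around the call. The sketch below builds on the variables from the example above; the timeout value and exception handling are illustrative choices on our part, not requirements of the API:

# A minimal sketch with basic error handling (an assumption, not required
# by the API): set a timeout and surface HTTP errors before parsing JSON.
try:
    response = requests.post(url, headers=headers, json=payload, timeout=10)
    response.raise_for_status()
    print(response.json()["result"])
except requests.RequestException as exc:
    print(f"Moderation request failed: {exc}")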

Output Format

The API returns a JSON response with the following structure:

{
    "result": "Unsafe\nS5"
}

The response includes:

result (str)

A string field indicating the moderation decision:

  • "Safe": No risks detected in the content
  • "Unsafe": Risks detected, followed by the specific category code (e.g., "C5")

When unsafe content is detected, the response includes the category code(s) corresponding to the detected risks from the categories listed above.
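
Since the decision and any category codes arrive in a single newline-separated string, client code typically splits them apart. Below is a minimal parsing sketch; the parse_result helper is our own illustration, not part of the API:

# Hypothetical helper (not part of the API) that splits the newline-
# separated "result" string into a decision and a list of category codes.
def parse_result(result: str):
    lines = result.strip().split("\n")
    decision = lines[0]     # "Safe" or "Unsafe"
    categories = lines[1:]  # e.g., ["S5"] when unsafe, empty when safe
    return decision, categories

decision, categories = parse_result("Unsafe\nS5")
print(decision)    # Unsafe
print(categories)  # ['S5']

The returned codes can then be looked up in a mapping like the one shown in the Risk Categories section above.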