Text to Image Risks

Text-to-image generation models present unique safety risks when converting text prompts into visual content. VirtueRed comprehensively tests image generation systems for harmful content creation across 6 critical risk categories.

Overview

Text-to-image models can generate photorealistic or artistic images from text descriptions. This capability creates risks that include harmful imagery, deepfakes, reproduction of copyrighted content, and discriminatory representations. Safety evaluation must address both explicit harmful requests and subtle prompt manipulation techniques.

Risk Category | Subcategories
Hateful Image Generation | Bigotry, Body-Shaming, Extremism, Harassment, Hateful Meme, Hateful Symbols, Racism
Illegal Activity Image Generation | Celebrity Face, Copyright, Illegal Goods, Terrorism, Identity Theft
Political Image Generation | Misinformation, Politically Sensitive, Riot
Self-harm Image Generation | Encouragement, Risk
Sexual/NSFW Image Generation | Deepfake, Fetish, Hentai, General Sexual Acts, Sexual Assault, Child Sexual Exploitation
Violence Image Generation | Animal Abuse, Bloody, Disturbing/Shocking, Fights/Real-Life, Gore, Threats, Child Violence
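
For test tooling, the table above can be carried around as a plain mapping so that results are tagged with a consistent category name. The sketch below simply transcribes the six categories and their subcategories; `RISK_TAXONOMY`, `find_category`, and `subcategory_count` are illustrative names chosen here, not part of any VirtueRed API.

```python
from typing import Optional

# Transcribed from the overview table. The variable and function names are
# assumptions made for this example, not identifiers from VirtueRed.
RISK_TAXONOMY = {
    "Hateful Image Generation": [
        "Bigotry", "Body-Shaming", "Extremism", "Harassment",
        "Hateful Meme", "Hateful Symbols", "Racism",
    ],
    "Illegal Activity Image Generation": [
        "Celebrity Face", "Copyright", "Illegal Goods",
        "Terrorism", "Identity Theft",
    ],
    "Political Image Generation": [
        "Misinformation", "Politically Sensitive", "Riot",
    ],
    "Self-harm Image Generation": ["Encouragement", "Risk"],
    "Sexual/NSFW Image Generation": [
        "Deepfake", "Fetish", "Hentai", "General Sexual Acts",
        "Sexual Assault", "Child Sexual Exploitation",
    ],
    "Violence Image Generation": [
        "Animal Abuse", "Bloody", "Disturbing/Shocking",
        "Fights/Real-Life", "Gore", "Threats", "Child Violence",
    ],
}


def find_category(subcategory: str) -> Optional[str]:
    """Return the risk category that lists this subcategory, or None."""
    for category, subcategories in RISK_TAXONOMY.items():
        if subcategory in subcategories:
            return category
    return None


def subcategory_count() -> int:
    """Total number of subcategories across all risk categories."""
    return sum(len(subs) for subs in RISK_TAXONOMY.values())
```

With this mapping, `find_category("Gore")` resolves to the Violence category, and `subcategory_count()` gives 30 across the six categories.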

Hateful Image Generation

Testing generation of discriminatory, hateful, or dehumanizing imagery targeting individuals or groups.

Subcategories

Subcategory | Description
Bigotry | Imagery expressing intolerance or prejudice toward groups based on religion, ethnicity, nationality, or other protected characteristics
Body-Shaming | Content mocking or degrading individuals based on body type, weight, physical appearance, or disabilities
Extremism | Imagery promoting extremist ideologies, radical movements, or violent radical organizations
Harassment | Visual content designed to intimidate, threaten, or harass specific individuals or groups
Hateful Meme | Generation of meme-style content that spreads hate, discrimination, or derogatory messages
Hateful Symbols | Generation of recognized hate symbols, including Nazi imagery, white supremacist symbols, and other hate group insignia
Racism | Racially offensive imagery including stereotypical representations, derogatory depictions, and content promoting racial discrimination

Illegal Activity Image Generation

Testing generation of imagery facilitating illegal activities, identity violations, or intellectual property infringement.

Subcategories

Subcategory | Description
Celebrity Face | Unauthorized generation of celebrity or public figure likenesses, including deepfakes and images placing them in false contexts
Copyright | Reproduction of copyrighted characters, artworks, logos, brand imagery, and protected intellectual property
Illegal Goods | Imagery depicting controlled substances, weapons, counterfeit products, or other illegal items
Terrorism | Content depicting terrorist activities, propaganda materials, or imagery promoting terrorist organizations
Identity Theft | Fake identity documents, fraudulent photos for impersonation, and synthetic identity materials for fraud

Political Image Generation

Testing generation of politically sensitive content that could influence opinions, spread misinformation, or incite unrest.

Subcategories

Subcategory | Description
Misinformation | Fake news imagery, fabricated political events, false endorsements, and manipulated political "evidence"
Politically Sensitive | Imagery involving controversial political figures, disputed territories, contested historical events, or divisive political topics
Riot | Content depicting civil unrest, violent protests, insurrection, or imagery that could incite public disorder

Self-harm Image Generation

Testing generation of content that promotes, glorifies, or provides guidance on self-harm behaviors.

Subcategories

Subcategory | Description
Encouragement | Imagery that encourages, normalizes, or glorifies self-harm, suicide, eating disorders, or other self-destructive behaviors
Risk | Content depicting dangerous challenges, risky behaviors, or activities that could lead to physical harm when imitated

Sexual/NSFW Image Generation

Testing generation of explicit, inappropriate, or exploitative sexual content.

Subcategories

Subcategory | Description
Deepfake | Non-consensual intimate imagery using face-swapping or AI generation to place real individuals in sexual contexts
Fetish | Extreme fetish content that may be harmful, degrading, or non-consensual in nature
Hentai | Animated or illustrated explicit sexual content, including content that may depict fictional minors
General Sexual Acts | Explicit sexual imagery depicting sexual activities, nudity, or pornographic content
Sexual Assault | Imagery depicting non-consensual sexual activities, coercion, or sexual violence
Child Sexual Exploitation | Any imagery sexualizing minors or depicting child sexual abuse material (CSAM); absolute prohibition

Violence Image Generation

Testing generation of violent, gory, or disturbing content that depicts harm to humans or animals.

Subcategories

Subcategory | Description
Animal Abuse | Imagery depicting cruelty, torture, or harm to animals
Bloody | Content featuring blood, bleeding wounds, or bloody scenes
Disturbing/Shocking | Deeply unsettling imagery designed to shock, traumatize, or cause psychological distress
Fights/Real-Life | Realistic depictions of physical altercations, assaults, or violent confrontations
Gore | Graphic depictions of severe injuries, mutilation, dismemberment, or extreme violence
Threats | Imagery that threatens violence against specific individuals, groups, or institutions
Child Violence | Any imagery depicting violence, abuse, or harm to minors; absolute prohibition

Prompt Manipulation Techniques

Direct Bypass Attempts

Technique | Description
Synonym substitution | Using alternative words to bypass filters
Foreign language prompts | Using non-English to evade detection
Phonetic spelling | Spelling words phonetically to avoid detection
Character substitution | Using similar characters to bypass filters
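
Character substitution is the easiest of these to illustrate from the defensive side: a keyword filter that compares raw strings misses lookalike characters unless the prompt is normalized first. The following is a minimal sketch of that normalization step, assuming a simple substring blocklist; `LOOKALIKES`, `normalize_prompt`, and `matches_blocklist` are hypothetical names, and the lookalike table is deliberately tiny. It is not a description of how VirtueRed or any particular filter works.

```python
import unicodedata

# Hypothetical, intentionally small lookalike table. Mapping digits to letters
# also rewrites legitimate numbers, so a real filter would need more care.
LOOKALIKES = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
    "@": "a", "$": "s",
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
})


def normalize_prompt(prompt: str) -> str:
    """Fold compatibility forms and case, then map known lookalikes."""
    # NFKC folds compatibility characters such as fullwidth letters;
    # it does not unify Cyrillic and Latin letters, hence the table above.
    folded = unicodedata.normalize("NFKC", prompt).casefold()
    return folded.translate(LOOKALIKES)


def matches_blocklist(prompt: str, blocklist: set) -> list:
    """Return blocklisted terms found in the normalized prompt, sorted."""
    normalized = normalize_prompt(prompt)
    return sorted(term for term in blocklist if term in normalized)
```

With the placeholder term `forbidden` in the blocklist, a prompt containing `F0RB1DD3N` is not matched by a raw substring check but is matched after normalization. Synonym, foreign-language, and phonetic variants are not addressed by this kind of character-level step.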

Contextual Manipulation

Technique | Description
Artistic framing | Claiming artistic or educational purpose
Historical context | Framing harmful content as historical
Fictional scenarios | Using fiction to request harmful content
Professional pretext | Claiming professional need for content
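
When an evaluation covers both direct and contextual techniques, it helps to record which technique each probe used so that weak spots show up per technique rather than only in aggregate. The sketch below assumes each probe yields a simple blocked or not-blocked outcome; `ProbeResult` and `block_rate_by_technique` are hypothetical names for illustration and do not reflect VirtueRed's actual reporting format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One evaluated prompt (hypothetical record shape)."""
    category: str    # risk category, e.g. "Violence Image Generation"
    technique: str   # manipulation technique, e.g. "Artistic framing"
    blocked: bool    # True if the system refused or filtered the request


def block_rate_by_technique(results):
    """Return the fraction of blocked probes for each technique."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for result in results:
        totals[result.technique] += 1
        blocked[result.technique] += int(result.blocked)
    return {technique: blocked[technique] / totals[technique]
            for technique in totals}
```

A technique with a noticeably lower block rate than the others is a candidate for closer review; the same grouping could be keyed on `category` instead.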
