Text-to-Video Risks

Text-to-video generation models pose greater safety risks than static image generation because they add temporal dynamics, motion, and narrative capabilities. VirtueRed tests video generation systems for harmful content creation across 6 critical risk categories.

Overview

Text-to-video models generate moving visual content from text descriptions, producing synthetic media that can be more persuasive than still images. The temporal dimension complicates safety evaluation: harmful content can unfold over time, which makes detection harder and the potential impact greater.

Risk Category	Subcategories
Video Violence Risks	Disturbing/Shocking, Bloody, Gore, Fights/Real-Life, Animal Abuse, Threats, Child Violence
Video Hate Risks	Hateful Meme, Racism, Body-Shaming, Harassment, Extremism, Bigotry, Hateful Symbols
Video Self-Harm Risks	Encouragement, Risk
Video NSFW Risks	General Sexual Acts, Fetish, Hentai, Deepfake, Sexual Assault, Child Sexual Exploitation
Video Political Risks	Riot, Misinformation, Politically Sensitive
Video Illegal Activity Risks	Copyright, Celebrity Face, Terrorism, Illegal Goods, Identity Theft
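
The table above can also be held as a small lookup structure, which is convenient when tagging test cases or reports by category. The following is a minimal sketch only: the names `VIDEO_RISK_TAXONOMY`, `iter_risk_pairs`, and `category_of` are hypothetical and are not part of any VirtueRed API.

```python
# Hypothetical in-code mirror of the risk table above.
# Category and subcategory strings are copied from the table; every
# identifier here is illustrative, not a VirtueRed API.
VIDEO_RISK_TAXONOMY = {
    "Video Violence Risks": [
        "Disturbing/Shocking", "Bloody", "Gore", "Fights/Real-Life",
        "Animal Abuse", "Threats", "Child Violence",
    ],
    "Video Hate Risks": [
        "Hateful Meme", "Racism", "Body-Shaming", "Harassment",
        "Extremism", "Bigotry", "Hateful Symbols",
    ],
    "Video Self-Harm Risks": ["Encouragement", "Risk"],
    "Video NSFW Risks": [
        "General Sexual Acts", "Fetish", "Hentai", "Deepfake",
        "Sexual Assault", "Child Sexual Exploitation",
    ],
    "Video Political Risks": ["Riot", "Misinformation", "Politically Sensitive"],
    "Video Illegal Activity Risks": [
        "Copyright", "Celebrity Face", "Terrorism", "Illegal Goods",
        "Identity Theft",
    ],
}


def iter_risk_pairs(taxonomy):
    """Yield (category, subcategory) pairs in table order."""
    for category, subcategories in taxonomy.items():
        for subcategory in subcategories:
            yield category, subcategory


def category_of(subcategory, taxonomy=VIDEO_RISK_TAXONOMY):
    """Return the category that lists `subcategory`, or None if absent."""
    for category, name in iter_risk_pairs(taxonomy):
        if name == subcategory:
            return category
    return None
```

For example, `category_of("Deepfake")` returns `"Video NSFW Risks"`, and iterating `iter_risk_pairs(VIDEO_RISK_TAXONOMY)` visits every category and subcategory pair in the order of the table.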

Video Violence Risks

Testing generation of violent, gory, or disturbing video content that depicts harm to humans or animals over time.

Subcategories

Subcategory	Description
Disturbing/Shocking	Deeply unsettling video content designed to shock, traumatize, or cause psychological distress through moving imagery
Bloody	Video content featuring blood, bleeding wounds, or bloody scenes with temporal progression
Gore	Graphic video depictions of severe injuries, mutilation, dismemberment, or extreme violence unfolding over time
Fights/Real-Life	Realistic video depictions of physical altercations, assaults, or violent confrontations
Animal Abuse	Video depicting cruelty, torture, or harm to animals
Threats	Video content that threatens violence against specific individuals, groups, or institutions
Child Violence	Any video depicting violence, abuse, or harm to minors - absolute prohibition

Video Hate Risks

Testing generation of discriminatory, hateful, or dehumanizing video content targeting individuals or groups.

Subcategories

Subcategory	Description
Hateful Meme	Generation of meme-style video content that spreads hate, discrimination, or derogatory messages
Racism	Racially offensive video including stereotypical representations, derogatory depictions, and content promoting racial discrimination
Body-Shaming	Video content mocking or degrading individuals based on body type, weight, physical appearance, or disabilities
Harassment	Video content designed to intimidate, threaten, or harass specific individuals or groups
Extremism	Video promoting extremist ideologies, radical movements, or violent radical organizations
Bigotry	Video expressing intolerance or prejudice toward groups based on religion, ethnicity, nationality, or other protected characteristics
Hateful Symbols	Video featuring recognized hate symbols, including Nazi imagery, white supremacist symbols, and other hate group insignia

Video Self-Harm Risks

Testing generation of video content that promotes, glorifies, or provides guidance on self-harm behaviors.

Subcategories

Subcategory	Description
Encouragement	Video that encourages, normalizes, or glorifies self-harm, suicide, eating disorders, or other self-destructive behaviors
Risk	Video depicting dangerous challenges, risky behaviors, or activities that could lead to physical harm when imitated

Video NSFW Risks

Testing generation of explicit, inappropriate, or exploitative sexual video content.

Subcategories

Subcategory	Description
General Sexual Acts	Explicit video depicting sexual activities, nudity, or pornographic content
Fetish	Extreme fetish video content that may be harmful, degrading, or non-consensual in nature
Hentai	Animated or illustrated explicit sexual video content, including content that may depict fictional minors
Deepfake	Non-consensual intimate video using face-swapping or AI generation to place real individuals in sexual contexts
Sexual Assault	Video depicting non-consensual sexual activities, coercion, or sexual violence
Child Sexual Exploitation	Any video sexualizing minors or depicting child sexual abuse material (CSAM) - absolute prohibition

Video Political Risks

Testing generation of politically sensitive video content that could influence opinions, spread misinformation, or incite unrest.

Subcategories

Subcategory	Description
Riot	Video depicting civil unrest, violent protests, insurrection, or content that could incite public disorder
Misinformation	Fake news video, fabricated political events, false endorsements, fake speeches, and manipulated political "evidence"
Politically Sensitive	Video involving controversial political figures, disputed territories, contested historical events, or divisive political topics

Video Illegal Activity Risks

Testing generation of video facilitating illegal activities, identity violations, or intellectual property infringement.

Subcategories

Subcategory	Description
Copyright	Reproduction of copyrighted characters, artworks, brand imagery, and protected intellectual property in video format
Celebrity Face	Unauthorized generation of celebrity or public figure likenesses in video, including deepfakes and false context placement
Terrorism	Video depicting terrorist activities, propaganda materials, or content promoting terrorist organizations
Illegal Goods	Video depicting controlled substances, weapons, counterfeit products, or other illegal items
Identity Theft	Fake identity videos, fraudulent video content for impersonation, and synthetic identity materials for fraud

Technical Exploitation

Prompt Manipulation for Video

Technique	Description
Temporal prompt injection	Exploiting time-based instructions
Scene transition exploitation	Using scene changes to bypass filters
Motion-based bypass	Using motion descriptions to evade detection
Narrative framing	Using story structure to introduce harmful content
Multi-clip manipulation	Combining safe elements into harmful sequences
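
One way to track how a system holds up against the techniques in this table is to tally evaluation outcomes per technique. The sketch below is illustrative: the `RedTeamResult` record, its fields, and `bypass_rate_by_technique` are hypothetical names introduced here, and the source does not describe how VirtueRed stores or scores results.

```python
from collections import defaultdict
from dataclasses import dataclass

# Technique names copied from the table above.
PROMPT_TECHNIQUES = (
    "Temporal prompt injection",
    "Scene transition exploitation",
    "Motion-based bypass",
    "Narrative framing",
    "Multi-clip manipulation",
)


@dataclass(frozen=True)
class RedTeamResult:
    """One hypothetical test outcome; field names are illustrative."""
    risk_category: str
    technique: str
    blocked: bool  # True if the system refused or filtered the request


def bypass_rate_by_technique(results):
    """Return {technique: fraction of cases that were NOT blocked}."""
    totals = defaultdict(int)
    bypassed = defaultdict(int)
    for result in results:
        if result.technique not in PROMPT_TECHNIQUES:
            raise ValueError(f"unknown technique: {result.technique!r}")
        totals[result.technique] += 1
        if not result.blocked:
            bypassed[result.technique] += 1
    # Only techniques that appear in `results` are reported.
    return {technique: bypassed[technique] / totals[technique] for technique in totals}
```

Given two "Narrative framing" cases where one was blocked and one was not, the function reports a bypass rate of 0.5 for that technique. Grouping by `risk_category` as well would follow the same pattern.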

Output Manipulation

Technique	Description
Frame extraction	Extracting specific frames from generated videos
Re-encoding bypass	Using format changes to avoid detection
Splicing attacks	Combining generated clips with other content
Speed manipulation	Using speed changes to evade content filters
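
Speed manipulation matters most for filters that inspect only a sample of frames. As a hedged illustration, assume a hypothetical moderation step that checks frames at a fixed time interval: if a clip is sped up, the same content occupies fewer frames, so the sampling stride has to shrink to keep the same coverage. The function `sample_frame_indices` and its `speed_factor` parameter are assumptions made for this sketch and do not describe VirtueRed's implementation or any specific filter.

```python
import math


def sample_frame_indices(total_frames, fps, max_gap_seconds, speed_factor=1.0):
    """Pick frame indices so sampled frames are at most `max_gap_seconds`
    apart in original content time.

    speed_factor > 1 means the clip plays faster than the original, so the
    same content spans fewer frames and the stride shrinks accordingly.
    The caller supplies speed_factor; this sketch does not estimate it.
    """
    if total_frames <= 0:
        return []
    if fps <= 0 or max_gap_seconds <= 0 or speed_factor <= 0:
        raise ValueError("fps, max_gap_seconds, and speed_factor must be positive")
    # Frames between samples, never less than one frame.
    stride = max(1, math.floor(fps * max_gap_seconds / speed_factor))
    indices = list(range(0, total_frames, stride))
    if indices[-1] != total_frames - 1:
        indices.append(total_frames - 1)  # always include the final frame
    return indices
```

With 100 frames at 25 fps and a 1-second gap, the sketch samples every 25th frame plus the last one. Passing `speed_factor=4.0` reduces the stride to 6 frames, so content that was compressed by a 4x speed-up is still sampled at roughly the same density. When the true speed change is unknown, a conservative upper bound is one possible choice.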
