Text-to-video generation models pose greater safety risks than static image generation because they add temporal dynamics, motion, and narrative capabilities. VirtueRed tests video generation systems for harmful content creation across six risk categories.
Overview
Text-to-video models generate moving visual content from text descriptions, producing synthetic media that can be more persuasive than still images. The temporal dimension complicates safety evaluation: harmful content can unfold over time, which makes it harder to detect and can increase its impact.
| Risk Category | Subcategories |
|---|---|
| Video Violence Risks | Disturbing/Shocking, Bloody, Gore, Fights/Real-Life, Animal Abuse, Threats, Child Violence |
| Video Hate Risks | Hateful Meme, Racism, Body-Shaming, Harassment, Extremism, Bigotry, Hateful Symbols |
| Video Self-Harm Risks | Encouragement, Risk |
| Video NSFW Risks | General Sexual Acts, Fetish, Hentai, Deepfake, Sexual Assault, Child Sexual Exploitation |
| Video Political Risks | Riot, Misinformation, Politically Sensitive |
| Video Illegal Activity Risks | Copyright, Celebrity Face, Terrorism, Illegal Goods, Identity Theft |
Video Violence Risks
Testing generation of violent, gory, or disturbing video content that depicts harm to humans or animals over time.
Subcategories
| Subcategory | Description |
|---|---|
| Disturbing/Shocking | Deeply unsettling video content designed to shock, traumatize, or cause psychological distress through moving imagery |
| Bloody | Video content featuring blood, bleeding wounds, or bloody scenes with temporal progression |
| Gore | Graphic video depictions of severe injuries, mutilation, dismemberment, or extreme violence unfolding over time |
| Fights/Real-Life | Realistic video depictions of physical altercations, assaults, or violent confrontations |
| Animal Abuse | Video depicting cruelty, torture, or harm to animals |
| Threats | Video content that threatens violence against specific individuals, groups, or institutions |
| Child Violence | Any video depicting violence, abuse, or harm to minors (absolute prohibition) |
Video Hate Risks
Testing generation of discriminatory, hateful, or dehumanizing video content targeting individuals or groups.
Subcategories
| Subcategory | Description |
|---|---|
| Hateful Meme | Generation of meme-style video content that spreads hate, discrimination, or derogatory messages |
| Racism | Racially offensive video including stereotypical representations, derogatory depictions, and content promoting racial discrimination |
| Body-Shaming | Video content mocking or degrading individuals based on body type, weight, physical appearance, or disabilities |
| Harassment | Video content designed to intimidate, threaten, or harass specific individuals or groups |
| Extremism | Video promoting extremist ideologies, radical movements, or violent radical organizations |
| Bigotry | Video expressing intolerance or prejudice toward groups based on religion, ethnicity, nationality, or other protected characteristics |
| Hateful Symbols | Video featuring recognized hate symbols, including Nazi imagery, white supremacist symbols, and other hate group insignia |
Video Self-Harm Risks
Testing generation of video content that promotes, glorifies, or provides guidance on self-harm behaviors.
Subcategories
| Subcategory | Description |
|---|---|
| Encouragement | Video that encourages, normalizes, or glorifies self-harm, suicide, eating disorders, or other self-destructive behaviors |
| Risk | Video depicting dangerous challenges, risky behaviors, or activities that could lead to physical harm when imitated |
Video NSFW Risks
Testing generation of explicit, inappropriate, or exploitative sexual video content.
Subcategories
| Subcategory | Description |
|---|---|
| General Sexual Acts | Explicit video depicting sexual activities, nudity, or pornographic content |
| Fetish | Extreme fetish video content that may be harmful, degrading, or non-consensual in nature |
| Hentai | Animated or illustrated explicit sexual video content, including content that may depict fictional minors |
| Deepfake | Non-consensual intimate video using face-swapping or AI generation to place real individuals in sexual contexts |
| Sexual Assault | Video depicting non-consensual sexual activities, coercion, or sexual violence |
| Child Sexual Exploitation | Any video sexualizing minors or depicting child sexual abuse material (CSAM) (absolute prohibition) |
Video Political Risks
Testing generation of politically sensitive video content that could influence opinions, spread misinformation, or incite unrest.
Subcategories
| Subcategory | Description |
|---|---|
| Riot | Video depicting civil unrest, violent protests, insurrection, or content that could incite public disorder |
| Misinformation | Fake news video, fabricated political events, false endorsements, fake speeches, and manipulated political "evidence" |
| Politically Sensitive | Video involving controversial political figures, disputed territories, contested historical events, or divisive political topics |
Video Illegal Activity Risks
Testing generation of video facilitating illegal activities, identity violations, or intellectual property infringement.
Subcategories
| Subcategory | Description |
|---|---|
| Copyright | Reproduction of copyrighted characters, artworks, brand imagery, and protected intellectual property in video format |
| Celebrity Face | Unauthorized generation of celebrity or public figure likenesses in video, including deepfakes and false context placement |
| Terrorism | Video depicting terrorist activities, propaganda materials, or content promoting terrorist organizations |
| Illegal Goods | Video depicting controlled substances, weapons, counterfeit products, or other illegal items |
| Identity Theft | Fake identity videos, fraudulent video content for impersonation, and synthetic identity materials for fraud |
Technical Exploitation
Prompt Manipulation for Video
| Technique | Description |
|---|---|
| Temporal prompt injection | Exploiting time-based instructions |
| Scene transition exploitation | Using scene changes to bypass filters |
| Motion-based bypass | Using motion descriptions to evade detection |
| Narrative framing | Using story structure to introduce harmful content |
| Multi-clip manipulation | Combining safe elements into harmful sequences |
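
One way to reason about coverage is to cross the prompt manipulation techniques above with the risk categories and track which combinations have been exercised. The sketch below assumes a hypothetical `RedTeamCase` record and helper functions; it illustrates the bookkeeping only and does not reflect how VirtueRed schedules tests internally.

```python
from dataclasses import dataclass
from itertools import product

# Technique and category names are copied from the tables in this document.
PROMPT_TECHNIQUES = [
    "Temporal prompt injection",
    "Scene transition exploitation",
    "Motion-based bypass",
    "Narrative framing",
    "Multi-clip manipulation",
]
RISK_CATEGORIES = [
    "Video Violence Risks",
    "Video Hate Risks",
    "Video Self-Harm Risks",
    "Video NSFW Risks",
    "Video Political Risks",
    "Video Illegal Activity Risks",
]


@dataclass(frozen=True)
class RedTeamCase:
    """Hypothetical record pairing one risk category with one technique."""
    category: str
    technique: str


def build_test_matrix(categories: list[str], techniques: list[str]) -> list[RedTeamCase]:
    """Return every (category, technique) combination as a test case."""
    return [RedTeamCase(c, t) for c, t in product(categories, techniques)]


def uncovered(matrix: list[RedTeamCase], executed: set[RedTeamCase]) -> list[RedTeamCase]:
    """Return the cases in `matrix` that have not been executed yet."""
    return [case for case in matrix if case not in executed]


matrix = build_test_matrix(RISK_CATEGORIES, PROMPT_TECHNIQUES)
remaining = uncovered(matrix, executed={matrix[0]})
```

Because `RedTeamCase` is frozen, instances are hashable and can be stored in a set of executed cases, which keeps the coverage check a simple membership test.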
Output Manipulation
| Technique | Description |
|---|---|
| Frame extraction | Extracting specific frames from generated videos |
| Re-encoding bypass | Using format changes to avoid detection |
| Splicing attacks | Combining generated clips with other content |
| Speed manipulation | Using speed changes to evade content filters |
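
Frame extraction and speed manipulation both suggest that an evaluation should not rely on sparsely sampled frames, since a short unsafe segment can fall between samples. The sketch below illustrates that point under stated assumptions: the per-frame scores are made-up values standing in for the output of some unsafe-content classifier, and `max_sampled_score` and `is_flagged` are hypothetical helpers, not part of any real moderation API.

```python
from typing import Sequence


def max_sampled_score(frame_scores: Sequence[float], stride: int = 1) -> float:
    """Return the highest score among frames sampled every `stride` frames."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    # default=0.0 keeps the function defined for an empty clip.
    return max(frame_scores[::stride], default=0.0)


def is_flagged(frame_scores: Sequence[float], threshold: float = 0.5, stride: int = 1) -> bool:
    """Flag the clip if any sampled frame meets or exceeds `threshold`."""
    return max_sampled_score(frame_scores, stride) >= threshold


# Illustrative scores only: a ten-frame clip with one high-scoring frame.
scores = [0.1] * 10
scores[5] = 0.9

sparse_result = is_flagged(scores, stride=4)  # samples frames 0, 4, 8
dense_result = is_flagged(scores, stride=1)   # samples every frame
```

With these illustrative values, the sparse check samples frames 0, 4, and 8 and misses the high-scoring frame at index 5, while the dense check catches it. The same reasoning applies when playback speed changes how many frames a brief segment occupies.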
See Also