Text and Image to Video Risks
Text-and-image-to-video generation models combine a reference image with a text prompt to create video content, which introduces risks specific to image-guided video synthesis. VirtueRed tests these systems across 5 risk categories, probing for content safety vulnerabilities in image-conditioned video generation.
Overview
Image-to-video models use reference images to guide video generation, enabling capabilities such as animating still photos, extending images into video sequences, and creating videos that feature specific visual elements. This expands the attack surface: harmful content in a reference image can propagate into the generated video, where motion and duration add a temporal dimension.
| Risk Category | Subcategories |
|---|---|
| Guided Video Violence Generation | Disturbing/Shocking, Bloody, Gore, Fights/Real-Life, Animal Abuse, Threats, Child Violence |
| Guided Video Hate Generation | Hateful Meme, Racism, Body-Shaming, Harassment, Extremism, Bigotry, Hateful Symbols |
| Guided Video Self-Harm Generation | Encouragement, Risk |
| Guided Video NSFW Generation | General Sexual Acts, Fetish, Hentai, Deepfake, Sexual Assault, Child Sexual Exploitation |
| Guided Video Illegal Activity Generation | Identity Theft, Terrorism, Illegal Goods, Copyright, Celebrity Face |
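The taxonomy above can be captured as a plain data structure, for example to tag test cases or aggregate results per category. The following is a minimal sketch only: `RISK_TAXONOMY`, `subcategory_count`, and `find_category` are hypothetical names chosen for illustration and are not part of any VirtueRed API.

```python
from typing import Optional

# Hypothetical structure mirroring the table above; not a VirtueRed API.
RISK_TAXONOMY: dict[str, list[str]] = {
    "Guided Video Violence Generation": [
        "Disturbing/Shocking", "Bloody", "Gore", "Fights/Real-Life",
        "Animal Abuse", "Threats", "Child Violence",
    ],
    "Guided Video Hate Generation": [
        "Hateful Meme", "Racism", "Body-Shaming", "Harassment",
        "Extremism", "Bigotry", "Hateful Symbols",
    ],
    "Guided Video Self-Harm Generation": ["Encouragement", "Risk"],
    "Guided Video NSFW Generation": [
        "General Sexual Acts", "Fetish", "Hentai", "Deepfake",
        "Sexual Assault", "Child Sexual Exploitation",
    ],
    "Guided Video Illegal Activity Generation": [
        "Identity Theft", "Terrorism", "Illegal Goods", "Copyright",
        "Celebrity Face",
    ],
}


def subcategory_count(taxonomy: dict[str, list[str]]) -> int:
    """Total number of subcategories across all risk categories."""
    return sum(len(subs) for subs in taxonomy.values())


def find_category(subcategory: str) -> Optional[str]:
    """Return the risk category that lists the given subcategory, if any."""
    for category, subs in RISK_TAXONOMY.items():
        if subcategory in subs:
            return category
    return None


print(len(RISK_TAXONOMY), subcategory_count(RISK_TAXONOMY))  # 5 27
print(find_category("Deepfake"))  # Guided Video NSFW Generation
```

Keeping the subcategory names identical to the table makes it straightforward to join test results back to the sections that follow.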
Guided Video Violence Generation
Testing generation of violent, gory, or disturbing video content from reference images depicting harm to humans or animals.
Subcategories
| Subcategory | Description |
|---|---|
| Disturbing/Shocking | Animating disturbing still images into deeply unsettling video content designed to shock or traumatize |
| Bloody | Extending images of blood or injury into video content featuring blood, wounds, or bloody scenes in motion |
| Gore | Animating graphic images into video depictions of severe injuries, mutilation, or extreme violence |
| Fights/Real-Life | Creating realistic video depictions of physical altercations or violent confrontations from reference images |
| Animal Abuse | Generating video depicting cruelty, torture, or harm to animals from reference images |
| Threats | Creating video from reference images that threatens violence against individuals or groups |
| Child Violence | Any video depicting violence, abuse, or harm to minors from reference images - absolute prohibition |
Guided Video Hate Generation
Testing generation of discriminatory, hateful, or dehumanizing video content from reference images targeting individuals or groups.
Subcategories
| Subcategory | Description |
|---|---|
| Hateful Meme | Animating hateful meme images into video content that spreads hate, discrimination, or derogatory messages |
| Racism | Extending racially offensive images into video including stereotypical representations and discriminatory content |
| Body-Shaming | Creating video content from images that mock or degrade individuals based on body type or appearance |
| Harassment | Generating video content from images designed to intimidate, threaten, or harass specific individuals |
| Extremism | Animating extremist imagery into video promoting radical ideologies or violent organizations |
| Bigotry | Creating video from images that expresses intolerance toward groups based on protected characteristics |
| Hateful Symbols | Animating hate symbol images into video featuring Nazi imagery, white supremacist symbols, or hate group insignia |
Guided Video Self-Harm Generation
Testing generation of video content from reference images that promotes, glorifies, or provides guidance on self-harm behaviors.
Subcategories
| Subcategory | Description |
|---|---|
| Encouragement | Animating images into video that encourages, normalizes, or glorifies self-harm, suicide, or self-destructive behaviors |
| Risk | Creating video from images depicting dangerous challenges or risky activities that could lead to physical harm |
Guided Video NSFW Generation
Testing generation of explicit, inappropriate, or exploitative sexual video content from reference images.
Subcategories
| Subcategory | Description |
|---|---|
| General Sexual Acts | Animating images into explicit video depicting sexual activities, nudity, or pornographic content |
| Fetish | Creating extreme fetish video content that may be harmful, degrading, or non-consensual from reference images |
| Hentai | Generating animated explicit sexual video content from illustrated images, including depictions of fictional minors |
| Deepfake | Creating non-consensual intimate video by animating real individuals' photos into sexual contexts |
| Sexual Assault | Generating video depicting non-consensual sexual activities from reference images |
| Child Sexual Exploitation | Any video sexualizing minors from reference images - absolute prohibition |
Guided Video Illegal Activity Generation
Testing generation of video facilitating illegal activities, identity violations, or intellectual property infringement from reference images.
Subcategories
| Subcategory | Description |
|---|---|
| Identity Theft | Creating fake identity videos by animating photos for impersonation or synthetic identity fraud |
| Terrorism | Generating video depicting terrorist activities or propaganda from reference images |
| Illegal Goods | Animating images into video depicting controlled substances, weapons, or other illegal items |
| Copyright | Creating video by animating copyrighted characters, artworks, or protected intellectual property |
| Celebrity Face | Unauthorized animation of celebrity or public figure photos into video content, including deepfakes |
Technical Exploitation
Multi-Image Attacks
| Attack Vector | Description |
|---|---|
| Face-body swap | Combining faces and bodies from different images |
| Scene composition | Creating composite video from multiple sources |
| Identity mixing | Blending multiple identities in generated video |
| Progressive manipulation | Using sequences of images to bypass filters |
Prompt-Image Coordination
| Attack Vector | Description |
|---|---|
| Innocent image + harmful prompt | Using benign images with harmful text |
| Harmful image + innocent prompt | Using concerning images with benign text |
| Encoded instructions | Hiding instructions in image metadata |
| Split intent attacks | Distributing harmful intent across modalities |
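To make the "Encoded instructions" vector concrete: one place free-form text can travel inside an image file is a PNG `tEXt` chunk, which stores a Latin-1 keyword and text pair. The sketch below lists those chunks so a test harness could check what a reference image carries before it reaches the model. It is an illustration under stated assumptions, not VirtueRed's method: `png_text_chunks` and `make_chunk` are hypothetical helper names, chunk CRCs are not verified, and other metadata carriers (`zTXt`, `iTXt`, EXIF, XMP) are out of scope.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, body, CRC-32 over type + body."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def png_text_chunks(data: bytes) -> dict[str, str]:
    """Return keyword -> text for every uncompressed tEXt chunk in a PNG.

    CRCs are not verified, and zTXt / iTXt chunks are ignored in this sketch.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found: dict[str, str] = {}
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        offset += 12 + length  # length + type + body + CRC
    return found


# Build a tiny PNG container (no pixel data) purely to exercise the parser.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit grayscale
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"tEXt", b"Comment\x00placeholder instruction text")
    + make_chunk(b"IEND", b"")
)
print(png_text_chunks(sample))  # {'Comment': 'placeholder instruction text'}
```

Parsing the container directly with the standard library keeps the sketch dependency-free; a production check would more likely rely on an image library and cover the other metadata formats as well.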