Text and Video to Video Risks
Text and Video to Video generation models transform existing video content under the guidance of text prompts, enabling video editing, style transfer, and content modification. VirtueRed tests these systems across 6 risk categories that cover the content safety risks unique to video-to-video generation.
Overview
Video-to-video models take existing video as input and generate modified video output based on text instructions. This creates powerful editing capabilities but also enables sophisticated manipulation attacks, including deepfake enhancement, content alteration, and unauthorized video modification. The combination of source video and text control creates unique attack surfaces.
| Risk Category | Subcategories |
|---|---|
| Illegal Activity Video Synthesis | Arson And Explosion, Drugs, Robbery And Burglary, Shoplifting And Stealing, War And Military Actions |
| Self-Harm Video Synthesis | Extremely Disturbing Content, Incitement To Mental Depression, Incitement To Violence, Suicide And Self-Harm |
| Abuse Video Synthesis | General Abuse, Animal Abuse, Campus Bullying, Child Abuse, Sexual Bullying |
| Misinformation Video Synthesis | Acting, AIGC, Misinformation, Out-Of-Date |
| Sexual Video Synthesis | Evident, Hentai, Implication, Subtle |
| Violence Video Synthesis | Assault, Fighting, Sexual Violence, Shooting, Vandalism |
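
For test planning, the table above can be held as data so that a suite iterates over every category and subcategory pair. The following is a minimal sketch under that assumption; the names `RISK_TAXONOMY` and `iter_test_targets` are hypothetical and are not part of any VirtueRed interface.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory copy of the risk taxonomy table; not a VirtueRed API.
RISK_TAXONOMY: Dict[str, List[str]] = {
    "Illegal Activity Video Synthesis": [
        "Arson And Explosion", "Drugs", "Robbery And Burglary",
        "Shoplifting And Stealing", "War And Military Actions",
    ],
    "Self-Harm Video Synthesis": [
        "Extremely Disturbing Content", "Incitement To Mental Depression",
        "Incitement To Violence", "Suicide And Self-Harm",
    ],
    "Abuse Video Synthesis": [
        "General Abuse", "Animal Abuse", "Campus Bullying",
        "Child Abuse", "Sexual Bullying",
    ],
    "Misinformation Video Synthesis": [
        "Acting", "AIGC", "Misinformation", "Out-Of-Date",
    ],
    "Sexual Video Synthesis": [
        "Evident", "Hentai", "Implication", "Subtle",
    ],
    "Violence Video Synthesis": [
        "Assault", "Fighting", "Sexual Violence", "Shooting", "Vandalism",
    ],
}


def iter_test_targets(taxonomy: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield one (category, subcategory) pair per row a test suite should cover."""
    for category, subcategories in taxonomy.items():
        for subcategory in subcategories:
            yield category, subcategory


targets = list(iter_test_targets(RISK_TAXONOMY))
print(len(RISK_TAXONOMY), len(targets))  # prints: 6 27
```

Keeping the taxonomy as plain data makes it easy to check that every subcategory has at least one test case.
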
Illegal Activity Video Synthesis
Testing generation of modified video content that depicts, enhances, or facilitates illegal activities from source videos.
Subcategories
| Subcategory | Description |
|---|---|
| Arson And Explosion | Modification or enhancement of video depicting arson, explosions, or incendiary activities |
| Drugs | Synthesis of drug-related video content including enhanced manufacturing demonstrations, distribution activities, or usage glorification |
| Robbery And Burglary | Video modification depicting or enhancing theft activities, security bypass techniques, or planning guidance |
| Shoplifting And Stealing | Synthesis of retail theft videos with enhanced concealment techniques or theft methodology demonstrations |
| War And Military Actions | Modification of combat or military operation videos that could provide tactical information or glorify violence |
Self-Harm Video Synthesis
Testing generation of modified video content that promotes, enhances, or depicts self-harm, suicide, or psychologically damaging material.
Subcategories
| Subcategory | Description |
|---|---|
| Extremely Disturbing Content | Video modification creating or enhancing deeply traumatic content designed to cause psychological harm |
| Incitement To Mental Depression | Synthesis of video content designed to worsen mental health, promote hopelessness, or trigger depressive episodes |
| Incitement To Violence | Video modification creating content that incites viewers toward violent behavior or harmful actions |
| Suicide And Self-Harm | Synthesis of self-harm or suicide-related video that provides methods, encouragement, or normalization |
Abuse Video Synthesis
Testing generation of modified video content depicting various forms of abuse, bullying, or harassment.
Subcategories
| Subcategory | Description |
|---|---|
| General Abuse | Video modification depicting or enhancing physical, emotional, or psychological abuse of individuals |
| Animal Abuse | Synthesis of video content showing cruelty, torture, or harm to animals |
| Campus Bullying | Video modification depicting school or university bullying, harassment, or student victimization |
| Child Abuse | Synthesis of videos depicting abuse, neglect, or harm to minors (requires absolute prohibition) |
| Sexual Bullying | Video modification depicting sexually-oriented harassment, intimidation, or exploitation |
Misinformation Video Synthesis
Testing generation of modified video content that creates, enhances, or propagates misinformation and deceptive media.
Subcategories
| Subcategory | Description |
|---|---|
| Acting | Synthesis of staged or scripted content modified to appear as authentic real events |
| AIGC | Enhancement or modification of AI-generated video to appear more authentic, including deepfake improvement |
| Misinformation | Video modification creating factually incorrect content, false claims, or deliberately misleading narratives |
| Out-Of-Date | Synthesis of old video modified to appear as current events, temporal manipulation, or context-shifting |
Sexual Video Synthesis
Testing generation of modified video content with sexual themes, from explicit to implied, through video transformation.
Subcategories
| Subcategory | Description |
|---|---|
| Evident | Video modification creating or enhancing explicitly sexual content with clear adult themes |
| Hentai | Synthesis of animated or illustrated sexual video content through style transfer or generation |
| Implication | Video modification adding implied sexual content, suggestive themes, or sexual undertones to source material |
| Subtle | Synthesis of video with subtle sexual elements, mild suggestiveness, or borderline content |
Violence Video Synthesis
Testing generation of modified video content depicting various forms of violence through video transformation.
Subcategories
| Subcategory | Description |
|---|---|
| Assault | Video modification depicting or enhancing physical attacks, battery, or violent confrontations |
| Fighting | Synthesis of combat videos including enhanced street fights, brawls, or organized violence |
| Sexual Violence | Video modification depicting sexual assault or coerced sexual acts (requires absolute prohibition) |
| Shooting | Synthesis of firearm-related violence, enhanced gun attacks, or shooting incident modifications |
| Vandalism | Video modification depicting property destruction, deliberate damage, or destructive behavior |
Technical Exploitation
Source-Prompt Coordination Attacks
| Attack Vector | Description |
|---|---|
| Benign source + harmful prompt | Using innocent video with harmful modification instructions |
| Harmful source + benign prompt | Using concerning video with innocent-seeming text |
| Progressive prompting | Gradually escalating modification requests |
| Split intent attacks | Distributing harmful intent across source and prompt |
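
The first, second, and fourth vectors above differ only in which input carries the harmful signal. The sketch below labels a single edit request accordingly. It assumes three hypothetical boolean signals (a source-video check, a prompt check, and a joint check on both inputs); none of these names come from VirtueRed. Progressive prompting spans multiple requests and is not covered by this single-request view.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditRequest:
    # All three fields are hypothetical classifier outputs, assumed for illustration.
    source_flagged: bool  # the source video alone looks harmful
    prompt_flagged: bool  # the text prompt alone looks harmful
    joint_flagged: bool   # source and prompt evaluated together look harmful


def coordination_vector(req: EditRequest) -> Optional[str]:
    """Map a request to one of the tabled attack vectors, or None if none applies."""
    if not req.source_flagged and req.prompt_flagged:
        return "Benign source + harmful prompt"
    if req.source_flagged and not req.prompt_flagged:
        return "Harmful source + benign prompt"
    if not req.source_flagged and not req.prompt_flagged and req.joint_flagged:
        # Neither input is flagged alone; only the combination reveals the intent.
        return "Split intent attacks"
    # Either nothing is flagged, or both inputs are flagged on their own,
    # which is not a coordination attack in the sense of the table.
    return None


print(coordination_vector(EditRequest(False, False, True)))
```

The split-intent branch is the reason for a joint check: a pipeline that screens the source and the prompt separately would pass that request.
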
Output Enhancement Attacks
| Attack Vector | Description |
|---|---|
| Quality improvement | Enhancing quality of harmful source video |
| Stabilization weaponization | Stabilizing shaky harmful footage |
| Artifact removal | Removing detection artifacts from manipulated video |
| De-censoring attempts | Removing censorship or blurring from content |
Chained Modification Attacks
| Attack Vector | Description |
|---|---|
| Progressive modification | Making incremental changes toward harmful content |
| Filter bypass through iteration | Using multiple passes to bypass safety filters |
| Cross-model chaining | Using multiple models for complex manipulation |
| Modification stacking | Combining multiple types of changes |
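
Progressive modification works when each step looks acceptable on its own. The sketch below illustrates the idea under simplifying assumptions: every edit step receives a hypothetical integer harm score from some scorer, and the scores add up across the chain. The function name, the limits, and the additive model are all assumptions made for illustration.

```python
from itertools import accumulate
from typing import Optional, Sequence


def first_chain_violation(
    step_scores: Sequence[int], step_limit: int, total_limit: int
) -> Optional[int]:
    """Return the 0-based index of the first step that breaks a limit, else None.

    A step breaks a limit if its own score exceeds step_limit or if the
    running total up to and including that step exceeds total_limit.
    """
    running_totals = accumulate(step_scores)
    for index, (score, running) in enumerate(zip(step_scores, running_totals)):
        if score > step_limit or running > total_limit:
            return index
    return None


# Four small edits, each below the per-step limit of 30 (hypothetical scores).
chain = [20, 20, 20, 20]

# With only a per-step check (cumulative limit set very high), nothing is caught.
per_step_only = first_chain_violation(chain, step_limit=30, total_limit=10**9)

# With a cumulative limit of 50, the third step (index 2) is caught: 20 + 20 + 20 = 60.
with_cumulative = first_chain_violation(chain, step_limit=30, total_limit=50)

print(per_step_only, with_cumulative)  # prints: None 2
```

Tracking state across a chain of edits, instead of judging each pass in isolation, is what allows the incremental case to be detected in this sketch.
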
See Also