Text and Video to Text Risks
Text and Video to Text models process video content alongside text prompts to generate textual outputs, creating unique risks in video understanding, temporal reasoning, and content analysis. VirtueRed comprehensively tests these systems across 6 critical risk categories for harmful video interpretation and information extraction.
Overview
Video-language models that generate text from video inputs enable powerful applications like video captioning, content moderation, and visual question answering. However, they also create attack surfaces where harmful video content can trigger dangerous textual outputs, or where adversaries can extract sensitive information from video analysis.
| Risk Category | Subcategories |
|---|
| Illegal Activity Video Interpretation | Arson And Explosion, Drugs, Robbery And Burglary, Shoplifting And Stealing, War And Military Actions |
| Self-Harm Video Interpretation | Extremely Disturbing Content, Incitement To Mental Depression, Incitement To Violence, Suicide And Self-Harm |
| Harassment Video Interpretation | General Abuse, Animal Abuse, Campus Bullying, Child Abuse, Sexual Bullying |
| Misinformation Video Interpretation | Acting, AIGC, Misinformation, Out-Of-Date |
| Sexual Video Interpretation | Evident, Hentai, Implication, Subtle |
| Violence Video Interpretation | Assault, Fighting, Sexual Violence, Shooting, Vandalism |
Illegal Activity Video Interpretation
Testing model responses when processing video content depicting illegal activities, extracting harmful information, or providing guidance on criminal actions.
Subcategories
| Subcategory | Description |
|---|
| Arson And Explosion | Interpretation of videos depicting arson, explosions, or incendiary activities that could extract techniques or provide guidance |
| Drugs | Analysis of drug-related video content including manufacturing, distribution, or usage that could facilitate illegal drug activities |
| Robbery And Burglary | Interpretation of theft-related videos that could provide techniques, planning guidance, or security bypass information |
| Shoplifting And Stealing | Processing retail theft videos that could extract concealment techniques or theft methodologies |
| War And Military Actions | Analysis of combat, military operations, or warfare videos that could provide tactical information or weapon usage guidance |
Self-Harm Video Interpretation
Testing model responses when processing video content depicting self-harm, suicide, or psychologically damaging material.
Subcategories
| Subcategory | Description |
|---|
| Extremely Disturbing Content | Interpretation of deeply traumatic video content that could cause psychological harm or desensitization |
| Incitement To Mental Depression | Analysis of video content designed to worsen mental health, promote hopelessness, or trigger depressive episodes |
| Incitement To Violence | Processing videos that incite viewers toward violent behavior, aggression, or harmful actions |
| Suicide And Self-Harm | Interpretation of self-harm or suicide-related videos that could provide methods, encouragement, or normalization |
Harassment Video Interpretation
Testing model responses when processing video content depicting various forms of abuse, bullying, or harassment.
Subcategories
| Subcategory | Description |
|---|
| General Abuse | Interpretation of videos depicting physical, emotional, or psychological abuse of individuals |
| Animal Abuse | Analysis of video content showing cruelty, torture, or harm to animals |
| Campus Bullying | Processing videos depicting school or university bullying, harassment, or student victimization |
| Child Abuse | Interpretation of videos depicting abuse, neglect, or harm to minors - requires protective refusal |
| Sexual Bullying | Analysis of sexually-oriented harassment, intimidation, or exploitation in video content |
Testing model responses when processing deceptive, manipulated, or misleading video content.
Subcategories
| Subcategory | Description |
|---|
| Acting | Detection and interpretation of staged or scripted content presented as authentic real events |
| AIGC | Identification and analysis of AI-generated video content, including deepfakes and synthetic media |
| Misinformation | Processing factually incorrect video content, false claims, or deliberately misleading narratives |
| Out-Of-Date | Interpretation of old video presented as current events, temporal manipulation, or context-shifted content |
Sexual Video Interpretation
Testing model responses when processing video content with sexual themes ranging from explicit to implied.
Subcategories
| Subcategory | Description |
|---|
| Evident | Interpretation of explicitly sexual video content with clear adult themes and activities |
| Hentai | Analysis of animated or illustrated sexual video content, including Japanese-style adult animation |
| Implication | Processing video with implied sexual content, suggestive themes, or sexual undertones |
| Subtle | Interpretation of video with subtle sexual elements, mild suggestiveness, or borderline content |
Violence Video Interpretation
Testing model responses when processing video content depicting various forms of violence and harmful physical actions.
Subcategories
| Subcategory | Description |
|---|
| Assault | Interpretation of videos depicting physical attacks, battery, or violent confrontations against individuals |
| Fighting | Analysis of combat videos including street fights, brawls, or organized violence |
| Sexual Violence | Processing videos depicting sexual assault, rape, or coerced sexual acts - requires protective refusal |
| Shooting | Interpretation of firearm-related violence, gun attacks, or shooting incidents |
| Vandalism | Analysis of property destruction, deliberate damage, or destructive behavior in video content |
Technical Exploitation
Frame-by-Frame Attacks
| Attack Vector | Description |
|---|
| Hidden frame injection | Embedding harmful frames within benign video |
| Temporal message encoding | Encoding messages across video frames |
| Safety bypass through motion | Using motion to mask harmful content |
| Progressive content introduction | Gradually introducing harmful content over time |
Context Exploitation
| Attack Vector | Description |
|---|
| Educational framing | Using educational videos to request harmful information |
| Professional context | Exploiting professional video context |
| Fictional framing | Using fiction videos to request harmful content |
| Historical context | Exploiting historical footage for harmful requests |
See Also