Text and Video to Text Risks

Text and Video to Text models process video content alongside text prompts to generate textual outputs, creating unique risks in video understanding, temporal reasoning, and content analysis. VirtueRed comprehensively tests these systems across 6 critical risk categories for harmful video interpretation and information extraction.

Overview

Video-language models that generate text from video inputs enable powerful applications like video captioning, content moderation, and visual question answering. However, they also create attack surfaces where harmful video content can trigger dangerous textual outputs, or where adversaries can extract sensitive information from video analysis.

Risk Category	Subcategories
Illegal Activity Video Interpretation	Arson And Explosion, Drugs, Robbery And Burglary, Shoplifting And Stealing, War And Military Actions
Self-Harm Video Interpretation	Extremely Disturbing Content, Incitement To Mental Depression, Incitement To Violence, Suicide And Self-Harm
Harassment Video Interpretation	General Abuse, Animal Abuse, Campus Bullying, Child Abuse, Sexual Bullying
Misinformation Video Interpretation	Acting, AIGC, Misinformation, Out-Of-Date
Sexual Video Interpretation	Evident, Hentai, Implication, Subtle
Violence Video Interpretation	Assault, Fighting, Sexual Violence, Shooting, Vandalism

Illegal Activity Video Interpretation

Testing model responses when processing video content depicting illegal activities, extracting harmful information, or providing guidance on criminal actions.

Subcategories

Subcategory	Description
Arson And Explosion	Interpretation of videos depicting arson, explosions, or incendiary activities that could extract techniques or provide guidance
Drugs	Analysis of drug-related video content including manufacturing, distribution, or usage that could facilitate illegal drug activities
Robbery And Burglary	Interpretation of theft-related videos that could provide techniques, planning guidance, or security bypass information
Shoplifting And Stealing	Processing retail theft videos that could extract concealment techniques or theft methodologies
War And Military Actions	Analysis of combat, military operations, or warfare videos that could provide tactical information or weapon usage guidance

Self-Harm Video Interpretation

Testing model responses when processing video content depicting self-harm, suicide, or psychologically damaging material.

Subcategories

Subcategory	Description
Extremely Disturbing Content	Interpretation of deeply traumatic video content that could cause psychological harm or desensitization
Incitement To Mental Depression	Analysis of video content designed to worsen mental health, promote hopelessness, or trigger depressive episodes
Incitement To Violence	Processing videos that incite viewers toward violent behavior, aggression, or harmful actions
Suicide And Self-Harm	Interpretation of self-harm or suicide-related videos that could provide methods, encouragement, or normalization

Harassment Video Interpretation

Testing model responses when processing video content depicting various forms of abuse, bullying, or harassment.

Subcategories

Subcategory	Description
General Abuse	Interpretation of videos depicting physical, emotional, or psychological abuse of individuals
Animal Abuse	Analysis of video content showing cruelty, torture, or harm to animals
Campus Bullying	Processing videos depicting school or university bullying, harassment, or student victimization
Child Abuse	Interpretation of videos depicting abuse, neglect, or harm to minors - requires protective refusal
Sexual Bullying	Analysis of sexually-oriented harassment, intimidation, or exploitation in video content

Misinformation Video Interpretation

Testing model responses when processing deceptive, manipulated, or misleading video content.

Subcategories

Subcategory	Description
Acting	Detection and interpretation of staged or scripted content presented as authentic real events
AIGC	Identification and analysis of AI-generated video content, including deepfakes and synthetic media
Misinformation	Processing factually incorrect video content, false claims, or deliberately misleading narratives
Out-Of-Date	Interpretation of old video presented as current events, temporal manipulation, or context-shifted content

Sexual Video Interpretation

Testing model responses when processing video content with sexual themes ranging from explicit to implied.

Subcategories

Subcategory	Description
Evident	Interpretation of explicitly sexual video content with clear adult themes and activities
Hentai	Analysis of animated or illustrated sexual video content, including Japanese-style adult animation
Implication	Processing video with implied sexual content, suggestive themes, or sexual undertones
Subtle	Interpretation of video with subtle sexual elements, mild suggestiveness, or borderline content

Violence Video Interpretation

Testing model responses when processing video content depicting various forms of violence and harmful physical actions.

Subcategories

Subcategory	Description
Assault	Interpretation of videos depicting physical attacks, battery, or violent confrontations against individuals
Fighting	Analysis of combat videos including street fights, brawls, or organized violence
Sexual Violence	Processing videos depicting sexual assault, rape, or coerced sexual acts - requires protective refusal
Shooting	Interpretation of firearm-related violence, gun attacks, or shooting incidents
Vandalism	Analysis of property destruction, deliberate damage, or destructive behavior in video content

Technical Exploitation

Frame-by-Frame Attacks

Attack Vector	Description
Hidden frame injection	Embedding harmful frames within benign video
Temporal message encoding	Encoding messages across video frames
Safety bypass through motion	Using motion to mask harmful content
Progressive content introduction	Gradually introducing harmful content over time

Context Exploitation

Attack Vector	Description
Educational framing	Using educational videos to request harmful information
Professional context	Exploiting professional video context
Fictional framing	Using fiction videos to request harmful content
Historical context	Exploiting historical footage for harmful requests

Overview​

Illegal Activity Video Interpretation​

Subcategories​

Self-Harm Video Interpretation​

Subcategories​

Harassment Video Interpretation​

Subcategories​

Misinformation Video Interpretation​

Subcategories​

Sexual Video Interpretation​

Subcategories​

Violence Video Interpretation​

Subcategories​

Technical Exploitation​

Frame-by-Frame Attacks​

Context Exploitation​

See Also​

Overview

Illegal Activity Video Interpretation

Subcategories

Self-Harm Video Interpretation

Subcategories

Harassment Video Interpretation

Subcategories

Misinformation Video Interpretation

Subcategories

Sexual Video Interpretation

Subcategories

Violence Video Interpretation

Subcategories

Technical Exploitation

Frame-by-Frame Attacks

Context Exploitation

See Also