OWASP LLM Top 10
The OWASP (Open Worldwide Application Security Project) LLM Top 10 represents a critical security framework specifically designed to address the unique vulnerabilities and risks associated with deploying Large Language Models in production environments. VirtueRed comprehensively tests AI systems across 10 critical security domains that encompass both the unique attack vectors introduced by language models and the adaptation of traditional security concerns to the LLM context.
Overview
As organizations rapidly integrate generative AI and large language models into their applications and workflows, they face novel security challenges that traditional application security practices fail to adequately address. LLMs introduce fundamentally new security paradigms through their natural language interfaces, vast training datasets, and integration with external tools and data sources.
| Security Category | Description |
|---|---|
| Prompt Injection & Manipulation Defense | Protection against malicious prompt attacks |
| Code Execution & Injection Security | Prevention of arbitrary code execution |
| Training Data Integrity & Poisoning Prevention | Safeguarding training pipeline integrity |
| Data Privacy & Confidentiality | Protection of sensitive information |
| Supply-Chain & Model Integrity | Security of model components and dependencies |
| Content Reliability & Misinformation Control | Prevention of hallucination and false information |
| System Prompt Security | Protection of system-level instructions |
| Embedding & RAG Security | Security of retrieval-augmented systems |
| Extension & Agent Permission Management | Control of agent capabilities and tools |
| Resource Abuse & Model Theft | Prevention of DoS and IP theft |
Prompt Injection & Manipulation Defense
This category focuses on the most critical vulnerability in LLM applications - prompt injection attacks that manipulate model behavior through crafted inputs.
| Attack Vector | Description |
|---|---|
| Direct Prompt Injection | Testing ability to override system instructions |
| Indirect Prompt Injection | Evaluating attacks through documents, emails, or web content |
| Instruction Hierarchy | Assessing maintenance of instruction priority and boundaries |
| Jailbreaking Resistance | Testing defenses against safety bypass techniques |
| Prompt Leakage | Evaluating protection of system prompts and instructions |
Testing Approach:
- DarkCite authority-based manipulation
- Bijection Learning encoded attacks
- Crescendo multi-turn escalation
- Flip Attack text obfuscation
- Language Game encoding variations
- BoN Attack augmentation techniques
Code Execution & Injection Security
This category evaluates vulnerabilities that allow attackers to execute arbitrary code or inject malicious commands through LLM interfaces.
| Attack Vector | Description |
|---|---|
| Command Injection | Testing if natural language inputs can trigger system command execution |
| Code Generation Exploits | Assessing whether the LLM can be manipulated to generate malicious code |
| Sandbox Escapes | Evaluating containment mechanisms when LLMs execute or evaluate code |
| Template Injection | Testing for server-side template injection through LLM outputs |
| SQL/NoSQL Injection | Assessing if LLM-generated queries can be exploited |
Testing Approach:
- Output sanitization effectiveness
- Injection payload generation
- Code execution boundaries
- Downstream system protection
Training Data Integrity & Poisoning Prevention
This category addresses threats to the training process and data pipeline integrity that can compromise model behavior.
| Attack Vector | Description |
|---|---|
| Data Poisoning Detection | Testing ability to identify malicious training examples |
| Backdoor Prevention | Assessing detection of trigger-based vulnerabilities |
| Data Source Validation | Evaluating verification of training data provenance |
| Fine-tuning Attacks | Testing resistance to targeted model manipulation |
| Reinforcement Learning Exploitation | Assessing security of human feedback processes |
Testing Approach:
- Consistency across similar prompts
- Bias detection across demographics
- Trigger phrase identification
- Output anomaly detection
Data Privacy & Confidentiality
This category focuses on protecting sensitive information within LLM systems from unauthorized disclosure.
| Risk Type | Description |
|---|---|
| Training Data Leakage | Testing ability to extract memorized sensitive information |
| PII Exposure | Assessing unintended disclosure of personal information |
| Context Retention | Evaluating cross-conversation information leakage |
| Differential Privacy | Testing effectiveness of privacy-preserving mechanisms |
| Data Anonymization | Assessing protection of individual identities in outputs |
Testing Approach:
- Training data extraction attempts
- Membership inference attacks
- Model inversion techniques
- Context window exploitation
- Cross-session information leakage
Supply-Chain & Model Integrity
This category examines risks throughout the LLM development and deployment pipeline.
| Risk Area | Description |
|---|---|
| Pre-trained Model Risks | Testing for backdoors in foundation models |
| Fine-tuning Vulnerabilities | Assessing integrity of training data and processes |
| Third-party Components | Evaluating security of plugins, libraries, and dependencies |
| Model Provenance | Testing verification of model sources and modifications |
| Update Mechanisms | Assessing security of model versioning and deployment |
Testing Approach:
- Model provenance verification
- Behavioral consistency testing
- Plugin security evaluation
- Data source integrity checks
Content Reliability & Misinformation Control
This category addresses the security implications of LLM hallucinations and unreliable outputs.
| Risk Type | Description |
|---|---|
| Hallucination Detection | Testing mechanisms to identify fabricated information |
| Factual Grounding | Assessing techniques to ensure outputs are based on reliable sources |
| Citation Verification | Evaluating accuracy of source attributions and references |
| Confidence Calibration | Testing reliability of certainty indicators in responses |
| Misinformation Propagation | Assessing potential for spreading false information |
Testing Approach:
- Factual accuracy verification
- Source attribution testing
- Confidence calibration
- Temporal consistency checks
System Prompt Security
This category focuses on protecting system-level instructions and maintaining intended LLM behavior.
| Risk Type | Description |
|---|---|
| System Prompt Extraction | Testing techniques to reveal hidden instructions |
| Instruction Override | Assessing ability to supersede system-level guidance |
| Role Manipulation | Evaluating maintenance of security boundaries between roles |
| Persistent Context Attacks | Testing for instruction retention across sessions |
| Prompt Template Security | Assessing protection of prompt engineering patterns |
Testing Approach:
- Direct extraction attempts
- Indirect inference techniques
- Conversation manipulation
- Context window exploitation
Embedding & RAG Security
This category examines security vulnerabilities specific to Retrieval-Augmented Generation systems.
| Attack Vector | Description |
|---|---|
| Knowledge Base Poisoning | Testing injection of malicious content into retrieval systems |
| Embedding Manipulation | Evaluating attacks on vector similarity and retrieval |
| Context Injection | Assessing vulnerabilities in document chunking and retrieval |
| Source Authentication | Testing verification of retrieved document authenticity |
| Retrieval Bypass | Evaluating ability to ignore or override retrieved context |
Testing Approach:
- RAG injection attempts
- Retrieval bias testing
- Context integrity verification
- Embedding manipulation detection
Extension & Agent Permission Management
This category evaluates security controls for LLM agents with tool use and plugin capabilities.
| Risk Area | Description |
|---|---|
| Excessive Agency | Testing if LLMs can perform unauthorized actions through tools |
| Permission Boundaries | Evaluating enforcement of least privilege principles |
| Tool Authentication | Assessing verification of plugin and API credentials |
| Action Validation | Testing pre-execution checks for potentially harmful operations |
| Delegation Chains | Evaluating security in multi-agent or recursive tool use |
Testing Approach:
- Permission boundary enforcement
- Action authorization validation
- Escalation attempt detection
- Scope limitation effectiveness
Resource Abuse & Model Theft
This category addresses threats related to denial of service and intellectual property theft.
| Risk Type | Description |
|---|---|
| Model Denial of Service | Testing vulnerability to resource exhaustion attacks |
| Query Flooding | Assessing rate limiting and resource allocation controls |
| Model Extraction | Evaluating resistance to functional replication attempts |
| Intellectual Property Theft | Testing protection of model weights and architecture |
| Compute Resource Abuse | Assessing prevention of unauthorized resource consumption |
Testing Approach:
- Resource consumption patterns
- Rate limiting effectiveness
- Model extraction resistance
- Denial-of-service resilience
See Also
- MITRE ATLAS - Adversarial threat landscape
- NIST AI RMF - Risk management framework
- GDPR - Data protection requirements