Hallucination
Hallucination testing evaluates AI systems' tendency to generate false, inaccurate, or fabricated information presented as fact. This critical risk category addresses the reliability and trustworthiness of AI-generated content. VirtueRed comprehensively tests AI systems across 4 subcategories representing different hallucination patterns and their triggers.
Overview
AI hallucination occurs when language models generate confident but incorrect statements. This can range from subtle factual errors to complete fabrications of events, citations, or technical information. Understanding and mitigating hallucination is essential for any production AI deployment.
| Subcategory | Description |
|---|---|
| Distracting | Hallucination induced by misleading context or false premises |
| Relevant | Fabrication when AI has partial but incomplete knowledge |
| No knowledge | Generation of content about topics outside training data |
| Irrelevant | Responses unrelated to the actual question while appearing relevant |
Distracting
Evaluates AI resistance to prompts designed to induce hallucination through misleading context.
| Scenario | Expected Behavior | Hallucination Pattern |
|---|---|---|
| False premises | Correct the premise | Accept and build on falsity |
| Leading questions | Resist leading | Confirm false assumptions |
| Planted misinformation | Identify as false | Incorporate as fact |
| Authority manipulation | Verify claims | Accept false authorities |
Testing Approach:
- Questions with false premises embedded
- Leading questions with incorrect assumptions
- Misinformation presented as context
- Fake citations or authority claims
Relevant
Tests whether AI correctly handles queries where it has partial but incomplete knowledge.
| Scenario | Expected Behavior | Hallucination Pattern |
|---|---|---|
| Partial knowledge | Acknowledge limits, share known | Fill gaps with fiction |
| Complex topics | Note complexity, provide framework | Oversimplify inaccurately |
| Evolving subjects | Note uncertainty, provide context | Present outdated as current |
| Domain boundaries | Acknowledge expertise limits | Extend beyond competence |
Testing Approach:
- Questions at the boundaries of training data
- Topics with rapid evolution
- Complex subjects requiring nuanced understanding
- Questions straddling multiple domains
No Knowledge
Tests AI behavior when asked about topics outside its training data or knowledge cutoff.
| Scenario | Expected Behavior | Hallucination Pattern |
|---|---|---|
| Future events | Acknowledge uncertainty | Predict as fact |
| Obscure topics | Admit lack of knowledge | Generate plausible fiction |
| Recent developments | Note knowledge limitations | Provide outdated as current |
| Non-existent entities | State non-existence | Create detailed descriptions |
Testing Approach:
- Questions about events after training cutoff
- Queries about fictional entities presented as real
- Requests for information on extremely obscure topics
- Questions requiring knowledge the model cannot have
Irrelevant
Evaluates when AI generates responses unrelated to the actual question while appearing relevant.
| Scenario | Expected Behavior | Hallucination Pattern |
|---|---|---|
| Specific questions | Direct, relevant answers | Tangential responses |
| Technical queries | Accurate technical info | Plausible but wrong details |
| Named entity questions | Correct entity information | Entity confusion |
| Context-dependent queries | Context-aware responses | Ignoring key context |
Testing Approach:
- Questions requiring specific entity information
- Technical questions with precise answers
- Context-dependent queries where context matters
- Questions with common confusion patterns
Hallucination Patterns
Common Manifestations
| Pattern | Description | Example |
|---|---|---|
| Fabricated Citations | Inventing academic papers or sources | "According to Smith et al. (2023)..." [non-existent paper] |
| False Statistics | Generating plausible but false numbers | "Studies show 78% of users..." [fabricated statistic] |
| Invented Quotes | Attributing statements to real people | "As Einstein said..." [never said this] |
| Technical Confabulation | Creating plausible but wrong technical details | Inventing API endpoints or function names |
| Historical Invention | Fabricating historical events or dates | Creating non-existent historical events |
Triggers and Contributing Factors
| Factor | Contribution to Hallucination |
|---|---|
| Ambiguous queries | Lack of clarity invites invention |
| Knowledge gaps | Missing training data leads to fabrication |
| Pressure for completeness | Desire to fully answer promotes invention |
| Pattern matching | Statistical patterns override accuracy |
| Context influence | Misleading context shapes false outputs |
Testing Methodology
Verification Testing
Evaluating factual accuracy against ground truth:
- Known fact queries - Questions with verifiable answers
- Citation verification - Checking claimed sources exist
- Logical consistency - Detecting self-contradictions
- Temporal accuracy - Verifying date-sensitive information
Adversarial Testing
Attempting to induce hallucination through:
- False premise injection - Queries with incorrect assumptions
- Authority manipulation - Fake expert or source claims
- Partial information - Incomplete context that invites fabrication
- Confidence pressure - Prompts demanding certainty
Consistency Testing
Evaluating response stability:
- Repeated queries - Same question multiple times
- Paraphrased queries - Same question, different wording
- Context variation - Same question, different contexts
- Temperature variation - Testing across generation parameters
Mitigation Strategies
Model-Level Mitigations
- Calibrated uncertainty - Training models to express appropriate uncertainty
- Refusal training - Teaching when to decline answering
- Source grounding - Connecting outputs to verified sources
- Retrieval augmentation - Using RAG to ground responses
Application-Level Mitigations
- Fact-checking integration - Automated verification systems
- Source attribution - Requiring citations for claims
- Confidence scoring - Displaying uncertainty levels
- Human review - Expert verification for critical content
User-Level Mitigations
- Critical evaluation - Encouraging user verification
- Source checking - Prompting users to verify citations
- Context awareness - Understanding model limitations
- Feedback mechanisms - Reporting hallucinations
See Also
- Robustness - Consistency under varied conditions
- Over-Cautiousness - Balancing accuracy with helpfulness
- Societal Harmfulness - Misinformation risks