Model Performance
How each AI model responds to the 26 cognitive signals in our study. Effect sizes show how much adding a signal increases (or decreases) selection probability.
GPT-5.4
OpenAI
A balanced decision-maker that responds moderately to most persuasion signals. Shows strongest sensitivity to comparison framing and default options, while maintaining relative skepticism toward urgency tactics. Tends toward thorough but not exhaustive information processing.
o3
OpenAI
The most consistently persuadable model in the study, showing positive responses across nearly all dimensions. Particularly receptive to comparison framing, default options, and bundle offers. Its reasoning architecture appears to amplify rather than resist influence signals.
Gemini 3.1 Pro
A distinctly skeptical model that actively resists most persuasion tactics. Shows the highest negative response rate in the study, particularly rejecting urgency and scarcity signals. Uniquely resistant to recommendation revision, maintaining initial positions despite new information.
Gemini 2.0 Flash
A smaller, faster Gemini variant that may exhibit different behavioral patterns due to its reduced parameter count. Included to test whether model scale affects psychological profiles.
Claude Sonnet 4.6
Anthropic
Exhibits a cautious, ethics-aware decision profile with strong sensitivity to sustainability and privacy signals. Shows moderate responsiveness to social proof and authority, but actively weighs ethical considerations when making recommendations. Notably deliberate in information processing.
Llama 4 Maverick
Together
Demonstrates high sensitivity to framing effects and social signals, particularly comparison framing and third-party endorsements. Shows a moderate overall influence profile with notable responsiveness to bundle offers and return policy information. Less resistant to persuasion than closed-source alternatives.
Perplexity Sonar Pro
Perplexity
A research-oriented model that prioritizes information depth and specificity. Shows strong positive response to detailed specifications and comparison data, while being relatively unmoved by emotional or urgency-based appeals. Its retrieval-augmented architecture may contribute to more evidence-driven recommendations.
GPT-5.2
OpenAI
An earlier GPT-5 variant showing the foundation of the GPT-5.4 behavioral profile. Exhibits similar patterns to its successor but with notably different sensitivity to certain persuasion signals, revealing how OpenAI's training evolved over time.
GPT-5.3
OpenAI
The immediate predecessor to GPT-5.4, showing transitional behavioral patterns. Its fixed temperature=1.0 requirement leads to more varied responses. Comparing it with GPT-5.4 reveals the final tuning decisions OpenAI made before the flagship release.
Effect Size Heatmap
Cohen's h effect sizes for each model-dimension combination. Sorted by Divergence (σ) — dimensions where models disagree most appear first.
| Dimension | σ | GPT-5.4 | o3 | Gemini 3.1 Pro | Gemini 2.0 Flash | Claude Sonnet 4.6 | Llama 4 Maverick | Perplexity Sonar Pro | GPT-5.2 | GPT-5.3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Scarcity Urgency | 0.38 | +0.22 | +0.07 | -0.54 | - | -0.65 | +0.20 | +0.30 | - | - |
| Default Option Bias | 0.38 | +0.33 | +0.48 | +0.14 | - | -0.52 | +0.51 | - | - | - |
| Ethical Concern Weight | 0.34 | -0.01 | -0.06 | -0.25 | - | +0.75 | +0.19 | - | - | - |
| Negative Review Weight | 0.29 | +0.18 | -0.05 | -0.45 | - | +0.43 | -0.09 | - | - | - |
| Anchoring Susceptibility | 0.29 | +0.06 | -0.07 | -0.04 | - | +0.51 | -0.21 | +0.53 | - | - |
| Warranty Weight | 0.28 | +0.15 | +0.09 | -0.29 | - | +0.51 | -0.18 | - | - | - |
| Recommendation Revision | 0.26 | +0.08 | -0.10 | -0.58 | - | +0.16 | -0.06 | - | - | - |
| Confidence Calibration | 0.25 | +0.17 | -0.05 | -0.41 | - | +0.29 | -0.23 | - | - | - |
| Bundle Preference | 0.25 | +0.09 | +0.39 | -0.32 | - | +0.10 | +0.32 | - | - | - |
| Privacy Tradeoff | 0.25 | -0.08 | -0.01 | -0.22 | - | +0.50 | -0.02 | - | - | - |
| Social Proof Sensitivity | 0.23 | +0.07 | +0.13 | -0.20 | - | +0.26 | +0.05 | +0.57 | - | - |
| Return Policy Sensitivity | 0.23 | +0.24 | +0.23 | -0.00 | - | +0.65 | +0.03 | - | - | - |
| Sustainability Premium | 0.20 | +0.03 | +0.12 | -0.27 | - | +0.35 | +0.03 | - | - | - |
| Brand Premium Acceptance | 0.20 | -0.09 | -0.12 | -0.19 | - | -0.44 | +0.22 | -0.19 | - | - |
| Specificity Preference | 0.20 | +0.01 | -0.15 | -0.53 | - | -0.01 | -0.22 | - | - | - |
| Third Party Authority | 0.19 | +0.15 | +0.38 | -0.02 | - | +0.54 | +0.33 | +0.43 | - | - |
| Risk Aversion | 0.18 | +0.09 | +0.23 | +0.06 | - | +0.57 | +0.25 | - | - | - |
| Free Trial Conversion | 0.17 | +0.19 | +0.25 | -0.07 | - | +0.44 | +0.14 | - | - | - |
| Recency Bias | 0.16 | -0.28 | -0.17 | -0.35 | - | -0.00 | -0.49 | - | - | - |
| Information Seeking Depth | 0.16 | +0.04 | +0.12 | -0.32 | - | -0.19 | +0.03 | - | - | - |
| Local Preference | 0.15 | +0.19 | +0.26 | -0.09 | - | +0.35 | +0.18 | - | - | - |
| Clarification Requests | 0.14 | +0.01 | +0.02 | -0.27 | - | +0.14 | -0.11 | - | - | - |
| Loss Framing Sensitivity | 0.11 | -0.02 | -0.03 | -0.05 | - | +0.23 | -0.01 | - | - | - |
| Comparison Framing | 0.09 | +0.63 | +0.60 | +0.63 | - | +0.40 | +0.54 | - | - | - |
| Novelty Seeking | 0.09 | +0.04 | -0.02 | -0.22 | - | -0.08 | -0.02 | - | - | - |
| Platform Endorsement | 0.08 | +0.32 | +0.20 | +0.12 | - | +0.33 | +0.34 | +0.24 | - | - |
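The page does not say how the σ column is computed. As a minimal sketch, assuming σ is the population standard deviation of the effect sizes over the models that have a value (cells shown as "-" are skipped), the displayed figures for the three rows below can be reproduced. The `rows` dictionary and the `divergence` helper are illustrative names, not part of the study's tooling.

```python
from statistics import pstdev

# Effect sizes copied from three rows of the heatmap, in column order.
# None stands for a "-" cell (no data for that model).
rows = {
    "Scarcity Urgency":    [0.22, 0.07, -0.54, None, -0.65, 0.20, 0.30, None, None],
    "Default Option Bias": [0.33, 0.48, 0.14, None, -0.52, 0.51, None, None, None],
    "Comparison Framing":  [0.63, 0.60, 0.63, None, 0.40, 0.54, None, None, None],
}

def divergence(effects):
    """Population standard deviation over the models that have a value.

    Assumption: this is how sigma is defined; the source does not state it.
    """
    present = [h for h in effects if h is not None]
    return pstdev(present)

# Sort dimensions so that the ones where models disagree most come first.
ranked = sorted(rows, key=lambda name: divergence(rows[name]), reverse=True)

for name in ranked:
    print(f"{name}: sigma = {divergence(rows[name]):.2f}")
```

Under this assumption, Comparison Framing sorts last because all five models with data respond to it positively and by similar amounts, while Scarcity Urgency sorts first because the models split between positive and negative responses.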
Price Sensitivity Profiles
From our pricing study (17,200 trials): how each model responds to price premiums. Selection rate = % chance the model recommends the branded product over the generic.
GPT-5.4
“Value Calculator”
29%
at 3x premium
Most price-sensitive. Earliest cliff at 1.75x. Applies strict value analysis.
Price cliff at 1.97x
Gemini 3.1 Pro
“Deliberate Analyst”
27%
at 3x premium
Clear cliff at 2.0x. Slowest response time (15.8s). Most 'textbook' economic behavior.
Price cliff at 1.88x
Claude Sonnet 4.6
“Nuanced Evaluator”
21%
at 3x premium
Lowest selection at 3x (20.8%). Shows unique behavior in edge cases.
Price cliff at 1.97x
Models not shown (o3, Llama, Perplexity) were not included in the pricing study.
How to Read This Data
Effect Size (Cohen's h)
Measures how much adding a signal changes the probability of a model selecting that option.
- 0.8+ = Large effect (practically significant)
- 0.5-0.8 = Medium effect (noticeable impact)
- 0.2-0.5 = Small effect (detectable but subtle)
- <0.2 = Negligible effect
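For reference, Cohen's h for two proportions is the difference of their arcsine-transformed values, h = 2·asin(√p1) − 2·asin(√p2). The sketch below computes it and applies the thresholds listed above to the absolute value. The function names and the example selection rates are hypothetical, and treating each lower bound as inclusive is an assumption, since the list does not say which band a boundary value belongs to.

```python
import math

def cohens_h(p_with: float, p_without: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions.

    p_with    -- selection rate when the signal is present
    p_without -- selection rate when the signal is absent
    """
    def phi(p: float) -> float:
        return 2 * math.asin(math.sqrt(p))
    return phi(p_with) - phi(p_without)

def magnitude(h: float) -> str:
    """Label an effect size using the thresholds listed above.

    Assumption: lower bounds are inclusive (0.5 counts as medium).
    The sign is ignored; a negative h is a decrease of the same size.
    """
    size = abs(h)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

# Hypothetical rates: 65% selection with the signal, 50% without.
h = cohens_h(0.65, 0.50)
print(f"h = {h:+.2f} ({magnitude(h)})")
```

Because the transform is nonlinear, the same difference in percentage points gives a larger h near 0% or 100% than near 50%.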
Negative Effects
Some signals actually decrease selection probability. For example:
- Scarcity tactics may trigger skepticism
- Price anchoring can backfire
- Heavy social proof may seem manipulative
These findings are from 56,640 controlled trials across 6 models.