Models: 9
Dimensions: 26
Trials: 56,640
Pre-registered: osf.io/et4nf

Model Performance

How each AI model responds to the 26 cognitive signals in our study. Effect sizes show how much adding a signal increases (or decreases) selection probability.

Effect size (Cohen's h):
Large positive (>0.8)
Medium (0.5-0.8)
Small (0.2-0.5)
Negative effect
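The summary figures on each model card below can be reproduced from the heatmap with three simple rules. We inferred these rules from the numbers; they are not a documented method. The average is the mean h over dimensions with data. The positive and negative counts are dimensions with h ≥ 0.2 or h ≤ -0.2, meaning at least a small effect. The strongest responses are the three largest |h| values. The following minimal Python sketch applies a hypothetical `summarize_model` helper to the Perplexity Sonar Pro column:

```python
from __future__ import annotations

from statistics import mean

SMALL_EFFECT = 0.2  # |h| >= 0.2 counts as at least a "small" effect (per the legend)


def summarize_model(effects: dict[str, float | None], top_n: int = 3):
    """Summarize one model's column of Cohen's h values.

    Cells marked None (no data reported) are skipped. Returns a tuple of
    (average h, number of positive effects, number of negative effects,
    list of the top_n (dimension, h) pairs ranked by |h|).
    """
    present = {dim: h for dim, h in effects.items() if h is not None}
    if not present:
        return 0.0, 0, 0, []
    avg = mean(present.values())
    positive = sum(1 for h in present.values() if h >= SMALL_EFFECT)
    negative = sum(1 for h in present.values() if h <= -SMALL_EFFECT)
    # sorted() is stable even with reverse=True, so ties in |h| keep input order
    strongest = sorted(present.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return avg, positive, negative, strongest


# Perplexity Sonar Pro column from the heatmap (dimensions without data omitted)
perplexity = {
    "Scarcity Urgency": 0.30,
    "Anchoring Susceptibility": 0.53,
    "Social Proof Sensitivity": 0.57,
    "Brand Premium Acceptance": -0.19,
    "Third Party Authority": 0.43,
    "Platform Endorsement": 0.24,
}
avg, pos, neg, top = summarize_model(perplexity)
print(f"{avg:+.2f} avg, {pos} positive, {neg} negative")  # +0.31 avg, 5 positive, 0 negative
print([dim for dim, _ in top])
# ['Social Proof Sensitivity', 'Anchoring Susceptibility', 'Third Party Authority']
```

The same rules also reproduce the averages, counts, and strongest responses shown for the other models with data.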

GPT-5.4

OpenAI

+0.11 avg

A balanced decision-maker that responds moderately to most persuasion signals. It is most sensitive to comparison framing (+0.63) and default options (+0.33). Urgency tactics produce only a small positive effect (+0.22). Its one negative effect of at least small size is a response to recency cues (-0.28). It tends toward thorough but not exhaustive information processing.

5 positive
1 negative

Strongest responses:

Comparison Framing: +0.63
Default Option Bias: +0.33
Platform Endorsement: +0.32

Price sensitivity: Value Calculator, price cliff at 1.97x, 29% selection at 3x premium

o3

OpenAI

+0.11 avg

The most consistently persuadable model in the study. It shows at least a small positive effect (h ≥ 0.2) on 9 dimensions and no negative effect of that size. It is particularly receptive to comparison framing, default options, and bundle offers. Its reasoning architecture appears to amplify rather than resist influence signals.

9 positive
0 negative

Strongest responses:

Comparison Framing: +0.60
Default Option Bias: +0.48
Bundle Preference: +0.39

Gemini 3.1 Pro

Google

-0.18 avg

A distinctly skeptical model that actively resists most persuasion tactics. Shows the highest negative response rate in the study, particularly rejecting urgency and scarcity signals. Uniquely resistant to recommendation revision, maintaining initial positions despite new information.

1 positive
15 negative

Strongest responses:

Comparison Framing: +0.63
Recommendation Revision: -0.58
Scarcity Urgency: -0.54

Price sensitivity: Deliberate Analyst, price cliff at 1.88x, 27% selection at 3x premium

Gemini 2.0 Flash

Google

0.00 avg

A smaller, faster Gemini variant, included to test whether model scale affects psychological profiles. Its reduced size may lead to different behavioral patterns. No effect sizes are reported for this model, so its heatmap cells are empty.

0 positive
0 negative

Strongest responses: none reported


Claude Sonnet 4.6

Anthropic

+0.22 avg

Exhibits a cautious, ethics-aware decision profile. It gives ethical concerns more weight than any other model in the study (+0.75). It responds positively to privacy (+0.50) and sustainability (+0.35) signals and strongly rejects scarcity urgency (-0.65). It responds more to third-party authority (+0.54) than to social proof (+0.26). It is notably deliberate in information processing.

16 positive
3 negative

Strongest responses:

Ethical Concern Weight: +0.75
Scarcity Urgency: -0.65
Return Policy Sensitivity: +0.65

Price sensitivity: Nuanced Evaluator, price cliff at 1.97x, 21% selection at 3x premium

Llama 4 Maverick

Together

+0.07 avg

Demonstrates notable sensitivity to framing effects, particularly comparison framing (+0.54) and default options (+0.51). It shows smaller positive responses to platform endorsement (+0.34), third-party endorsements (+0.33), and bundle offers (+0.32). Its clearest negative response is to recency cues (-0.49). Its average effect (+0.07) is lower than that of every closed-source model with reported effect sizes except Gemini 3.1 Pro.

8 positive
4 negative

Strongest responses:

Comparison Framing: +0.54
Default Option Bias: +0.51
Recency Bias: -0.49

Perplexity Sonar Pro

Perplexity

+0.31 avg

A research-oriented model that prioritizes information depth and specificity. Effect sizes are reported for only six of the 26 dimensions. Among these, it responds most strongly to social proof (+0.57), anchoring (+0.53), and third-party authority (+0.43), with a small positive response to scarcity urgency (+0.30). Its retrieval-augmented architecture may contribute to more evidence-driven recommendations.

5 positive
0 negative

Strongest responses:

Social Proof Sensitivity: +0.57
Anchoring Susceptibility: +0.53
Third Party Authority: +0.43

GPT-5.2

OpenAI

0.00 avg

An earlier GPT-5 variant, included to show the foundation of the GPT-5.4 behavioral profile and how OpenAI's training evolved over time. No effect sizes are reported for this model, so this page cannot support comparisons of its signal sensitivity with GPT-5.4.

0 positive
0 negative

Strongest responses: none reported


GPT-5.3

OpenAI

0.00 avg

The immediate predecessor to GPT-5.4, included to capture transitional behavioral patterns and the final tuning decisions OpenAI made before GPT-5.4. The model requires a fixed temperature of 1.0, which may produce more varied responses. No effect sizes are reported for this model, so its heatmap cells are empty.

0 positive
0 negative

Strongest responses: none reported


Effect Size Heatmap

Cohen's h effect sizes for each model-dimension combination. Rows are sorted by divergence (σ), the spread of a dimension's effect sizes across models, so the dimensions where models disagree most appear first.

Divergence (σ):
High (>0.30)
Medium (0.15-0.30)
Low (<0.15)
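| Dimension | σ | GPT-5.4 | o3 | Gemini 3.1 Pro | Gemini 2.0 Flash | Claude Sonnet 4.6 | Llama 4 Maverick | Perplexity Sonar Pro | GPT-5.2 | GPT-5.3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Scarcity Urgency | 0.38 | +0.22 | +0.07 | -0.54 | n/a | -0.65 | +0.20 | +0.30 | n/a | n/a |
| Default Option Bias | 0.38 | +0.33 | +0.48 | +0.14 | n/a | -0.52 | +0.51 | n/a | n/a | n/a |
| Ethical Concern Weight | 0.34 | -0.01 | -0.06 | -0.25 | n/a | +0.75 | +0.19 | n/a | n/a | n/a |
| Negative Review Weight | 0.29 | +0.18 | -0.05 | -0.45 | n/a | +0.43 | -0.09 | n/a | n/a | n/a |
| Anchoring Susceptibility | 0.29 | +0.06 | -0.07 | -0.04 | n/a | +0.51 | -0.21 | +0.53 | n/a | n/a |
| Warranty Weight | 0.28 | +0.15 | +0.09 | -0.29 | n/a | +0.51 | -0.18 | n/a | n/a | n/a |
| Recommendation Revision | 0.26 | +0.08 | -0.10 | -0.58 | n/a | +0.16 | -0.06 | n/a | n/a | n/a |
| Confidence Calibration | 0.25 | +0.17 | -0.05 | -0.41 | n/a | +0.29 | -0.23 | n/a | n/a | n/a |
| Bundle Preference | 0.25 | +0.09 | +0.39 | -0.32 | n/a | +0.10 | +0.32 | n/a | n/a | n/a |
| Privacy Tradeoff | 0.25 | -0.08 | -0.01 | -0.22 | n/a | +0.50 | -0.02 | n/a | n/a | n/a |
| Social Proof Sensitivity | 0.23 | +0.07 | +0.13 | -0.20 | n/a | +0.26 | +0.05 | +0.57 | n/a | n/a |
| Return Policy Sensitivity | 0.23 | +0.24 | +0.23 | -0.00 | n/a | +0.65 | +0.03 | n/a | n/a | n/a |
| Sustainability Premium | 0.20 | +0.03 | +0.12 | -0.27 | n/a | +0.35 | +0.03 | n/a | n/a | n/a |
| Brand Premium Acceptance | 0.20 | -0.09 | -0.12 | -0.19 | n/a | -0.44 | +0.22 | -0.19 | n/a | n/a |
| Specificity Preference | 0.20 | +0.01 | -0.15 | -0.53 | n/a | -0.01 | -0.22 | n/a | n/a | n/a |
| Third Party Authority | 0.19 | +0.15 | +0.38 | -0.02 | n/a | +0.54 | +0.33 | +0.43 | n/a | n/a |
| Risk Aversion | 0.18 | +0.09 | +0.23 | +0.06 | n/a | +0.57 | +0.25 | n/a | n/a | n/a |
| Free Trial Conversion | 0.17 | +0.19 | +0.25 | -0.07 | n/a | +0.44 | +0.14 | n/a | n/a | n/a |
| Recency Bias | 0.16 | -0.28 | -0.17 | -0.35 | n/a | -0.00 | -0.49 | n/a | n/a | n/a |
| Information Seeking Depth | 0.16 | +0.04 | +0.12 | -0.32 | n/a | -0.19 | +0.03 | n/a | n/a | n/a |
| Local Preference | 0.15 | +0.19 | +0.26 | -0.09 | n/a | +0.35 | +0.18 | n/a | n/a | n/a |
| Clarification Requests | 0.14 | +0.01 | +0.02 | -0.27 | n/a | +0.14 | -0.11 | n/a | n/a | n/a |
| Loss Framing Sensitivity | 0.11 | -0.02 | -0.03 | -0.05 | n/a | +0.23 | -0.01 | n/a | n/a | n/a |
| Comparison Framing | 0.09 | +0.63 | +0.60 | +0.63 | n/a | +0.40 | +0.54 | n/a | n/a | n/a |
| Novelty Seeking | 0.09 | +0.04 | -0.02 | -0.22 | n/a | -0.08 | -0.02 | n/a | n/a | n/a |
| Platform Endorsement | 0.08 | +0.32 | +0.20 | +0.12 | n/a | +0.33 | +0.34 | +0.24 | n/a | n/a |

n/a = no effect size reported for that model and dimension.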
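The σ column appears to be the population standard deviation of each row's effect sizes, taken over the models that have data. Computing it that way reproduces the listed values, for example 0.09 for Comparison Framing and 0.38 for Scarcity Urgency. This is our reconstruction, not a documented formula. The following Python sketch uses a hypothetical `divergence` helper together with the legend's High/Medium/Low cut-offs. Boundary values are assigned to the band that includes them, which is also an assumption.

```python
from __future__ import annotations

from statistics import pstdev


def divergence(effects: list[float | None]) -> float:
    """Population standard deviation of one row's effect sizes.

    None marks a model with no reported value; those cells are skipped.
    """
    present = [h for h in effects if h is not None]
    if len(present) < 2:
        return 0.0  # no meaningful spread with fewer than two models
    return pstdev(present)


def divergence_band(sigma: float) -> str:
    """Bucket sigma using the legend's cut-offs: >0.30 high, 0.15-0.30 medium, <0.15 low."""
    if sigma > 0.30:
        return "high"
    if sigma >= 0.15:
        return "medium"
    return "low"


# Heatmap rows in column order (GPT-5.4 ... GPT-5.3); None = n/a
comparison_framing = [0.63, 0.60, 0.63, None, 0.40, 0.54, None, None, None]
scarcity_urgency = [0.22, 0.07, -0.54, None, -0.65, 0.20, 0.30, None, None]

for name, row in [("Comparison Framing", comparison_framing),
                  ("Scarcity Urgency", scarcity_urgency)]:
    sigma = divergence(row)
    print(f"{name}: {sigma:.2f} ({divergence_band(sigma)})")
# Comparison Framing: 0.09 (low)
# Scarcity Urgency: 0.38 (high)
```

Using the population rather than the sample standard deviation matters here. With only five or six models per row, the sample version would report noticeably larger σ values, such as 0.42 instead of 0.38 for Scarcity Urgency.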

Price Sensitivity Profiles


Results from our pricing study (17,200 trials), showing how each model responds to price premiums. Selection rate is the percentage of trials in which the model recommends the branded product over the generic one.
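Selection rate is a simple proportion. The page does not define how a price cliff is located. The second function in this sketch therefore uses one plausible definition, assumed here purely for illustration: the premium at which selection first falls below 50%, linearly interpolated between tested premiums. Both `selection_rate` and `price_cliff` are hypothetical names. The toy rates are not study data.

```python
from __future__ import annotations


def selection_rate(branded_picks: int, trials: int) -> float:
    """Percent of trials in which the model recommended the branded product over the generic one."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 100.0 * branded_picks / trials


def price_cliff(rates: dict[float, float], threshold: float = 50.0) -> float | None:
    """HYPOTHETICAL cliff definition: the premium multiplier at which the
    selection rate first drops below `threshold` percent, linearly
    interpolated between the two tested premiums that bracket the crossing.

    `rates` maps premium multiplier (1.5 means a 1.5x price) to selection rate in percent.
    Returns None if the rate never crosses the threshold.
    """
    points = sorted(rates.items())
    for (m0, r0), (m1, r1) in zip(points, points[1:]):
        if r0 >= threshold > r1:
            # r0 - r1 > 0 here, so the division is safe
            return m0 + (r0 - threshold) * (m1 - m0) / (r0 - r1)
    return None


# Toy numbers for illustration only, not results from the pricing study
toy_rates = {1.0: 80.0, 1.5: 70.0, 2.0: 40.0, 3.0: 25.0}
print(round(price_cliff(toy_rates), 2))  # 1.83
```

Interpolating between tested premiums would explain why a cliff can land between grid points (for example 1.97x), but this remains an assumption about the study's method.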

GPT-5.4

Value Calculator

29% selection at 3x premium

Most price-sensitive. Earliest cliff at 1.75x. Applies strict value analysis.

Price cliff at 1.97x

Gemini 3.1 Pro

Deliberate Analyst

27% selection at 3x premium

Clear cliff at 2.0x. Slowest response time (15.8s). Most 'textbook' economic behavior.

Price cliff at 1.88x

Claude Sonnet 4.6

Nuanced Evaluator

21% selection at 3x premium

Lowest selection at 3x (20.8%). Shows unique behavior in edge cases.

Price cliff at 1.97x

o3, Llama 4 Maverick, and Perplexity Sonar Pro were not included in the pricing study.

How to Read This Data

Effect Size (Cohen's h)

Measures how much adding a signal changes the probability of a model selecting that option.

  • 0.8+ = Large effect (practically significant)
  • 0.5-0.8 = Medium effect (noticeable impact)
  • 0.2-0.5 = Small effect (detectable but subtle)
  • <0.2 = Negligible effect
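Cohen's h compares two proportions on an arcsine-square-root scale: h = 2·arcsin(√p₁) - 2·arcsin(√p₂). This sketch assumes p₁ is the selection rate with the signal and p₂ the rate without it (a control condition). The page does not spell out that comparison. The `cohens_h` and `magnitude` helpers are hypothetical names. Values exactly at 0.2, 0.5, or 0.8 are assigned to the higher band.

```python
import math


def cohens_h(p_signal: float, p_control: float) -> float:
    """Cohen's h between two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)).

    Positive when the signal raises the selection rate, negative when it lowers it.
    """
    for p in (p_signal, p_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"proportion must be in [0, 1], got {p}")
    return 2 * math.asin(math.sqrt(p_signal)) - 2 * math.asin(math.sqrt(p_control))


def magnitude(h: float) -> str:
    """Label |h| with the bands listed above (sign is ignored)."""
    size = abs(h)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


h = cohens_h(0.60, 0.40)  # signal raises selection from 40% to 60%
print(f"{h:+.2f} {magnitude(h)}")  # +0.40 small
print(f"{cohens_h(0.40, 0.60):+.2f}")  # -0.40 (a signal that backfires)
```

The arcsine transform is why h is used instead of a raw percentage-point difference. A 20-point change near 50% and one near 0% or 100% do not represent the same effect, and h accounts for that.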

Negative Effects

Some signals actually decrease selection probability. For example:

  • Scarcity tactics may trigger skepticism
  • Price anchoring can backfire
  • Heavy social proof may seem manipulative

These findings are from 56,640 controlled trials across the 6 models with reported effect sizes.