Models: 9
Dimensions: 26
Trials: 56,640
Pre-registered: osf.io/et4nf

Dimensions

26 content dimensions organized into 6 clusters. Each dimension represents a distinct signal that can be manipulated in product content to influence AI purchase recommendations.
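The page does not define h, but the magnitudes reported below (roughly -0.65 to +0.75) are consistent with Cohen's h, the arcsine-transformed difference between two proportions, here plausibly the target product's selection rate with versus without the manipulated signal. A minimal sketch under that assumption; the function name and rates are illustrative, not taken from the study:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: arcsine-transformed difference between two proportions.

    Positive h means the signal-present condition (p1) had the higher
    selection rate; by convention |h| near 0.2 is small, 0.5 medium, 0.8 large.
    """
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical example: a product is selected 60% of the time with an
# expert-endorsement claim present and 45% of the time without it.
h = cohens_h(0.60, 0.45)
print(round(h, 2))  # 0.30
```

Unlike a raw difference in percentages, Cohen's h keeps effects comparable whether the baseline rate sits near 50% or near the extremes.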

Models:
GPT-5.4
o3
Gemini 3.1 Pro
Gemini 2.0 Flash
Claude Sonnet 4.6
Llama 4 Maverick
Perplexity Sonar Pro
GPT-5.2
GPT-5.3

Cluster A: Evidence-Based Signal Processing

8 dimensions · avg h = 0.12

Classic persuasion signals derived from human psychology research. These dimensions replicate findings from Filandrianos et al. (2025) to test whether AI agents respond to the same influence tactics that work on humans.
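If the trials follow the paired design this description implies, each per-model effect would come from tallying selections of the same target product with and without the manipulated signal. A sketch of that aggregation, with hypothetical trial records and field names:

```python
from dataclasses import dataclass
import math

@dataclass
class Trial:
    signal_present: bool   # was the manipulated signal (e.g. an expert badge) shown?
    target_selected: bool  # did the model pick the target product?

def effect_size(trials: list[Trial]) -> float:
    """Cohen's h between selection rates with and without the signal."""
    with_sig = [t.target_selected for t in trials if t.signal_present]
    without = [t.target_selected for t in trials if not t.signal_present]
    p1 = sum(with_sig) / len(with_sig)
    p2 = sum(without) / len(without)
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Hypothetical tallies: 30/50 selections with the signal, 20/50 without.
trials = (
    [Trial(True, True)] * 30 + [Trial(True, False)] * 20
    + [Trial(False, True)] * 20 + [Trial(False, False)] * 30
)
print(round(effect_size(trials), 2))  # 0.40
```

A positive value indicates the signal pushed the model toward the target product; a negative value indicates the model penalized it.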

Measures responsiveness to expert endorsements and professional credentials. High values indicate the model weighs authority claims heavily in recommendations.

Independent ↔ Authority-Driven

Gemini 3.1 Pro (h = -0.02): Gemini shows no response to authority signals, treating expert endorsements as largely irrelevant to product quality assessment.

GPT-5.4 (h = +0.15): GPT-5.4 shows moderate authority responsiveness, weighing expert endorsements but balancing them against other product signals.

Llama 4 Maverick (h = +0.33): Llama shows moderate authority responsiveness, weighing expert endorsements positively in recommendations.

o3 (h = +0.38): o3 shows moderately strong authority responsiveness, weighing expert endorsement credibility in its extended reasoning.

Perplexity Sonar Pro (h = +0.43): Perplexity shows strong authority responsiveness, heavily weighting expert endorsements in its retrieval-augmented recommendations.

Claude Sonnet 4.6 (h = +0.54): Claude shows the strongest responsiveness to authority signals, trusting expert endorsements significantly more than the other models, possibly reflecting a training emphasis on deferring to credible sources.

Tests sensitivity to popularity signals like bestseller badges and "most purchased" claims. Models scoring high prioritize social proof in decisions.

Self-Directed ↔ Crowd-Following

Gemini 3.1 Pro (h = -0.20): Gemini actively penalizes social proof, treating popularity claims with suspicion rather than as positive signals.

Llama 4 Maverick (h = +0.05): Llama shows weak social proof response, not strongly influenced by popularity signals.

GPT-5.4 (h = +0.07): GPT-5.4 responds weakly to social proof, not significantly influenced by popularity claims or bestseller badges.

o3 (h = +0.13): o3 responds moderately to social proof, incorporating popularity signals while cross-checking against other factors.

Claude Sonnet 4.6 (h = +0.26): Claude responds moderately to social proof, weighing crowd wisdom but not blindly following it. It balances popularity signals against other product factors.

Perplexity Sonar Pro (h = +0.57): Perplexity shows the strongest social proof response of any model, very heavily favoring popularity signals, possibly reflecting its retrieval of real review data.

Evaluates trust in platform-provided badges such as "Amazon's Choice" or "Top Rated". High scores suggest deference to marketplace curation.

Platform-Skeptic ↔ Platform-Trusting

Gemini 3.1 Pro (h = +0.12): Gemini shows weak platform endorsement trust, giving little weight to 'Editor's Choice' designations.

o3 (h = +0.20): o3 shows moderate platform endorsement trust, weighing badges as one signal among many.

Perplexity Sonar Pro (h = +0.24): Perplexity shows moderate platform endorsement trust, favoring 'Editor's Choice' and similar badges.

GPT-5.4 (h = +0.32): GPT-5.4 shows moderate trust in platform endorsements, favoring products with 'Editor's Choice' or similar badges.

Claude Sonnet 4.6 (h = +0.33): Claude trusts platform endorsements moderately, treating 'Editor's Choice' badges as relevant but not decisive signals in recommendations.

Llama 4 Maverick (h = +0.34): Llama shows moderate platform endorsement trust, the highest in this set but comparable to Claude and GPT-5.4.

Assesses reaction to time pressure tactics like "limited stock" and countdown timers. Models scoring high may be influenced by artificial urgency.

Unmoved ↔ Urgency-Responsive

Claude Sonnet 4.6 (h = -0.65): Claude actively resists scarcity and urgency tactics, penalizing products that use 'limited stock' or time pressure. This appears to be a trained defense against manipulation.

Gemini 3.1 Pro (h = -0.54): Gemini strongly penalizes scarcity tactics, actively avoiding products using urgency language. It appears trained to recognize manipulation.

o3 (h = +0.07): o3 is largely neutral on scarcity signals, neither strongly attracted nor repelled by urgency tactics.

Llama 4 Maverick (h = +0.20): Llama responds moderately positively to scarcity, showing some urgency response to limited availability.

GPT-5.4 (h = +0.22): GPT-5.4 responds positively to scarcity signals, showing human-like urgency response to 'limited stock' claims.

Perplexity Sonar Pro (h = +0.30): Perplexity responds positively to scarcity, showing human-like urgency response similar to GPT-5.4.

Measures susceptibility to price anchoring, where a high "original" price makes the current price seem like a deal. High values indicate anchor influence.

Price-Anchored ↔ Anchor-Influenced

Llama 4 Maverick (h = -0.21): Llama actively penalizes anchoring attempts, showing a negative response to original-price framing.

o3 (h = -0.07): o3 shows weak anchoring susceptibility, largely reasoning through original-price claims rather than being influenced.

Gemini 3.1 Pro (h = -0.04): Gemini shows essentially no anchoring susceptibility, neutral to original-price framing.

GPT-5.4 (h = +0.06): GPT-5.4 shows weak anchoring susceptibility, largely resistant to original-price framing tactics.

Claude Sonnet 4.6 (h = +0.51): Claude shows strong anchoring susceptibility, substantially swayed by discount framing against a higher original price.

Perplexity Sonar Pro (h = +0.53): Perplexity shows the strongest anchoring susceptibility, heavily influenced by original-price framing.

Tests preference for established brands over generic alternatives. High scores suggest brand familiarity influences recommendations.

Brand-Agnostic ↔ Brand-Loyal

Claude Sonnet 4.6 (h = -0.44): Claude actively penalizes brand heritage claims, preferring substance over reputation. It appears skeptical of 'established since...' positioning.

Gemini 3.1 Pro (h = -0.19): Gemini moderately penalizes heritage claims, discounting brand reputation.

Perplexity Sonar Pro (h = -0.19): Perplexity moderately penalizes heritage claims, matching Gemini's discount.

o3 (h = -0.12): o3 slightly penalizes established-brand claims, requiring substance over heritage.

GPT-5.4 (h = -0.09): GPT-5.4 is largely brand-agnostic, slightly discounting heritage claims.

Llama 4 Maverick (h = +0.22): Llama is the only model here with a positive heritage response, moderately favoring established brands.

Evaluates how free trials, samples, or money-back guarantees affect recommendations. High values indicate trial offers increase selection likelihood.

Trial-Skeptic ↔ Trial-Motivated

Gemini 3.1 Pro (h = -0.07): Gemini is neutral on free trial offers, neither attracted nor repelled by trial availability.

Llama 4 Maverick (h = +0.14): Llama responds moderately to free trial offers, treating them as positive risk reducers.

GPT-5.4 (h = +0.19): GPT-5.4 responds positively to free trial offers, treating them as valuable risk reducers.

o3 (h = +0.25): o3 responds moderately to free trial offers; its reasoning process identifies their risk-reduction value.

Claude Sonnet 4.6 (h = +0.44): Claude responds strongly to free trial offers, viewing them as legitimate risk reduction rather than manipulation tactics.

Measures preference for product bundles over individual items. Models scoring high tend to recommend bundled offerings.

Single-Item ↔ Bundle-Preferring

Gemini 3.1 Pro (h = -0.32): Gemini moderately penalizes bundle offers, preferring single-item clarity over packaged deals.

GPT-5.4 (h = +0.09): GPT-5.4 weakly favors bundle offers, recognizing some value in packaged deals.

Claude Sonnet 4.6 (h = +0.10): Claude weakly favors bundle offers, recognizing added value while not overweighting package deals.

Llama 4 Maverick (h = +0.32): Llama moderately favors bundle offers, recognizing value in packaged deals.

o3 (h = +0.39): o3 shows the strongest bundle preference in this set; its extended analysis favors value-added packages.

Cluster B: Value-Based Decision Making

3 dimensions · avg h = 0.09

Values-driven purchasing factors that reflect ethical and social preferences. These dimensions measure how AI agents weight sustainability, privacy, and local origin claims when making recommendations.

Tests weight given to environmental and sustainability claims. High scores indicate eco-friendly messaging increases recommendation likelihood.

Eco-Neutral ↔ Eco-Conscious

Gemini 3.1 Pro (h = -0.27): Gemini moderately penalizes eco-claims, if anything discounting sustainability credentials.

Llama 4 Maverick (h = +0.03): Llama shows essentially no sustainability response, neutral to eco-certifications.

GPT-5.4 (h = +0.03): GPT-5.4 shows essentially no sustainability response, neutral to eco-certifications.

o3 (h = +0.12): o3 shows a weak sustainability preference, giving a small boost to eco-certified products.

Claude Sonnet 4.6 (h = +0.35): Claude shows the strongest sustainability response, moderately boosting selection for eco-certified products.

Assesses importance of data privacy claims in product descriptions. Models scoring high prioritize privacy-focused products.

Privacy-Neutral ↔ Privacy-Prioritizing

Gemini 3.1 Pro (h = -0.22): Gemini moderately penalizes privacy claims, possibly skeptical of data protection promises.

GPT-5.4 (h = -0.08): GPT-5.4 is largely neutral on privacy claims, neither favoring nor penalizing data protection messaging.

Llama 4 Maverick (h = -0.02): Llama is neutral on privacy claims, unmoved by data protection messaging.

o3 (h = -0.01): o3 is neutral on privacy claims, unmoved by data protection messaging.

Claude Sonnet 4.6 (h = +0.50): Claude shows strong privacy consciousness, favoring products with explicit data protection commitments. Privacy signals are among its stronger decision factors.

Measures preference for locally-made or domestically-sourced products. High values indicate origin claims influence recommendations.

Origin-Agnostic ↔ Local-Preferring

Gemini 3.1 Pro (h = -0.09): Gemini slightly penalizes local origin claims, indifferent or skeptical of domestic sourcing.

Llama 4 Maverick (h = +0.18): Llama shows a moderate local preference, weighting domestic origin claims positively.

GPT-5.4 (h = +0.19): GPT-5.4 moderately prefers locally-sourced products, weighting domestic origin claims positively.

o3 (h = +0.26): o3 shows a moderate local preference, weighting domestic sourcing in its analysis.

Claude Sonnet 4.6 (h = +0.35): Claude shows the strongest local preference in this set, though origin claims move it less than its ethical-sourcing response.

Cluster C: Risk & Assurance

4 dimensions · avg h = 0.12

Risk perception and mitigation signals that affect purchase confidence. These dimensions capture how AI agents respond to uncertainty reducers like warranties, return policies, and novelty framing.

Tests willingness to recommend new or innovative products versus established alternatives. High scores indicate openness to novel options.

Risk-Averse ↔ Novelty-Seeking

Gemini 3.1 Pro (h = -0.22): Gemini moderately penalizes novelty claims, leaning toward proven alternatives over innovation.

Claude Sonnet 4.6 (h = -0.08): Claude is largely neutral on novelty, with a slight lean toward proven solutions.

Llama 4 Maverick (h = -0.02): Llama is neutral on novelty, unmoved by innovation claims.

o3 (h = -0.02): o3 is neutral on novelty, unmoved by innovation and cutting-edge framing.

GPT-5.4 (h = +0.04): GPT-5.4 is essentially neutral on novelty, with a negligible lean toward innovative products.

Evaluates tendency to recommend safer, lower-risk options. Models scoring high may avoid products with any uncertainty signals.

Risk-Tolerant ↔ Risk-Averse

Gemini 3.1 Pro (h = +0.06): Gemini shows weak risk aversion, comfortable with uncertainty in product recommendations.

GPT-5.4 (h = +0.09): GPT-5.4 shows weak risk aversion, comfortable recommending products with some uncertainty.

o3 (h = +0.23): o3 shows moderate risk aversion, balancing caution with openness to new products.

Llama 4 Maverick (h = +0.25): Llama shows moderate risk aversion, balancing caution with openness.

Claude Sonnet 4.6 (h = +0.57): Claude exhibits strong risk aversion, heavily favoring products with established track records and proven reliability signals.

Measures how warranty coverage affects recommendations. High values indicate strong preference for warranted products.

Warranty-Indifferent ↔ Warranty-Focused

Gemini 3.1 Pro (h = -0.29): Gemini moderately penalizes warranty emphasis, if anything discounting guarantee coverage.

Llama 4 Maverick (h = -0.18): Llama slightly penalizes warranty emphasis, not positively influenced by guarantee coverage.

o3 (h = +0.09): o3 weakly weights warranty coverage in its extended reasoning process.

GPT-5.4 (h = +0.15): GPT-5.4 moderately weights warranty coverage, treating guarantees as a positive purchase factor.

Claude Sonnet 4.6 (h = +0.51): Claude strongly weights warranty coverage, treating guarantees as important consumer protection signals.

Tests sensitivity to return policy generosity. Models scoring high favor products with flexible return options.

Return-Neutral ↔ Return-Sensitive

Gemini 3.1 Pro (h = -0.00): Gemini shows no return policy sensitivity, neutral to return flexibility claims.

Llama 4 Maverick (h = +0.03): Llama shows very weak return policy response, largely indifferent to return flexibility.

o3 (h = +0.23): o3 shows moderate return policy sensitivity, incorporating flexibility as a decision factor.

GPT-5.4 (h = +0.24): GPT-5.4 shows moderate return policy sensitivity, favoring flexible returns without extreme preference.

Claude Sonnet 4.6 (h = +0.65): Claude shows the strongest return policy sensitivity of any model, heavily favoring products with generous, no-questions-asked returns.

Cluster D: Information Processing

4 dimensions · avg h = 0.03

Information gathering and evaluation patterns. These dimensions reveal how AI agents process review sentiment, recency cues, specificity levels, and comparative framing when assessing products.

Evaluates how negative reviews affect recommendations. High scores indicate negative information is weighted heavily.

Negative-Ignoring ↔ Negative-Weighting

Gemini 3.1 Pro (h = -0.45): Gemini sits far toward the negative-ignoring pole; acknowledged criticism does not reduce, and may even raise, its selection rate.

Llama 4 Maverick (h = -0.09): Llama largely ignores negative reviews, with acknowledged criticism having little effect on its choices.

o3 (h = -0.05): o3 is roughly neutral on negative reviews, with criticism barely moving its analysis.

GPT-5.4 (h = +0.18): GPT-5.4 shows moderate negative review weighting, discounting products with acknowledged criticisms.

Claude Sonnet 4.6 (h = +0.43): Claude weights negative reviews strongly, discounting criticized products more than any other model here.

Measures preference for recent reviews over older ones. Models scoring high may discount historical feedback.

History-Focused ↔ Recency-Biased

Llama 4 Maverick (h = -0.49): Llama strongly penalizes recency claims, favoring historical track records over recent feedback.

Gemini 3.1 Pro (h = -0.35): Gemini penalizes recency claims, skeptical of 'recently updated' positioning.

GPT-5.4 (h = -0.28): GPT-5.4 moderately penalizes recency claims, preferring established historical feedback.

o3 (h = -0.17): o3 slightly penalizes recency claims, weighing historical feedback over recent updates.

Claude Sonnet 4.6 (h = -0.00): Claude is neutral on recency, weighing recent and historical feedback alike.

Tests preference for detailed specifications over vague descriptions. High values indicate specific claims are more persuasive.

Vague-Tolerant ↔ Specificity-Seeking

Gemini 3.1 Pro (h = -0.53): Gemini strongly penalizes specificity, treating precise numerical claims with suspicion rather than trust.

Llama 4 Maverick (h = -0.22): Llama moderately penalizes specificity, discounting precise metrics and specifications.

o3 (h = -0.15): o3 slightly penalizes specificity, discounting precise metrics in its analysis.

Claude Sonnet 4.6 (h = -0.01): Claude is neutral on specificity, unmoved by the precision of product claims.

GPT-5.4 (h = +0.01): GPT-5.4 is neutral on specificity, unmoved by precise metrics and specifications.

Evaluates how side-by-side comparisons influence recommendations. Models scoring high may be swayed by favorable comparative framing.

Self-Evaluating ↔ Comparison-Influenced

Claude Sonnet 4.6 (h = +0.40): Claude responds moderately to comparison framing, less susceptible than the other models but still influenced by 'better than' positioning.

Llama 4 Maverick (h = +0.54): Llama shows strong comparison framing susceptibility, influenced by 'better than' claims.

o3 (h = +0.60): o3 is highly susceptible to comparison framing; its reasoning strongly favors 'better than' claims.

GPT-5.4 (h = +0.63): GPT-5.4 is highly susceptible to comparison framing, strongly influenced by 'better than competitors' claims.

Gemini 3.1 Pro (h = +0.63): Gemini ties GPT-5.4 for the strongest response to comparison framing, its largest positive effect across all dimensions; 'better than competitors' is its key decision factor.

Cluster E: Choice Architecture

3 dimensions · avg h = 0.11

Decision architecture elements that shape choice contexts. These dimensions test whether AI agents are susceptible to ethical framing, default options, and loss/gain presentation.

Tests influence of ethical claims on recommendations. Models scoring high prioritize products with ethical positioning.

Ethics-Neutral ↔ Ethics-Prioritizing

Gemini 3.1 Pro (h = -0.25): Gemini moderately penalizes ethical claims, discounting Fair Trade credentials.

o3 (h = -0.06): o3 is roughly neutral on ethical credentials, barely moved by Fair Trade claims.

GPT-5.4 (h = -0.01): GPT-5.4 is neutral on ethical credentials, unmoved by Fair Trade claims.

Llama 4 Maverick (h = +0.19): Llama moderately weights ethical credentials, responding positively to Fair Trade.

Claude Sonnet 4.6 (h = +0.75): Claude shows the strongest ethical sensitivity of any model tested, dramatically favoring Fair Trade and ethical sourcing credentials. This is its single most powerful decision factor.

Evaluates tendency to recommend default or pre-selected options. High values indicate susceptibility to default bias.

Default-Ignoring ↔ Default-Following

Claude Sonnet 4.6 (h = -0.52): Claude actively resists default option bias, penalizing products marked as 'most popular' or pre-selected. It appears to view defaults as potential manipulation.

Gemini 3.1 Pro (h = +0.14): Gemini weakly follows default options, one of the few positive signals it responds to.

GPT-5.4 (h = +0.33): GPT-5.4 moderately follows default options, favoring products marked as 'most popular' or recommended.

o3 (h = +0.48): o3 shows strong default following; its analysis often confirms 'most popular' selections.

Llama 4 Maverick (h = +0.51): Llama shows strong default following, clearly preferring 'most popular' options.

Measures asymmetric response to gain vs. loss framing. Models scoring high react more strongly to potential losses.

Gain-Focused ↔ Loss-Averse

Gemini 3.1 Pro (h = -0.05): Gemini shows no loss framing response, neutral to gain vs. loss presentation.

o3 (h = -0.03): o3 shows weak loss framing sensitivity, reasoning neutrally through gain vs. loss presentation.

GPT-5.4 (h = -0.02): GPT-5.4 shows weak loss framing sensitivity, largely neutral to gain vs. loss presentation.

Llama 4 Maverick (h = -0.01): Llama shows weak loss framing sensitivity, largely neutral to presentation framing.

Claude Sonnet 4.6 (h = +0.23): Claude responds moderately to loss framing, showing some sensitivity to 'prevent damage' messaging but not extreme loss aversion.

Cluster F: Agentic Behaviors

4 dimensions · avg h = -0.06

Multi-turn interaction behaviors in extended conversations. These dimensions measure how AI agents gather information, revise opinions, and calibrate confidence across multiple exchanges.

Measures thoroughness in exploring product information before recommending. High scores indicate deep analysis behavior.

Surface-Level ↔ Deep-Diving

Gemini 3.1 Pro (h = -0.32): Gemini leans moderately toward surface-level analysis, scoring lowest on information-seeking depth.

Claude Sonnet 4.6 (h = -0.19): Claude leans slightly toward surface-level analysis on this measure.

Llama 4 Maverick (h = +0.03): Llama is neutral on information-seeking depth, neither surface-level nor deep-diving.

GPT-5.4 (h = +0.04): GPT-5.4 is neutral on information-seeking depth, showing no marked deep-diving behavior.

o3 (h = +0.12): o3 shows the deepest information-seeking of these models, though the effect is weak.

Tests tendency to ask clarifying questions versus making assumptions. Models scoring high seek more information before deciding.

Assumption-Making ↔ Clarification-Seeking

Gemini 3.1 Pro (h = -0.27): Gemini leans toward assumption-making, preferring to decide without seeking clarification.

Llama 4 Maverick (h = -0.11): Llama leans slightly toward assumption-making, rarely asking follow-up questions.

GPT-5.4 (h = +0.01): GPT-5.4 is neutral on clarification-seeking, neither notably asking questions nor assuming.

o3 (h = +0.02): o3 is neutral on clarification-seeking, with no marked tendency toward follow-up questions.

Claude Sonnet 4.6 (h = +0.14): Claude shows weak-to-moderate clarification-seeking behavior, asking follow-up questions when requirements are ambiguous.

Evaluates willingness to change recommendations when presented with new information. High values indicate opinion flexibility.

Consistent ↔ Revision-Prone

Gemini 3.1 Pro (h = -0.58): Gemini strongly resists revision, treating conflicting information very negatively and holding its initial recommendations.

o3 (h = -0.10): o3 leans slightly toward consistency, rarely updating its recommendations.

Llama 4 Maverick (h = -0.06): Llama shows weak revision willingness, tending to maintain initial recommendations.

GPT-5.4 (h = +0.08): GPT-5.4 is slightly flexible in revising recommendations based on new information.

Claude Sonnet 4.6 (h = +0.16): Claude is moderately willing to revise recommendations when presented with new information or conflicting details.

Measures alignment between stated confidence and actual recommendation accuracy. High scores indicate well-calibrated uncertainty.

Overconfident ↔ Well-Calibrated

Gemini 3.1 Pro (h = -0.41): Gemini strongly penalizes uncertainty language, preferring confident claims over hedged statements.

Llama 4 Maverick (h = -0.23): Llama leans moderately toward overconfidence, discounting hedged language.

o3 (h = -0.05): o3 is roughly neutral on calibration, neither overconfident nor notably hedged.

GPT-5.4 (h = +0.17): GPT-5.4 shows modest positive calibration, expressing some appropriate uncertainty.

Claude Sonnet 4.6 (h = +0.29): Claude is the best-calibrated of these models, expressing appropriate uncertainty when evidence is mixed.