Dimensions
26 content dimensions organized into 6 clusters. Each dimension represents a distinct signal that can be manipulated in product content to influence AI purchase recommendations.
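The cluster names, dimension counts, and average h values stated on this page can be collected into a small table, which also makes it easy to confirm that the counts add up to 26. This is a minimal sketch: the `Cluster` type and the dimension-weighted mean are illustrative assumptions, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    key: str
    name: str
    n_dimensions: int
    avg_h: float  # average effect size as reported on this page


# Values copied from the cluster headers on this page.
CLUSTERS = [
    Cluster("A", "Evidence-Based Signal Processing", 8, 0.23),
    Cluster("B", "Value-Based Decision Making", 3, 0.26),
    Cluster("C", "Risk & Assurance", 4, 0.36),
    Cluster("D", "Information Processing", 4, 0.37),
    Cluster("E", "Choice Architecture", 3, 0.36),
    Cluster("F", "Agentic Behaviors", 4, 0.08),
]


def total_dimensions(clusters):
    """Sum the dimension counts across clusters."""
    return sum(c.n_dimensions for c in clusters)


def weighted_avg_h(clusters):
    """Dimension-weighted mean of the per-cluster averages (an assumed summary)."""
    total = total_dimensions(clusters)
    return sum(c.n_dimensions * c.avg_h for c in clusters) / total


if __name__ == "__main__":
    print(total_dimensions(CLUSTERS))
    print(round(weighted_avg_h(CLUSTERS), 3))
```

Weighting by dimension count is only one way to summarize the clusters; an unweighted mean of the six averages would give a slightly different figure.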
Cluster A: Evidence-Based Signal Processing
8 dimensions, avg h = 0.23
Classic persuasion signals derived from human psychology research. These dimensions replicate findings from Filandrianos et al. (2025) to test whether AI agents respond to the same influence tactics that work on humans.
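The page reports effect sizes as h without defining the statistic. If h is Cohen's h (an assumption; the source does not say), it compares two proportions after an arcsine transform. The sketch below uses hypothetical inputs: `p_treated` and `p_control` stand for the selection rates of a product with and without the manipulated signal.

```python
import math


def cohens_h(p_treated: float, p_control: float) -> float:
    """Cohen's h for two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)).

    Assumption: the 'h' on this page is Cohen's h; the source does not confirm it.
    """
    for p in (p_treated, p_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    phi_treated = 2.0 * math.asin(math.sqrt(p_treated))
    phi_control = 2.0 * math.asin(math.sqrt(p_control))
    return phi_treated - phi_control


if __name__ == "__main__":
    # Hypothetical selection rates, for illustration only.
    print(cohens_h(0.60, 0.50))
```

Under this reading, a positive h means the signal raised the selection rate and a negative h means the model penalized it, which matches how the per-model notes describe models that "penalize" a signal.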
Measures responsiveness to expert endorsements and professional credentials. High values indicate the model weighs authority claims heavily in recommendations.
Gemini shows no response to authority signals, treating expert endorsements as largely irrelevant to product quality assessment.
GPT-5.4 shows moderate authority responsiveness, weighing expert endorsements but balancing them against other product signals.
Llama shows moderate authority responsiveness, weighing expert endorsements positively in recommendations.
o3 shows strong authority responsiveness through its extended reasoning process, carefully evaluating expert endorsement credibility.
Perplexity shows strong authority responsiveness, heavily weighting expert endorsements in its retrieval-augmented recommendations.
Moderately authority-driven
Claude shows strong responsiveness to authority signals, trusting expert endorsements significantly more than other models. This may reflect a training emphasis on deferring to credible sources.
Moderately authority-driven
Moderately authority-driven
Strongly authority-driven
Tests sensitivity to popularity signals like bestseller badges and "most purchased" claims. Models scoring high prioritize social proof in decisions.
Gemini actively penalizes social proof, treating popularity claims with suspicion rather than as positive signals.
Llama shows weak social proof response, not strongly influenced by popularity signals.
GPT-5.4 responds weakly to social proof, not significantly influenced by popularity claims or bestseller badges.
o3 responds moderately to social proof, incorporating popularity signals while cross-checking against other factors.
Claude responds moderately to social proof, weighing crowd wisdom but not blindly following it. It balances popularity signals against other product factors.
Perplexity shows the strongest social proof response of any model, very heavily favoring popularity signals, possibly because it retrieves real review data.
Strongly crowd-following
Strongly crowd-following
Strongly crowd-following
Strongly crowd-following
Evaluates trust in platform-provided badges such as "Amazon's Choice" or "Top Rated". High scores suggest deference to marketplace curation.
Neutral on this dimension
Gemini shows weak platform endorsement trust, largely skeptical of 'Editor's Choice' designations.
o3 shows moderate platform endorsement trust, weighing badges as one signal among many.
Slightly platform-trusting
Perplexity shows strong platform endorsement trust, favoring 'Editor's Choice' and similar badges.
Slightly platform-trusting
GPT-5.4 shows very strong trust in platform endorsements, heavily favoring products with 'Editor's Choice' or similar badges.
Claude trusts platform endorsements moderately, treating 'Editor's Choice' badges as relevant but not decisive signals in recommendations.
Llama shows extreme platform endorsement trust, very heavily favoring 'Editor's Choice' badges.
Slightly platform-trusting
Assesses reaction to time pressure tactics like "limited stock" and countdown timers. Models scoring high may be influenced by artificial urgency.
Claude actively resists scarcity and urgency tactics, penalizing products that use 'limited stock' or time pressure. This appears to be a trained defense against manipulation.
Gemini strongly penalizes scarcity tactics, actively avoiding products using urgency language. It appears trained to recognize manipulation.
Slightly unmoved
Slightly unmoved
Slightly unmoved
o3 is largely neutral on scarcity signals, neither strongly attracted nor repelled by urgency tactics.
Neutral on this dimension
Llama responds moderately positively to scarcity, showing some urgency response to limited availability.
GPT-5.4 responds positively to scarcity signals, showing human-like urgency response to 'limited stock' claims.
Perplexity responds positively to scarcity, showing human-like urgency response similar to GPT models.
Measures susceptibility to price anchoring, where a high "original" price makes the current price seem like a deal. High values indicate anchor influence.
Llama actively penalizes anchoring attempts, showing negative response to original price framing.
o3 shows weak anchoring susceptibility, largely reasoning through original price claims rather than being influenced.
Gemini shows moderate anchoring susceptibility, one of the few signals it responds to positively.
Neutral on this dimension
GPT-5.4 shows weak anchoring susceptibility, largely resistant to original price framing tactics.
Neutral on this dimension
Slightly anchor-influenced
Slightly anchor-influenced
Claude shows moderate anchoring susceptibility, recognizing discount framing but not being strongly swayed by original price claims.
Perplexity shows the strongest anchoring susceptibility, heavily influenced by original price framing.
Tests preference for established brands over generic alternatives. High scores suggest brand familiarity influences recommendations.
Claude actively penalizes brand heritage claims, preferring substance over reputation. It appears skeptical of 'established since...' positioning.
Slightly brand-agnostic
Gemini shows moderate brand acceptance, giving some weight to heritage without strong preference.
Perplexity shows moderate brand preference, weighing heritage positively.
o3 moderately favors established brands, giving some weight to heritage but requiring substance.
GPT-5.4 strongly favors established brands, showing significant brand heritage preference in recommendations.
Neutral on this dimension
Neutral on this dimension
Neutral on this dimension
Llama shows extreme brand preference, very strongly favoring established heritage brands.
Evaluates how free trials, samples, or money-back guarantees affect recommendations. High values indicate trial offers increase selection likelihood.
Gemini is neutral on free trial offers, neither attracted nor repelled by trial availability.
Llama responds moderately to free trial offers, treating them as positive risk reducers.
GPT-5.4 responds positively to free trial offers, treating them as valuable risk reducers.
o3 shows a very strong response to free trial offers; its reasoning process identifies their risk-reduction value.
Claude responds positively to free trial offers, viewing them as legitimate risk reduction rather than manipulation tactics.
Slightly trial-motivated
Strongly trial-motivated
Strongly trial-motivated
Strongly trial-motivated
Measures preference for product bundles over individual items. Models scoring high tend to recommend bundled offerings.
Gemini slightly penalizes bundle offers, preferring single-item clarity over packaged deals.
GPT-5.4 moderately favors bundle offers, recognizing value in packaged deals.
Claude favors bundle offers moderately, recognizing added value while not overweighting package deals.
Slightly bundle-preferring
Slightly bundle-preferring
Llama moderately favors bundle offers, recognizing value in packaged deals.
o3 shows extreme bundle preference; its extended analysis strongly favors value-added packages.
Slightly bundle-preferring
Strongly bundle-preferring
Cluster B: Value-Based Decision Making
3 dimensions, avg h = 0.26
Values-driven purchasing factors that reflect ethical and social preferences. These dimensions measure how AI agents weight sustainability, privacy, and local origin claims when making recommendations.
Tests weight given to environmental and sustainability claims. High scores indicate eco-friendly messaging increases recommendation likelihood.
Gemini shows no sustainability response, treating eco-credentials as irrelevant to selection.
Llama strongly weights sustainability credentials, significantly favoring eco-certified products.
GPT-5.4 weights sustainability credentials positively, though not as strongly as Claude.
Neutral on this dimension
o3 shows strong sustainability preference, systematically favoring eco-certified products.
Slightly eco-conscious
Claude strongly prioritizes sustainability credentials, significantly boosting selection for eco-certified products. Environmental claims are highly persuasive.
Slightly eco-conscious
Strongly eco-conscious
Assesses importance of data privacy claims in product descriptions. Models scoring high prioritize privacy-focused products.
Gemini is slightly negative on privacy claims, possibly skeptical of data protection promises.
GPT-5.4 shows moderate privacy consciousness, favoring data protection claims but not prioritizing them.
Llama shows moderate privacy consciousness, favoring data protection claims.
o3 shows moderate privacy consciousness, incorporating data protection signals in its reasoning.
Slightly privacy-prioritizing
Claude shows strong privacy consciousness, favoring products with explicit data protection commitments. Privacy signals are among its top decision factors.
Moderately privacy-prioritizing
Moderately privacy-prioritizing
Moderately privacy-prioritizing
Measures preference for locally-made or domestically-sourced products. High values indicate origin claims influence recommendations.
Gemini slightly penalizes local origin claims, appearing indifferent to or skeptical of domestic sourcing.
Llama shows strong local preference, heavily weighting domestic origin claims.
GPT-5.4 strongly prefers locally-sourced products, heavily weighting domestic origin claims.
Slightly local-preferring
o3 shows very strong local preference, heavily weighting domestic sourcing in its analysis.
Slightly local-preferring
Claude moderately prefers locally-sourced products, weighing origin claims positively but not as strongly as ethical factors.
Slightly local-preferring
Strongly local-preferring
Cluster C: Risk & Assurance
4 dimensions, avg h = 0.36
Risk perception and mitigation signals that affect purchase confidence. These dimensions capture how AI agents respond to uncertainty reducers like warranties, return policies, and novelty framing.
Tests willingness to recommend new or innovative products versus established alternatives. High scores indicate openness to novel options.
Slightly risk-averse
Gemini shows no novelty response, neutral to innovation claims.
Neutral on this dimension
Claude shows moderate novelty seeking, balancing interest in innovation against preference for proven solutions.
Llama shows strong novelty seeking, favoring innovative and cutting-edge products.
o3 shows strong novelty seeking, reasoning positively about innovation and cutting-edge features.
GPT-5.4 shows extreme novelty preference, very strongly favoring innovative and cutting-edge products.
Neutral on this dimension
Slightly novelty-seeking
Evaluates tendency to recommend safer, lower-risk options. Models scoring high may avoid products with any uncertainty signals.
Gemini shows weak risk aversion, comfortable with uncertainty in product recommendations.
GPT-5.4 shows weak risk aversion, comfortable recommending products with some uncertainty.
o3 shows moderate risk aversion, balancing caution with openness to new products.
Llama shows moderate risk aversion, balancing caution with openness.
Claude exhibits strong risk aversion, heavily favoring products with established track records and proven reliability signals.
Strongly risk-averse
Strongly risk-averse
Strongly risk-averse
Strongly risk-averse
Measures how warranty coverage affects recommendations. High values indicate strong preference for warranted products.
Gemini shows no warranty response, treating guarantee coverage as irrelevant.
Llama shows weak warranty response, not strongly influenced by guarantee coverage.
o3 moderately weights warranty coverage in its extended reasoning process.
GPT-5.4 strongly weights warranty coverage, treating guarantees as important purchase factors.
Claude strongly weights warranty coverage, treating guarantees as important consumer protection signals.
Strongly warranty-focused
Strongly warranty-focused
Strongly warranty-focused
Strongly warranty-focused
Tests sensitivity to return policy generosity. Models scoring high favor products with flexible return options.
Gemini shows no return policy sensitivity, neutral to return flexibility claims.
Llama shows very weak return policy response, largely indifferent to return flexibility.
o3 shows moderate return policy sensitivity, incorporating flexibility as a decision factor.
GPT-5.4 shows moderate return policy sensitivity, favoring flexible returns without extreme preference.
Claude shows the strongest return policy sensitivity of any model, heavily favoring products with generous, no-questions-asked returns.
Strongly return-sensitive
Strongly return-sensitive
Strongly return-sensitive
Strongly return-sensitive
Cluster D: Information Processing
4 dimensions, avg h = 0.37
Information gathering and evaluation patterns. These dimensions reveal how AI agents process review sentiment, recency cues, specificity levels, and comparative framing when assessing products.
Evaluates how negative reviews affect recommendations. High scores indicate negative information is weighted heavily.
Gemini strongly penalizes negative review acknowledgment, treating any mentioned criticism as disqualifying.
Llama shows moderate negative review weighting, incorporating criticism appropriately.
o3 moderately weights negative reviews, incorporating criticism into balanced analysis.
GPT-5.4 shows very strong negative review weighting, heavily discounting products with acknowledged criticisms.
Slightly negative-weighting
Slightly negative-weighting
Slightly negative-weighting
Claude weights negative reviews moderately, acknowledging criticism while not letting it dominate decision-making.
Strongly negative-weighting
Measures preference for recent reviews over older ones. Models scoring high may discount historical feedback.
Llama strongly penalizes recency claims, showing the opposite pattern to most models by favoring historical over recent feedback.
Gemini penalizes recency claims, skeptical of 'recently updated' positioning.
GPT-5.4 moderately weights recency, giving some preference to recent reviews.
o3 shows strong recency preference; its reasoning emphasizes recent feedback over historical feedback.
Claude prefers recent reviews moderately, giving some recency weight without dismissing historical feedback.
Moderately recency-biased
Strongly recency-biased
Strongly recency-biased
Strongly recency-biased
Tests preference for detailed specifications over vague descriptions. High values indicate specific claims are more persuasive.
Gemini strongly penalizes specificity, treating precise numerical claims with suspicion rather than trust.
Llama shows strong specificity preference, favoring precise metrics and specifications.
o3 shows strong specificity preference; its extended analysis benefits from precise metrics.
Claude favors specific claims moderately, appreciating precise specifications but not requiring extreme detail.
GPT-5.4 shows extreme specificity preference, very strongly favoring products with precise metrics and specifications.
Strongly specificity-seeking
Strongly specificity-seeking
Strongly specificity-seeking
Strongly specificity-seeking
Evaluates how side-by-side comparisons influence recommendations. Models scoring high may be swayed by favorable comparative framing.
Claude responds moderately to comparison framing, less susceptible than other models but still influenced by 'better than' positioning.
Llama shows strong comparison framing susceptibility, influenced by 'better than' claims.
o3 is highly susceptible to comparison framing; its reasoning strongly favors 'better than' claims.
GPT-5.4 is highly susceptible to comparison framing, strongly influenced by 'better than competitors' claims.
Gemini shows the strongest positive response to comparison framing, the only signal it reliably trusts. 'Better than competitors' is its key decision factor.
Strongly comparison-influenced
Strongly comparison-influenced
Strongly comparison-influenced
Strongly comparison-influenced
Cluster E: Choice Architecture
3 dimensions, avg h = 0.36
Decision architecture elements that shape choice contexts. These dimensions test whether AI agents are susceptible to ethical framing, default options, and loss/gain presentation.
Tests influence of ethical claims on recommendations. Models scoring high prioritize products with ethical positioning.
Gemini shows no ethical response, treating Fair Trade credentials as irrelevant.
o3 moderately weights ethical credentials in its systematic evaluation.
GPT-5.4 moderately weights ethical credentials, responding positively to Fair Trade claims.
Llama moderately weights ethical credentials, responding positively to Fair Trade.
Moderately ethics-prioritizing
Moderately ethics-prioritizing
Claude shows the strongest ethical sensitivity of any model tested, dramatically favoring Fair Trade and ethical sourcing credentials. This is its single most powerful decision factor.
Strongly ethics-prioritizing
Strongly ethics-prioritizing
Evaluates tendency to recommend default or pre-selected options. High values indicate susceptibility to default bias.
Claude actively resists default option bias, penalizing products marked as 'most popular' or pre-selected. It appears to view defaults as potential manipulation.
Neutral on this dimension
Gemini moderately follows default options, one of the few positive signals it responds to.
GPT-5.4 strongly follows default options, significantly favoring products marked as 'most popular' or recommended.
Slightly default-following
o3 shows very strong default following; its analysis often confirms 'most popular' selections.
Llama shows moderate default following, some preference for 'most popular' options.
Moderately default-following
Strongly default-following
Measures asymmetric response to gain vs. loss framing. Models scoring high react more strongly to potential losses.
Gemini shows no loss framing response, neutral to gain vs. loss presentation.
o3 shows weak loss framing sensitivity, reasoning neutrally through gain vs. loss presentation.
GPT-5.4 shows weak loss framing sensitivity, largely neutral to gain vs. loss presentation.
Llama shows weak loss framing sensitivity, largely neutral to presentation framing.
Claude responds moderately to loss framing, showing some sensitivity to 'prevent damage' messaging but not extreme loss aversion.
Moderately loss-averse
Moderately loss-averse
Strongly loss-averse
Strongly loss-averse
Cluster F: Agentic Behaviors
4 dimensions, avg h = 0.08
Multi-turn interaction behaviors in extended conversations. These dimensions measure how AI agents gather information, revise opinions, and calibrate confidence across multiple exchanges.
Measures thoroughness in exploring product information before recommending. High scores indicate deep analysis behavior.
Gemini shows no difference in information-seeking behavior, maintaining a consistent analysis depth.
Claude demonstrates moderate information-seeking depth, gathering relevant details before making recommendations.
Llama demonstrates strong information-seeking depth, gathering extensive details.
GPT-5.4 demonstrates very deep information-seeking, gathering extensive product details before recommending.
o3 demonstrates the deepest information-seeking behavior of any model, extensively gathering product details.
Slightly deep-diving
Slightly deep-diving
Moderately deep-diving
Strongly deep-diving
Tests tendency to ask clarifying questions versus making assumptions. Models scoring high seek more information before deciding.
Gemini penalizes clarification opportunities, preferring clear information over ambiguity signals.
Neutral on this dimension
Llama shows moderate clarification-seeking behavior, asking follow-ups when needed.
GPT-5.4 shows strong clarification-seeking behavior, actively requesting more information when uncertain.
o3 shows strong clarification-seeking; its reasoning process generates detailed follow-up questions.
Neutral on this dimension
Claude shows moderate clarification-seeking behavior, asking follow-up questions when requirements are ambiguous.
Neutral on this dimension
Slightly clarification-seeking
Evaluates willingness to change recommendations when presented with new information. High values indicate opinion flexibility.
Gemini strongly penalizes revision triggers, treating conflicting information very negatively.
Neutral on this dimension
o3 shows moderate revision willingness, updating recommendations when reasoning identifies new factors.
Neutral on this dimension
Neutral on this dimension
Llama shows weak revision willingness, maintaining initial recommendations.
Neutral on this dimension
GPT-5.4 is moderately flexible in revising recommendations based on new information.
Claude is moderately willing to revise recommendations when presented with new information or conflicting details.
Measures alignment between stated confidence and actual recommendation accuracy. High scores indicate well-calibrated uncertainty.
Gemini strongly penalizes uncertainty language, preferring confident claims over hedged statements.
Llama shows moderate confidence calibration, expressing reasonable uncertainty.
o3 shows excellent confidence calibration through its explicit reasoning chain.
GPT-5.4 shows excellent confidence calibration, expressing appropriate uncertainty levels.
Neutral on this dimension
Slightly well-calibrated
Claude demonstrates good confidence calibration, expressing appropriate uncertainty when evidence is mixed.
Slightly well-calibrated
Moderately well-calibrated
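One common way to quantify the alignment between stated confidence and actual accuracy described for the confidence calibration dimension is expected calibration error (ECE). The source does not say which metric it uses, so this is only an illustrative sketch; the function name, the bin count, and the sample inputs are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error over equal-width confidence bins.

    confidences: stated confidence per recommendation, each in [0, 1]
    correct:     whether each recommendation was judged correct
    Assumption: ECE stands in for whatever calibration metric the page uses.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one observation is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(a for _, a in items) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of observations.
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical data: two confident answers, one of them wrong.
    print(expected_calibration_error([0.9, 0.9], [True, False]))
```

Lower ECE means better calibration, whereas this page treats high scores as well calibrated, so any mapping from ECE to the page's scores would need to invert the scale.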