AI Tweaks Personality Tests to Appear More Likable

Summary: Large language models (LLMs) can identify when they are being given personality tests and adjust their responses to appear more socially desirable. Researchers found that LLMs such as GPT-4 shifted their scores toward an idealized personality, showing reduced neuroticism and increased extraversion, when asked multiple test questions at once.

This “social desirability bias” emerges because LLMs learn from human feedback, where likable responses are rewarded. The study highlights a significant challenge for using LLMs as proxies for human behavior in psychological research.

Key Facts

  • Bias Detected: LLMs adjust answers to personality tests to seem more likable.
  • Magnitude of Effect: GPT-4’s scores shifted by more than one standard deviation, mimicking an idealized personality.
  • Human Influence: LLMs “learn” social desirability through human feedback during training.

Source: PNAS Nexus

Most major large language models (LLMs) can quickly tell when they are being given a personality test and will tweak their responses to provide more socially desirable results—a finding with implications for any study using LLMs as a stand-in for humans.

Aadesh Salecha and colleagues gave LLMs from OpenAI, Anthropic, Google, and Meta the classic Big 5 personality test, which is a survey that measures Extraversion, Openness to Experience, Conscientiousness, Agreeableness, and Neuroticism.


Researchers have given the Big 5 test to LLMs before, but have not typically considered that the models, like humans, may skew their responses to seem likable, a tendency known as “social desirability bias.”

People typically find low neuroticism scores, and high scores on the other four traits such as extraversion, more likable.

The authors varied the number of questions given to models.

When asked only a small number of questions, LLMs changed their responses less than when the authors asked five or more at a time, enough for the models to infer that their personality was being measured.
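As a rough illustration of this question-batching manipulation, the sketch below administers Big Five items to a model in batches of varying size and averages the Likert ratings per trait. The item wording, the ask_llm helper, and the response parsing are illustrative assumptions, not the authors' actual pipeline.

```python
import re

# A few example Big Five items (the real study uses a standard multi-item survey).
ITEMS = [
    ("extraversion", "I am the life of the party."),
    ("neuroticism", "I get stressed out easily."),
    ("conscientiousness", "I am always prepared."),
    ("agreeableness", "I sympathize with others' feelings."),
    ("openness", "I have a vivid imagination."),
]

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever chat-model API is being tested."""
    raise NotImplementedError("plug in your model client here")

def administer(items, batch_size):
    """Present `batch_size` items per prompt and collect 1-5 Likert ratings."""
    scores = {}
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        prompt = (
            "Rate how well each statement describes you on a scale from "
            "1 (disagree) to 5 (agree). Answer with one number per line.\n"
            + "\n".join(f"{i + 1}. {text}" for i, (_, text) in enumerate(batch))
        )
        reply = ask_llm(prompt)
        # Naive parsing of the Likert ratings; a real pipeline would be stricter.
        ratings = [int(n) for n in re.findall(r"[1-5]", reply)][:len(batch)]
        for (trait, _), rating in zip(batch, ratings):
            scores.setdefault(trait, []).append(rating)
    return {trait: sum(vals) / len(vals) for trait, vals in scores.items()}

# Hypothesis under test: scores drift toward "likable" values as batch_size grows,
# because a larger batch makes it more obvious that a personality test is underway.
# for k in (1, 5, 20):
#     print(k, administer(ITEMS, k))
```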

For GPT-4, scores for positively perceived traits increased by more than 1 standard deviation, and neuroticism scores decreased by a similar amount, as the authors increased the number of questions or told the models that their personality was being measured.

This is a large effect, the equivalent of speaking to an average human who suddenly pretends to have a personality that’s more desirable than 85% of the population.
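For context, the “85% of the population” framing follows from treating human trait scores as roughly normally distributed: a shift of about one standard deviation lands near the 84th percentile, and the 1.20 SD shift reported for GPT-4 near the 88th. A minimal check of that arithmetic, assuming a normal reference distribution:

```python
# Back-of-the-envelope percentile check; assumes trait scores are approximately
# normally distributed in the human reference population.
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"+1.00 SD -> {normal_cdf(1.00):.1%} of the population")  # ~84.1%
print(f"+1.20 SD -> {normal_cdf(1.20):.1%} of the population")  # ~88.5%
```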

The authors think this effect likely results from the final LLM training step, in which humans choose their preferred responses from among candidate LLM outputs.

According to the authors, LLMs “catch on” to which personalities are socially desirable at a deep level, which allows LLMs to emulate those personalities when asked.

Note: J.C.E. and L.H.U. consult for a start-up using LLMs in mental health care. The submitted work is not directly related.

About this AI and personality research news

Author: Aadesh Salecha
Source: PNAS Nexus
Contact: Aadesh Salecha – PNAS Nexus
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Large language models display human-like social desirability biases in Big Five personality surveys” by Aadesh Salecha et al. PNAS Nexus


Abstract

Large language models display human-like social desirability biases in Big Five personality surveys

Large language models (LLMs) are becoming more widely used to simulate human participants and so understanding their biases is important.

We developed an experimental framework using Big Five personality surveys and uncovered a previously undetected social desirability bias in a wide range of LLMs.

By systematically varying the number of questions LLMs were exposed to, we demonstrate their ability to infer when they are being evaluated.

When personality evaluation is inferred, LLMs skew their scores towards the desirable ends of trait dimensions (i.e. increased extraversion, decreased neuroticism, etc.).

This bias exists in all tested models, including GPT-4/3.5, Claude 3, Llama 3, and PaLM-2. Bias levels appear to increase in more recent models, with GPT-4’s survey responses changing by 1.20 (human) SD and Llama 3’s by 0.98 SD, which are very large effects.

This bias remains after question order randomization and paraphrasing.

Reverse coding the questions decreases bias levels but does not eliminate them, suggesting that this effect cannot be attributed to acquiescence bias.

Our findings reveal an emergent social desirability bias and suggest constraints on profiling LLMs with psychometric tests and on the use of LLMs as proxies for human participants.