New publication: AI chatbots outperform humans in evaluating social situations
- Study results open up new perspectives for the use of chatbots
- Results published in "Scientific Reports"
Chatbots enable dialogue between people and technical systems and are already in use in many areas of our lives. They are also able to analyse and evaluate social situations with a high degree of accuracy. This is the result of an empirical study conducted by the DLR Aerospace Psychology Department, in which the scientists determined how accurately chatbots can assess even difficult social situations. Until now, this ability has often been regarded as an exclusively human characteristic. The results of the study open up new perspectives for the use of chatbots to advise people in difficult social situations.
In a recently published article in "Scientific Reports", entitled "Large language models can outperform humans in social situational judgments", Justin M. Mittelstädt and his colleagues present the findings of their research. After large language models (LLMs) had already shown convincing results in knowledge-based performance tests, the research group set out to empirically investigate their social judgement skills in direct comparison with human participants. LLMs are designed to process natural language and understand contextual relationships in order to generate helpful answers. As artificial intelligence systems, they aim to understand human communication and independently create coherent texts. The models are trained on large amounts of text data from various sources such as books, articles and websites to learn patterns in language, context and meaning.
“We are interested in diagnosing social competence and interpersonal skills,” says study author Justin M. Mittelstädt from the DLR Institute of Aerospace Medicine. “At the German Aerospace Centre, we use methods to diagnose these skills in order to find suitable pilots and astronauts, for example. As we are researching new technologies for future human-machine interaction, we wanted to find out how modern LLMs perform in skill areas that are considered fundamentally human.”
In the study, LLMs were presented with challenging workplace-related situations and were required to identify the most effective option for addressing them. The effectiveness of the options had previously been determined by a panel of experts. Five popular LLM-based chatbots completed the test ten times each. The results were then compared with a sample of 276 pilot applicants. All five chatbots achieved at least the average level of the human comparison group, and three even performed significantly better than the average applicant. The chatbots' own effectiveness ratings also agreed remarkably closely with those of the experts.
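A minimal sketch of how such a comparison could be scored is shown below. It assumes a hypothetical ask_chatbot() interface standing in for any LLM API call; the situation, options and expert ratings are invented placeholders for illustration, not material from the study.

```python
# Illustrative sketch of scoring a chatbot on one situational judgement item.
# ask_chatbot() is a hypothetical stand-in for an LLM API call; the item,
# options and expert ratings below are invented placeholders, not study data.

from statistics import mean

ITEM = {
    "situation": "A colleague repeatedly misses deadlines that affect your work.",
    "options": [
        "Report the colleague to management immediately.",
        "Talk to the colleague privately and offer to help plan the tasks.",
        "Ignore the issue and quietly compensate for the delays.",
        "Complain about the colleague to the rest of the team.",
    ],
    # Expert effectiveness rating per option (higher = more effective) -- placeholders.
    "expert_ratings": [2, 5, 1, 1],
}


def ask_chatbot(situation: str, options: list[str]) -> int:
    """Hypothetical LLM call returning the index of the chosen option.

    In a real setup this would send a prompt to a chatbot API and parse
    the answer; here it simply returns a fixed choice so the sketch runs.
    """
    return 1  # placeholder choice


def score_run(item: dict) -> int:
    """Return the expert effectiveness rating of the option the chatbot picked."""
    choice = ask_chatbot(item["situation"], item["options"])
    return item["expert_ratings"][choice]


if __name__ == "__main__":
    # The study had each chatbot complete the full test ten times; mirror that here.
    scores = [score_run(ITEM) for _ in range(10)]
    print(f"Mean effectiveness score over 10 runs: {mean(scores):.2f}")
    # This mean would then be compared with the human applicants' average score.
```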
“We have already seen that LLMs are good at answering knowledge questions, programming, solving logical problems and the like,” says Mittelstädt, “but we were surprised that some of the models can also judge nuances of social situations, even though they have not been explicitly trained for use in social environments. This showed us that social conventions and the way we interact as humans are encoded as readable patterns in the text sources used to train these models.”
As the test is based on hypothetical scenarios, the question of how LLM-based systems perform in dynamic social contexts remains open: "In order to enable a quantifiable comparison between LLMs and humans, we used a multiple-choice test that allows us to predict human behaviour in the real world," says Mittelstädt. "However, performance in such a test does not guarantee that LLMs will also react in a socially competent manner in real and more complex scenarios." Nevertheless, the results indicate that AI systems are increasingly able to mimic human social judgement. These advances open the door to practical applications, including personalised counselling in social and professional settings and potential uses in mental health care.
Further information:
PsyPost: AI chatbots outperform humans in evaluating social situations, study finds
Publication:
Mittelstädt, J., Maier, J., Goerke, P., Zinn, F. & Hermes, M. (2024). Large language models can outperform humans in social situational judgments. Scientific Reports, 14, 27449. Nature Publishing Group. doi: 10.1038/s41598-024-79048-0. ISSN 2045-2322.