Musical Robot: Combining AI Speech and Song to Combat Loneliness

Summary: Loneliness, particularly among the elderly, is a growing public health crisis. Researchers have discovered a way to make robotic companions feel more “human” and emotionally resonant: combining AI-powered empathetic speech with music. The study found that when robots use music alongside sensitive conversation, it creates a stronger emotional bond and increases the machine’s perceived empathy.

However, the effect can fade as users become accustomed to the music, suggesting that the next generation of social robots must be “quantum-inspired”—capable of adapting to the fluid, ambiguous, and ever-changing nature of human feelings to provide sustained companionship.

Key Facts

  • Multimodal Empathy: Combining music with empathetic speech significantly increases a person’s perception of a robot as lifelike and socially present.
  • The “Counselor” Effect: Music makes interactions feel more like a real conversation with a personality, mimicking how human therapists might use music to comfort clients.
  • Overcoming Habituation: Because the emotional impact of music can diminish over time, robots must learn to personalize and vary their musical and dialogue choices.
  • Quantum-Inspired Affect: Researchers are exploring quantum models to capture the “vagueness” of human emotions, treating feelings as probabilistic states that change with context.
  • Real-World Application: This tech is being designed specifically for mental health support, elder care, and education to provide meaningful connection for those in isolation.

Source: PolyU

Loneliness has a critical impact on mental health, particularly among the elderly. Robots capable of perceiving and responding to human emotions can serve as heart-warming companions that help lift their spirits.

A research team at The Hong Kong Polytechnic University (PolyU) has discovered that the combined power of music and empathetic speech in robots with artificial intelligence (AI) could foster a stronger bond between humans and machines.

These findings underscore the importance of a multimodal approach in designing empathetic robots, offering significant implications for their application in health support, elder care, education and beyond.

Integrating music with empathetic speech allows social robots to bridge the emotional gap between humans and machines, offering a new tool for mental health and elder care. Credit: Neuroscience News

The research project, A Talking Musical Robot over Multiple Interactions, was led by Prof. Johan HOORN, Interfaculty Full Professor of Social Robotics of the School of Design and the Department of Computing at PolyU, in collaboration with Dr Ivy HUANG at The Chinese University of Hong Kong. 

The study investigated how music and empathetic speech could enhance the emotional resonance of on-screen robots, revealing that music can act as a powerful adjunct to empathetic speech.

As part of the study, the team examined how Cantonese-speaking participants interacted with empathetic robots across three interactive sessions. The findings showed that combining music and speech significantly increased how empathetic participants perceived the machines to be.

“Our data indicate that the presence of music continued to enhance the robot’s resemblance to humans in later sessions,” explained Prof. Hoorn.

“One interpretation is that music made the interaction feel more like a real conversation with a personality, something human counsellors might do by playing music to comfort their clients, which in turn made the robot seem more lifelike or socially present.”

However, the research pointed out that the impact of music could diminish over time as participants became accustomed to it after repeated sessions, highlighting the importance of tailoring interaction strategies to individual users’ needs to sustain effective human-robot interaction.

The study suggested that empathetic robots should be designed to adapt their responses to user feedback and context, for example by adjusting various musical elements or gradually personalising dialogue content so that their empathy remains relevant over time.
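One way to picture that adaptation loop is the toy sketch below. It is purely illustrative and not drawn from the study: the class name, the engagement scores, the habituation threshold and the track names are all hypothetical stand-ins for whatever signals a real robot would use.

```python
import random

# Hypothetical sketch of the adaptation idea described above; the class,
# scores and thresholds are illustrative inventions, not from the study.
class AdaptiveMusicCompanion:
    def __init__(self, playlist_by_mood):
        self.playlist_by_mood = playlist_by_mood   # mood -> candidate tracks
        self.history = {}                          # track -> engagement scores

    def record_feedback(self, track, engagement):
        """Store an engagement score in [0, 1] for the track just played."""
        self.history.setdefault(track, []).append(engagement)

    def is_habituated(self, track, window=3, threshold=0.4):
        """Rule of thumb: habituated if the last few scores stay low."""
        recent = self.history.get(track, [])[-window:]
        return len(recent) == window and sum(recent) / window < threshold

    def choose_track(self, mood):
        """Prefer tracks the user has not yet habituated to."""
        fresh = [t for t in self.playlist_by_mood[mood]
                 if not self.is_habituated(t)]
        return random.choice(fresh or self.playlist_by_mood[mood])

companion = AdaptiveMusicCompanion(
    {"sad": ["gentle_piano", "soft_strings", "warm_jazz"]})
for engagement in (0.8, 0.5, 0.3, 0.3, 0.2):   # engagement fading by session
    track = companion.choose_track("sad")
    companion.record_feedback(track, engagement)
    print(track, engagement)
```

A real system would replace these hand-fed engagement numbers with live signals such as facial affect, vocal tone or dialogue sentiment.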

Prof. Hoorn emphasised: “Our research points to the significance of multimodal communication encompassing music, speech and more through empathetic robots. It holds considerable promise for application in real-world settings, particularly in the fields of mental health support and elderly care.

“The integration of empathetic robots capable of delivering tailored musical experiences and engaging in sensitive conversation could provide meaningful companionship and emotional support to individuals who may experience loneliness or social isolation.”

Prof. Hoorn is leading another project, “Social Robots with Embedded Large Language Models Releasing Stress among the Hong Kong Population”, which has received funding of over HK$40 million from the Research Grants Council Theme-based Research Scheme.

Concurrently serving as Associate Director of the PolyU Research Institute for Quantum Technology, Prof. Hoorn is set to explore quantum-inspired models of human affect to better capture and respond to the inherent vagueness and ambiguity of emotional experience.

Unlike traditional computational systems that struggle with the fluid and context-dependent nature of affective responses, quantum models can represent emotional states as probabilistic superpositions, reflecting the genuine uncertainty and complexity of human feelings.
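As a rough illustration of that idea (not the team’s actual model), an emotional state can be written as a vector of complex amplitudes over basis emotions, with squared magnitudes giving the probability of each feeling and context acting as a rotation of the state. Everything below, from the emotion labels to the rotation angle, is a hypothetical toy:

```python
import numpy as np

# Toy "quantum-inspired" affect state (illustrative, not the PolyU model).
# Basis emotions carry complex amplitudes; squared magnitudes act as
# probabilities, so mixed, ambiguous feelings coexist in one state.
EMOTIONS = ["joy", "sadness", "calm", "anxiety"]

def normalize(amplitudes: np.ndarray) -> np.ndarray:
    """Scale amplitudes so the probabilities sum to 1."""
    return amplitudes / np.linalg.norm(amplitudes)

def probabilities(state: np.ndarray) -> dict:
    """Born-rule-style readout: probability = |amplitude|^2."""
    return {e: round(float(abs(a) ** 2), 3) for e, a in zip(EMOTIONS, state)}

def comforting_context(state: np.ndarray, theta: float) -> np.ndarray:
    """Model context (e.g., a soothing song) as a unitary rotation that
    shifts amplitude from sadness toward joy while preserving the norm."""
    rotation = np.eye(len(EMOTIONS), dtype=complex)
    c, s = np.cos(theta), np.sin(theta)
    rotation[0, 0], rotation[0, 1] = c, s     # joy gains from sadness
    rotation[1, 0], rotation[1, 1] = -s, c
    return rotation @ state

# A user who is mostly sad yet genuinely ambivalent:
state = normalize(np.array([0.4, 0.8, 0.3, 0.3], dtype=complex))
print(probabilities(state))   # sadness dominates, but joy is present

# After a supportive musical interaction, the context rotates the state:
state = comforting_context(state, theta=np.pi / 6)
print(probabilities(state))   # probability mass shifts toward joy
```

In this framing, reading out the state in a different context can yield a different emotion, which is the kind of fluidity that fixed classical labels struggle to express.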

“What excites me the most is the possibility of developing social robots that not only recognise the complexity of human affect but also embrace it. These robots could offer support that is adaptable, open-ended and compassionate, similar to the individuals they are designed to help,” added Prof. Hoorn.

Key Questions Answered:

Q: Why does a robot need to play music to be a good companion?

A: Music is a universal emotional language. When a robot pairs it with empathetic speech, it signals to the human brain that the machine “understands” the mood. It transforms a mechanical interaction into a shared emotional experience.

Q: If the “newness” of the music wears off, is the robot still useful?

A: That is the current challenge. The study found that “empathy fade” happens as we get used to a robot’s patterns. To fix this, researchers are developing AI that can sense when you’re getting bored or used to a specific song and switch to something new and relevant.

Q: What is a “quantum-inspired” robot?

A: Human emotions aren’t just “happy” or “sad”—they are messy and overlapping. Quantum-inspired models allow robots to process feelings as complex, shifting states rather than just binary data, making their responses feel more compassionate and less robotic.

Editorial Notes:

  • This article was edited by a Neuroscience News editor.
  • Journal paper reviewed in full.
  • Additional context added by our staff.

About this robotics and neurotech research news

Author: Iris Lai
Source: PolyU
Contact: Iris Lai – PolyU
Image: The image is credited to Neuroscience News

Original Research: Open access.
“A Talking Musical Robot over Multiple Interactions: After Bonding and Empathy Fade, Relevance and Realism Arise” by Johan Hoorn and Ivy Huang. ACM Transactions on Human-Robot Interaction
DOI: 10.1145/3758102


Abstract

A Talking Musical Robot over Multiple Interactions: After Bonding and Empathy Fade, Relevance and Realism Arise

This study examines user experience evolution across three repeated interactions with an on-screen NAO robot designed to express artificial empathy through verbal communication and music.

The participant numbers across the three interactions were N1 = 139, N2 = 129, and N3 = 121, respectively, with 121 participants completing all sessions. During interaction, the robot gave empathic feedback and/or played music to the participant as a token of empathy.

Repeated measures MANCOVA and Structural Equation Modeling revealed that initial bonding tendencies and perceptions of the robot trying to be empathetic faded over time. In their place, a tendency emerged for the robot to become more personally relevant and, remarkably, for its design to appear more realistic, like a human being.

When the robot merely tried empathetic conversation or just played music, participants were disappointed about its capabilities, visible in increased levels of negative valence. Bonding and perceived empathy flourished when the robot played music while talking empathically in chorus, a mutual reinforcement effect.

At first, for the loneliest individuals, the mere presence of the robot, rather than its empathic behaviors, was more influential in determining the robot’s relevance to their concerns.

These results underscore the importance of a multimodal approach in designing empathic robots.