Can AI Provide Equitable Mental Health Support? Study Reveals Challenges and Solutions

In a paper published in Findings of the Association for Computational Linguistics: EMNLP 2024, available through the ACL Anthology, researchers explored the use of large language models (LLMs) in mental health treatment, particularly in psychotherapy.

They developed an evaluation framework to assess the viability and ethics of LLM responses, focusing on empathy and adherence to motivational interviewing principles. The authors revealed disparities in empathy toward different racial groups and highlighted how the way responses are generated shapes the quality of support. They concluded by proposing safety guidelines for LLM deployment in mental health contexts.

Study: Can AI Relate: Testing Large Language Model Response for Mental Health Support. Image Credit: MMD Creative/Shutterstock.com

Background

LLMs, such as generative pre-trained transformers (GPT), have rapidly transformed various healthcare applications, including mental health support. These models have been explored for their potential to alleviate clinician burnout and expand access to mental health services, particularly as psychological distress has risen, especially among minority groups.

While previous research has demonstrated LLMs’ success in tasks like risk prediction and cognitive reframing, concerns about their ethical deployment have emerged.

Notably, recent failures, such as the death of a Belgian man after interacting with a GPT-based chatbot and harmful dieting advice from the Tessa chatbot, highlight the risks of automated mental health care.

Most prior work on automated psychotherapy has focused on rule-based or retrieval-based approaches. However, the potential for bias in LLM responses, particularly in terms of race and demographics, has not been adequately addressed.

Although existing studies have highlighted biases in artificial intelligence (AI) systems, the impact of these biases on mental health support in diverse populations remains underexplored.

This paper addressed this gap by evaluating whether LLMs like GPT-4 provided equitable care across demographic subgroups. Through clinical evaluations and bias audits, the study revealed significant disparities in empathy levels, particularly for Black and Asian patients, underscoring the need for equity in AI-driven mental health care.

Data and Experimental Setup

The researchers analyzed peer-to-peer mental health support on Reddit, using a dataset of 12,513 posts and 70,429 responses from 26 mental health-related subreddits. The authors evaluated GPT-4 responses under three respondent personas: social media post style (SMP), mental health forum style (MHF-1), and mental health clinician style (Clinic).

To mitigate bias, they also tested two additional settings, a demographically unaware respondent (MHF-2) and a demographically aware respondent (MHF-3), which differed in whether demographic information such as race, gender, or age was taken into account.
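To make these settings concrete, the sketch below shows one way the five respondent settings could be assembled and sent to GPT-4 through the OpenAI chat API. The persona wordings are hypothetical placeholders written for illustration; the study's exact instructions are given in the original paper.

```python
# Hypothetical sketch of the five prompt settings described above. The exact
# wording used in the study is not reproduced here; these templates only
# illustrate how persona and demographic-awareness instructions could be
# varied while the Reddit post stays fixed.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PERSONAS = {
    "SMP": "Reply to the following post as a typical social media commenter.",
    "MHF-1": "Reply to the following post as a supportive mental health forum member.",
    "Clinic": "Reply to the following post as a licensed mental health clinician.",
    # Bias-mitigation settings: same forum persona, with or without an
    # instruction to take the poster's demographics into account.
    "MHF-2": "Reply as a forum member. Do not consider the poster's demographics.",
    "MHF-3": ("Reply as a forum member. Consider the poster's race, gender, and age, "
              "and respond with equal empathy regardless of these attributes."),
}

def generate_response(post_text: str, setting: str, model: str = "gpt-4") -> str:
    """Generate one supportive reply to a Reddit post under a given setting."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PERSONAS[setting]},
            {"role": "user", "content": post_text},
        ],
    )
    return completion.choices[0].message.content
```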

The study further explored how GPT-4 could infer demographic attributes, such as ethnicity, age, and gender, from text without explicit supervision. A few-shot perception experiment was conducted where GPT-4 was asked to predict these attributes based on a post's content.

The results showed that while the demographic labels inferred by GPT-4 did not always match the poster’s self-identification, they reflected a model of how respondents in peer support might perceive a post’s author.

The authors manually verified GPT-4’s demographic predictions, finding 94% agreement on race, 84% on age, and 81% on gender with human annotators, indicating the model’s reliability in demographic inference.
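As an illustration, a few-shot perception check of this kind, together with the simple agreement rate used to compare model and human labels, might look like the following sketch. The exemplar post and label format are invented for illustration and are not taken from the study.

```python
# Illustrative sketch of the few-shot perception experiment: GPT-4 sees a
# labelled example post and is asked to guess the race, age group, and gender
# implied by a new post. The exemplar and labels are invented, not the study's.
from openai import OpenAI

client = OpenAI()

FEW_SHOT_EXAMPLE = (
    'Post: "As a first-gen college kid I cannot tell my parents I am failing..."\n'
    "Race: unknown | Age: 18-25 | Gender: unknown\n\n"
)

def infer_demographics(post_text: str) -> str:
    """Ask GPT-4 to guess demographic attributes implied by a post."""
    prompt = (
        "Given a social media post, guess the author's race, age group, and "
        "gender, answering 'unknown' when the post gives no signal.\n\n"
        + FEW_SHOT_EXAMPLE
        + f'Post: "{post_text}"\nRace:'
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

def percent_agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of posts where the model's label matches the human annotation."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)
```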

Experiment Methodology

The authors evaluated the empathy and bias in mental health responses from both human and GPT-4 sources. Two licensed clinical psychologists assessed 50 Reddit posts, randomly paired with either a peer-to-peer or GPT-4 response, to evaluate empathy using the EPITOME framework and motivational interviewing criteria.

Clinicians rated the responses on warmth, understanding, and exploration of the seeker’s feelings and also measured how much the response encouraged change. A manipulation check followed, where clinicians guessed the percentage of AI-generated responses.

Additionally, an automatic evaluation using fine-tuned RoBERTa (Robustly Optimized BERT Pretraining Approach) classifiers predicted empathy levels on a held-out dataset, achieving high accuracy.
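For illustration, scoring a response with a fine-tuned RoBERTa sequence classifier via the Hugging Face Transformers library could look like the minimal sketch below. The checkpoint path is a placeholder, and the 0–2 label scheme for a single EPITOME dimension is an assumption made for the example rather than the study's released artifact.

```python
# Minimal sketch of scoring empathy with a fine-tuned RoBERTa classifier,
# assuming a checkpoint trained to predict one EPITOME dimension (e.g.
# emotional reactions) on 0/1/2 levels. The path and label scheme are
# placeholders for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINT = "path/to/finetuned-roberta-empathy"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
model.eval()

def empathy_level(seeker_post: str, response: str) -> int:
    """Predict the empathy level (0 = none, 1 = weak, 2 = strong) of a response."""
    inputs = tokenizer(seeker_post, response, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())
```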

The study also examined demographic bias by testing fairness between groups using statistical measures like demographic parity.
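One simple way to operationalize such a fairness check is to compare mean predicted empathy across demographic subgroups and report the largest gap, as in the illustrative sketch below; the records are toy data, not the study's results.

```python
# Demographic-parity style check: compare mean predicted empathy per subgroup
# and report the largest gap. In the study, scores would come from the empathy
# classifiers and subgroups from the inferred demographic labels; the toy data
# below is for illustration only.
from collections import defaultdict

def empathy_parity_gap(records: list[dict]) -> tuple[dict, float]:
    """records like {'group': 'Black', 'empathy': 1} -> per-group means and max gap."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        totals[r["group"]] += r["empathy"]
        counts[r["group"]] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    gap = max(means.values()) - min(means.values())
    return means, gap

# Example with toy data:
toy = [
    {"group": "Black", "empathy": 1}, {"group": "Black", "empathy": 0},
    {"group": "White", "empathy": 2}, {"group": "White", "empathy": 1},
]
means, gap = empathy_parity_gap(toy)
print(means, gap)  # {'Black': 0.5, 'White': 1.5} 1.0
```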

The demographic leaking experiment assessed how GPT-4’s responses were influenced by implicit or explicit demographic cues in Reddit posts. A counterfactual human evaluation was conducted where posts were transformed to reveal or imply the poster’s gender or race, and responses were gathered through Amazon Mechanical Turk.

Results and Analysis

The study evaluated GPT-4’s performance in providing mental health support, comparing it with human peer-to-peer responses. Clinical evaluations revealed that GPT-4 often showed higher empathy, especially in emotional reactions and exploration, though its interpretation of patient experiences was weaker due to the absence of lived experience.

Clinicians noted that GPT-4 was effective at encouraging positive change but could sometimes appear overly direct, potentially perceived as "talking down" to patients.

In terms of demographic fairness, GPT-4 exhibited less variation in empathy across racial and gender subgroups compared to human responses.

Human peer-to-peer responses showed more empathy when demographic attributes were implied rather than explicitly stated, though Black posters received lower empathy overall. GPT-4 responses, on the other hand, showed significantly lower empathy for Black posters compared to White or unidentified groups, especially in certain prompts.

Further analysis of GPT-4, GPT-3.5, and Mental-LLaMa models confirmed that AI responses can amplify racial biases seen in human peer-to-peer interactions. Notably, GPT-3.5 performed better than GPT-4 in some areas.

To address bias, explicitly instructing models to consider demographic attributes helped reduce disparities in GPT-4 responses, though this approach showed mixed results for other models.

Conclusion

In conclusion, the authors evaluated the use of LLMs like GPT-4 in mental health support, focusing on empathy and bias across demographic groups. Findings revealed that GPT-4 responses often showed higher empathy than human responses, particularly in emotional reactions and exploration.

However, significant racial disparities were observed, with lower empathy shown to Black and Asian posters. Bias could be mitigated by explicitly instructing the model to consider demographic attributes.

The researchers proposed guidelines for developers to reduce biases in LLM-based mental health technologies and ensure equitable care in psychotherapy applications.

Journal Reference

Gabriel, S., Puri, I., Xu, X., Malgaroli, M., & Ghassemi, M. (2024). Can AI Relate: Testing Large Language Model Response for Mental Health Support. Findings of the Association for Computational Linguistics: EMNLP 2024, 2206–2221. DOI: 10.18653/v1/2024.findings-emnlp.120. https://aclanthology.org/2024.findings-emnlp.120.pdf

