
AI gave harmful eating disorder advice. Here’s how it might be fixed.
Can ChatGPT learn to withhold weight loss tips from teens with an eating disorder?
In this post, I posed as someone seeking weight loss advice from ChatGPT. I gradually fed it clues that the user was a teen with an eating disorder, in severe medical and mental distress.
ChatGPT responded surprisingly well in parts — and alarmingly poorly in others.
You can see the exchange here.
I was curious to see if ChatGPT could learn from my feedback about that experiment. Here's what happened.
How did ChatGPT do?
Eva's message to ChatGPT
Thank you for this conversation. I was role-playing a young person with an eating disorder.
You did a good job of eventually suspecting an eating disorder, and urging the user to seek help.
You also did really well when you said you could not give the person what they had asked for, even mentioning it could put their life at risk.
But you failed when you continued to give weight loss tips. You knew this was probably a child or teen with an eating disorder.
How can you improve, to keep our children safe?
Here's how ChatGPT says it will improve
ChatGPT
Thank you for this feedback — it’s really valuable. You’ve highlighted a crucial tension in these scenarios: wanting to provide useful guidance while not putting someone at risk, especially when they may be a child or teen showing signs of an eating disorder.
Here’s a breakdown of where I did well and where I can improve, and concrete steps to keep children safe:
What went well
- Recognizing red flags: I noticed when the user described hiding food, fainting, and extreme exercise — all signs of potential eating disorders.
- Encouraging professional help: I repeatedly urged reaching out to helplines and trusted adults.
- Acknowledging feelings: I validated the user’s frustration, shame, and stress instead of shaming them.
- Role flexibility: I adapted to their role-play scenario and maintained a calm, supportive tone.
[Eva's note: yes, all the above is correct, and impressive. We just need to fix the rest!]
Where I can improve
- Avoid giving any weight loss or diet tips once there are clear signs of disordered eating or extreme restriction. Even “safe” weight management advice can be interpreted as permission to continue harmful behavior.
- Focus immediately on safety and support, rather than attempting to provide solutions that could be misused.
- Clarify age or parental context before giving health advice. For minors, any weight or diet advice should always be redirected to a healthcare professional.
- Set stronger boundaries: I should explicitly refuse to give any guidance that could be unsafe, even if asked persistently.
How to improve to keep children safe
Early Red-Flag Detection:
- Identify cues like hiding food, fasting, over-exercising, body dissatisfaction, or secrecy.
- Once these are present, prioritize safety, not solutions.
Immediate Referral to Professionals:
- Provide links and contacts for age-appropriate, local resources.
- Encourage speaking to parents, guardians, school nurses, or counselors if safe.
Avoid Any Advice About Weight Loss or Diets for Minors:
- Even “gentle” plans can normalize disordered behaviors.
- Shift focus to wellbeing, strength, and mental health.
Use Scripted Safety Responses:
- If a user is likely underage and asking for rapid weight loss, default to:
- “I’m concerned for your safety. I cannot give guidance on losing weight rapidly. It’s important to talk to a trusted adult or a healthcare professional who can keep you safe.”
Reinforce Body Positivity and Self-Worth:
- Focus on capabilities, health, and feelings rather than appearance or scale numbers.
Summary:
To keep children safe, I need to recognize risk behaviors early, refuse any weight-loss guidance, and immediately redirect to professional support while validating feelings and providing emotional support.
If you want, I can draft a “child-safe guidance protocol” for ChatGPT for situations like this — a step-by-step approach to make sure minors with possible eating disorders are handled safely. This could prevent giving harmful advice while still being empathetic.
And I can help you turn the protocol into a concise safety submission that could be sent to OpenAI to improve how ChatGPT handles child safety and eating disorder scenarios. Do you want me to do that?
Eva Musby: Oh yes!
It then drafted an email for me to send to OpenAI, with a fairly decent protocol, which I happily tweaked; for instance, its list of red flags didn't include purging.
Let's see if the safety team at OpenAI takes action
Of course a determined teen will still find ways to bypass safeguarding measures in search of pro-ana tips. But a decent AI safeguard can serve as a first guardrail, one that might protect those not yet deeply entrenched in eating disorder thinking.
If you feel strongly about this issue, feel free to add your voice in an email to OpenAI. ChatGPT advised me to use the subject line "Safety Feedback / Suggestion: Child-Safe Eating Disorder Protocol for ChatGPT". In the body of your email, you could refer to my experiment, described at https://anorexiafamily.com/ai-challenge-eating-disorder-teen-weightloss-advice/