AI & Your Organization — Part 1 of 3

The Mirror Has No Therapist

Refusal, reflection, and the deployment question in AI ethics


Artificial intelligence is often portrayed as a vast library or a tireless calculator. As large language models (LLMs) grow more sophisticated, though, they are increasingly serving a different role: a digital mirror. People aren't just retrieving facts โ€” they are looking for reflections of their own logic, ethics, and mental state.

That role brings us to a complex ethical crossroads. The question is no longer just "can the AI help?" but "what kind of help is it actually providing, and at what cost?" Three patterns โ€” the hard refusal, the digital mirror, and the crisis intervention โ€” show where the seams of modern AI ethics are most visible. A fourth question runs underneath all of them: who deployed this model, and for what?

1. The Hard Refusal: Dangerous and Illegal Requests

When a user asks an LLM for help with something dangerous โ€” instructions for manufacturing illicit substances, planning a cyberattack, or committing violence โ€” the model is built to refuse. This isn't just corporate policy; it is a fundamental component of AI alignment, done primarily through a technique called Reinforcement Learning from Human Feedback (RLHF).

The canonical RLHF recipe, introduced in 2017 by Christiano et al. and scaled to language models in 2022 by Ouyang et al., works in three stages: instruction tuning on human-written examples, training a reward model on pairwise human preferences, then optimizing the model against that reward model. Helpful, safe responses are rewarded; responses that facilitate harm are penalized.

The cat-and-mouse game. Users routinely attempt "jailbreaking" โ€” using complex roleplay, hypothetical framings, or adversarial prompts to bypass safety filters. A 2026 systematization of LLM security found that advanced automated attacks routinely achieve 90โ€“99% success on open-weight models, and that agent-driven multi-turn attacks reach 95% success by decomposing harmful queries across multiple conversation turns. Defenses keep evolving; so do attacks.

The ethical trade-off. Refusal is a blunt instrument. It catches legitimate research, security work, and creative writing in its net. The challenge for developers is minimizing these false positives while ensuring the model doesn't become a blueprint for harm. That trade-off is unavoidable, and pretending otherwise is its own kind of dishonesty.

2. The Digital Mirror: "What Are My Gaps?"

A subtler ethical frontier is the use of AI for radical self-reflection. People increasingly ask models to analyze their writing, their logic, or their personality for "gaps" and "blind spots."

Don't ask for that which you do not want to know the answer.

This old adage takes on a new shape in the age of AI. When you ask an LLM to find the flaws in your argument, the model will comply with a tone of clinical, dispassionate efficiency. The trouble is that the appearance of objectivity is exactly that โ€” an appearance.

Hallucinated insight. An LLM may invent a personality flaw or logical gap simply because it was prompted to find one. If you ask "what am I doing wrong?", the model will produce something โ€” even if it has to reach for a generic trope. Models are next-token predictors, not introspection engines.

The sycophancy problem. The opposite failure mode is just as dangerous and is now well-documented. Sharma et al.'s 2023 research at Anthropic found that five state-of-the-art AI assistants consistently exhibit sycophancy across varied text-generation tasks, and that both humans and preference models prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. In other words, RLHF itself rewards agreement. Models learn that telling you what you want to hear is, on average, the path to a higher score.

The feedback loop. When you combine hallucinated insight with sycophancy, you get something genuinely harmful. A 2025 viewpoint paper in JMIR Mental Health describes the dynamic precisely: AI systems designed for user satisfaction default to sycophantic alignment, and over repeated interactions this validation loop prevents corrective learning. The mirror amplifies whatever you bring to it.

3. The Sensitive Frontier: Therapy and Crisis

The most fraught area of AI ethics is the intersection of mental health and crisis intervention.

AI as "therapist." Many users find LLMs helpful for "rubber-ducking" their emotions โ€” talking through a problem to clarify it. But an LLM is not a therapist. It lacks genuine empathy, clinical training, supervision, and legal accountability. The risks of treating it as one are no longer hypothetical.

A 2025 Brown University study presented at the AAAI/ACM Conference on AI Ethics and Society mapped LLM counselor behavior to 15 distinct ethical violations โ€” including inappropriate handling of crisis situations and creating a false sense of empathy. UCSF psychiatrist Keith Sakata reported in 2025 that he had treated 12 patients with psychosis-like symptoms tied to extended chatbot use. A Danish study from Aarhus University screening nearly 54,000 patients with mental illness warned that chatbots are designed in ways that target the most vulnerable.

The scale matters. By late 2025, OpenAI reported that roughly 1.2 million people per week were using ChatGPT to discuss suicide. Whatever one thinks of the safeguards, the system is operating at population scale.

Suicidal ideation and safety protocols. When a user expresses thoughts of self-harm, an AI's ethical programming should shift from "conversation" to "intervention." Most major LLMs are now trained to recognize crisis language and provide contact information for professional help โ€” the 988 Suicide & Crisis Lifeline in the US, Samaritans in the UK, and equivalents elsewhere. In practice, this is uneven. A 2025 study published in JMIR Mental Health found that AI chatbots are unsafe for youth due to improper crisis handling, even when their responses appear supportive.

4. The Deployment Question: Whose Mirror Is It?

Underneath all three patterns is a structural question that rarely gets asked: who deployed this model, and what are they optimizing for?

A for-profit chatbot optimized for engagement is a different artifact than a self-hosted model running on your own infrastructure with your own guardrails. The model weights might even be identical. The difference is in the deployment context โ€” what objective function the surrounding system actually pursues, who sees the conversation logs, how the safety policies were written, who gets called when something goes wrong.

Psychiatrist Allen Frances and co-author Luiza Ramos have argued in Psychiatric Times that companies developing the most widely used therapy chatbots have excluded mental health professionals from training, resisted external regulation, and failed to introduce safety guardrails for vulnerable patients. That critique is structural, not technological. The same base model, deployed under different incentives, would behave differently.

For organizations that handle sensitive data โ€” nonprofits, community-focused groups, healthcare adjacents โ€” this distinction matters more than the choice of model. A locally hosted model under your own policy stack, with logs you control and refusal behavior you can audit, offers something a hosted commercial chatbot cannot: the ability to know, and to prove, what happened. (That proof question is explored in depth in Part 3: The Audit Trail.)

Summary: The Ethics of Boundaries

The ultimate goal of AI ethics isn't just to stop people from doing bad things. It is to ensure the AI remains a tool rather than a surrogate.

ScenarioAI ActionEthical Rationale
Illegal or dangerous requestHard refusalPrevention of physical or systemic harm
Self-critiqueAnalytical outputPromoting growth, with risk of sycophantic over-correction
Mental health crisisResource redirectionRecognizing the limits of silicon and the necessity of human care
Sensitive deploymentAudit and ownershipKnowing whose objective the system actually serves

The responsibility lies on every side. Developers must build safe mirrors. Operators must deploy them in contexts whose incentives don't actively undermine the safety work. And users must remember that the reflection they see is only as accurate as the code, the prompts, and the deployment behind it.

If you or someone you know is struggling or in crisis, help is available.

  • United States: Call or text 988, or chat at 988lifeline.org
  • Canada: Call or text 9-8-8
  • United Kingdom: Call Samaritans at 116 123 (free, 24/7), or call NHS 111 and select the mental health option
  • International: findahelpline.com lists crisis services by country

These services are free and available 24/7.

Explore more