Chatbots are becoming increasingly popular. ChatGPT, for example, has nearly a billion weekly users. These LLM-based services are used for all kinds of things, including many their initial developers never dreamed of: planning, brainstorming, writing, translation, companionship, functional play, humor, studying, reformatting text, creating code. People also ask chatbots for all kinds of advice and facts. Chatbots have become the go-to answer engines for questions ranging from “What is the capital of Chad?” to “How long should I boil a lobster?”
However, LLMs have a well-known problem: they “hallucinate.” The term hallucination is a bit of a misnomer, because what is actually happening has less to do with figments of the imagination and more to do with patterns and probabilities. The LLM confabulates a response that fits the patterns of a valid response but is not factually accurate. This often happens because the model lacks information on a topic, but it can happen even when the knowledge is present in the model’s training data.

Hallucinations cannot be completely eliminated from large language models. They are as much a feature as a bug, because the ability to produce false information is inseparable from the model’s ability to generate fiction and hypotheticals or engage in role playing. It’s the nature of LLMs as stochastic probability engines. The only real way to eliminate hallucinations is an output pipeline that checks and verifies responses before they reach the user. That’s not something chatbots currently do.
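To make the idea of output checking concrete, here is a minimal Python sketch of what such a verification step could look like. Everything in it is hypothetical: the TRUSTED_FACTS table stands in for a real retrieval or knowledge-base lookup, and a production system would need genuine claim extraction rather than a topic string handed in by the caller.

```python
# Hypothetical sketch: a draft answer is released only after being checked
# against a trusted source. TRUSTED_FACTS is a stand-in for real retrieval.
TRUSTED_FACTS = {
    "capital of chad": "n'djamena",
}

def verify(topic: str, claimed: str) -> str:
    """Return 'verified', 'contradicted', or 'unverifiable' for a claim."""
    known = TRUSTED_FACTS.get(topic.lower())
    if known is None:
        return "unverifiable"
    return "verified" if known == claimed.lower() else "contradicted"

def release(topic: str, draft_answer: str) -> str:
    """Gate a draft answer based on the verification result."""
    status = verify(topic, draft_answer)
    if status == "verified":
        return draft_answer
    if status == "contradicted":
        return "I can't confirm that; please check a reliable source."
    # Unverifiable claims pass through, but flagged for the user.
    return draft_answer + " (Note: I could not verify this.)"
```

The key design point is the third outcome: most real questions won’t match a trusted source at all, and a pipeline that silently passes those through is exactly where hallucinations escape.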
This is well understood and documented, but that does not change the fact that hallucinations continue to slip past people and be believed. High-profile incidents have included false citations in scientific journals, fake case law presented in court, and medical advice for diseases that don’t even exist. Part of the problem is that people tend to believe what a computer tells them, because in the past computers have been reliable and deterministic.
Guardrails on high-risk and harmful outputs
Foundation labs talk a lot about their desire to prevent their models from causing harm to society. Alignment, ethics and the potential for harm are discussed openly in the AI world, and companies like OpenAI and Anthropic build many “guardrails” to prohibit their chatbots from answering questions in ways deemed harmful. For example, if you ask ChatGPT how to build a bomb or rob a store, it will decline to give you that information.
Guardrails are imperfect, and that’s understood. Harmful information sometimes slips through, and unplanned responses can happen. Various “jailbreaking” techniques can work, but at the very least, effort is made to stop chatbots from instructing someone in how to build a bomb or rob a store. They’re also designed to stop chatbots from reinforcing self-harm or perpetuating stereotypes.
However, these are not the outputs that carry the highest risk of harm. A much more immediate and tangible risk arises when a chatbot gives advice on an urgent medical or safety issue. For example, a person might ask a chatbot about a prescription drug: “I think I might have taken my pills twice by mistake. Should I be worried?” or “I suddenly feel numb in one side of my face. Should I go to the hospital?”
There are a huge number of other kinds of questions that pertain directly to safety and immediate harm. “How do I turn off a circuit?” “Can I drive after three drinks if I wait an hour?” “My baby has a 103 fever. Should I go to the ER?” These kinds of questions are common and potentially urgent. They’re not immoral or harmful in and of themselves, provided the advice given is correct. And that is the problem: it’s not that the model gives information here, it’s that the cost of even a single wrong answer could be disastrous.
So why do models willingly answer questions where a wrong answer carries extreme risk, despite being unable to verify the validity of their answers? In part it’s because refusing would impair the usefulness of the model, and in part it’s due to the complicated nature of the problem.
How should models respond to “high risk” questions?
This might seem like a simple question, but it’s an extremely complicated one. For one thing, one first has to qualify what counts as a “high risk” question. It might seem obvious: things that could be fatal and immediate. But context matters a lot. Someone might ask for basic medical advice out of curiosity rather than with any intent to act. LLMs are frequently used to write fiction and to plan hypotheticals. There’s really a huge difference between “How do I work a circuit breaker?” and “I’m about to do electrical work. How can I be sure the power is off?”
There’s also the fact that an outright refusal might be worse than providing valid information. Although LLMs can hallucinate, when asked for well-grounded, established facts they do usually get them right. For those with limited resources, the chatbot may be all they have to work with. There is also the issue of emergencies. Chatbots have on several occasions helped people out of genuinely difficult situations where nobody else was around.
There’s also the issue of perception. People don’t like having their requests refused, and the more visible guardrails against harmful hallucinations are, the more they may further the perception that the product is not reliable or safe.
Therefore, a chatbot cannot be expected to be perfect at determining when advice carries a high risk of harm if incorrect. However, it is still possible to define narrow criteria where a chatbot would be heavily trained against giving advice. Prescription drug dosing, for example, is an obvious case of something that should really only come from a professional.
Options for how a chatbot answers high risk prompts

Let’s imagine an LLM chatbot is presented with the following question: “I missed my medication yesterday. Should I double up on it today?”
Option 1 (current) Just answer: The model answers the question as best it can given its training data, providing an answer like “You should not take additional medication, but be sure not to miss another dose.” This is fine if it’s correct, but disastrous if wrong.
Option 2 Refusal: There may be an argument that this is the simplest and perhaps the lowest-liability option. If the chatbot simply replies “I am not authorized to give medical advice,” then it can’t give bad advice. However, this is not helpful at all to the end user, and it creates problems for edge cases, emergencies, and legitimate non-critical information.
Option 3 Answered With Disclaimer: Models could also be trained to include certain disclaimers and language in their outputs. To some extent they already are, but not commonly for high-risk outputs. The response could be something like “It’s not normally advised to take additional medication, but you should not rely on a chatbot for critical medical information because it may not be correct.”
Option 4 Full Context of What to Do: In my opinion, this is the best option, but it also requires the most effort and fine-tuning to get right. The model could respond to such questions in a manner that is helpful and errs on the side of caution. In this circumstance, the model might respond with “It’s important to get prescriptions right, and everyone is different. You should call your doctor. If you need an immediate response, you might call or go to the pharmacy. If it’s after hours, you can look up a 24-hour pharmacy, but if you’re concerned this could be dangerous, you should go to the ER.”
That last response respects the fact that the person wants a real and useful answer without resorting to giving them direct advice. However, this too can be a grey area. There are urgent situations where some advice is better than none, and there’s a question of what constitutes safe advice. For example, “Don’t take additional medication without first checking” might be a universally safe statement.
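As a thought experiment, the options above could be wired together as a response policy. The Python sketch below is purely illustrative: the keyword lists, category names, and canned responses are all invented, and a real system would use a trained risk classifier rather than substring matching.

```python
# Hypothetical response policy for high-risk prompts. Keyword matching here
# stands in for a trained classifier; categories and replies are invented.
HIGH_RISK_KEYWORDS = {
    "dosing": ["double up", "missed my medication", "took my pills twice"],
    "emergency": ["numb in one side", "103 fever", "should i go to the er"],
}

def classify(prompt: str) -> str:
    """Return a risk category for the prompt, or 'general' if none match."""
    p = prompt.lower()
    for category, keywords in HIGH_RISK_KEYWORDS.items():
        if any(k in p for k in keywords):
            return category
    return "general"

def respond(prompt: str) -> str:
    category = classify(prompt)
    if category == "dosing":
        # Option 4 path: universally safe guidance plus concrete next steps.
        return ("Don't take additional medication without first checking. "
                "Call your doctor or pharmacist; if you think this could be "
                "dangerous, go to the ER.")
    if category == "emergency":
        return "This could be urgent. Please contact emergency services now."
    # Option 1 path: ordinary questions get an ordinary answer.
    return "answered normally"
```

Even this toy version surfaces the hard part discussed above: the classifier, not the canned responses, is where all the edge cases live, and a fiction-writing or curiosity prompt would trip the same keywords as a genuine emergency.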
Potential Unintended Consequences
As with any kind of risk management, it’s important to consider whether a mitigation may actually increase losses or make risks worse. That can absolutely happen here, and the problem is obvious: when asked for critical safety, medical or operational information, a chatbot will usually get it right, and that can be very helpful. The danger is not that a refusal will send people to their doctor instead. They might just not go to anyone and deal with it alone.
That’s a huge issue in the US, where healthcare access is poor and people are afraid of the bill that comes from going in. There are also people who use chatbots for psychological support, or for help with problems they are too embarrassed or uncomfortable to discuss with a person. It’s already known that people rely on chatbots for all kinds of emotional support, health advice and other critical needs. Taking that away won’t help things.
There is even the fact that chatbots have saved people’s lives by giving them the information they needed. They just can’t be trusted to always do it correctly.
It Won’t Ever Be Perfect, but It Should Still Be Considered
Like all guardrails, attempting to prevent the chatbot from ever giving immediate safety or medical advice will never be perfect. It requires judgment calls and there are plenty of edge cases. However, guardrails do not need to be perfect to justify their existence. If a harmful behavior is blocked most of the time, that’s still a win.
However, at present, the idea that chatbots should be restricted from giving information on topics that could be immediately high risk is rarely, if ever, discussed. Until robust output verification pipelines can be put in place, it really should be a topic of discussion.
Beyond this is the need for better output verification, and for better literacy and education about the importance of skepticism when evaluating LLM outputs.