Should Chatbots Refuse to Give High-Risk Advice?

Chatbots are becoming increasingly popular. ChatGPT, for example, has nearly a billion weekly users. These LLM-based services are used for all kinds of things, including many their initial developers never dreamed of: planning, brainstorming, writing, translation, companionship, functional play, humor, studying, reformatting text, and creating code. People also ask chatbots for all kinds of advice and facts. Chatbots have become the go-to answer engines for questions ranging from “What is the capital of Chad?” to “How long should I boil a lobster?”

However, LLMs have a problem: they “hallucinate.” The term is a bit of a misnomer, because what is happening has less to do with figments of the imagination and more to do with patterns and probabilities. The LLM confabulates a response that fits the pattern of a valid answer but is not factually accurate. This often happens because the model lacks information on a topic, but it can happen even when the relevant knowledge is in its training data.

Hallucinations are impossible to completely eliminate from large language models. They are as much a feature as a bug, because the ability to produce false information is inseparable from the model’s ability to generate fiction and hypotheticals or to engage in role-playing. It’s the nature of LLMs as probabilistic engines. The only real way to keep hallucinations from reaching users is an output pipeline that checks and verifies responses before they are shown. That’s not something that chatbots currently do.
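
To make the idea of an output pipeline concrete, here is a minimal sketch, assuming a small table of trusted facts to check against. Everything in it is hypothetical and for illustration only: `TRUSTED_FACTS`, `fake_model`, `lookup_reference`, and `answer_with_verification` are made-up names, and the "model" is a stub that deliberately returns a wrong answer. A real system would need a far richer reference source and a more careful comparison step.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical trusted reference data; a real system might query a curated database.
TRUSTED_FACTS = {
    "capital of chad": "N'Djamena",
}


@dataclass
class CheckedAnswer:
    text: str
    verified: bool


def fake_model(question: str) -> str:
    """Stand-in for an LLM call; deliberately returns a plausible but wrong answer."""
    return "Abuja"


def lookup_reference(question: str) -> Optional[str]:
    """Return the trusted answer if the question mentions a known topic, else None."""
    q = question.lower()
    for topic, answer in TRUSTED_FACTS.items():
        if topic in q:
            return answer
    return None


def answer_with_verification(
    question: str, model: Callable[[str], str]
) -> CheckedAnswer:
    """Generate a draft, then check it against the reference before returning it."""
    draft = model(question)
    reference = lookup_reference(question)
    if reference is None:
        # Nothing to check against: pass the draft through, flagged as unverified.
        return CheckedAnswer(text=draft, verified=False)
    if reference.lower() in draft.lower():
        return CheckedAnswer(text=draft, verified=True)
    # The draft contradicts the reference, so return the reference instead.
    return CheckedAnswer(text=reference, verified=True)
```

Calling `answer_with_verification("What is the capital of Chad?", fake_model)` replaces the stub's wrong draft with the reference answer, while a question with no matching reference comes back flagged as unverified so the interface can warn the user.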

This is well understood and documented, but hallucinations continue to slip past people and be believed. High-profile incidents have included false citations in scientific journals, fake case law presented in court, and medical advice for diseases that don’t even exist. One of the problems is that people tend to believe the results a computer gives them, because in the past computers have been reliable and deterministic.

A Risk-Oriented Hierarchy of Intervention in the Deployment and Customization of Large Language Models

A pragmatic discussion of the levels of risk and complexity in customizing large language models. Many organizations are using LLM technology to build customized chatbots, RAG tools, and content generators. However, many do not fully understand the options, the levels of risk, and the development complexity that come with LLM customization and deployment.

In the contemporary landscape of artificial intelligence deployment, a structural shift is occurring: base models are becoming increasingly capable out of the box. Instruction-following performance, contextual reasoning, retrieval integration, and domain adaptability have improved to such a degree that many historical justifications for invasive model modification are steadily eroding. This evolution calls for a corresponding philosophical and governance framework, one grounded in the principle that greater customization introduces greater uncertainty, greater liability, and a proportionally greater need for validation and risk controls.

At its core, the responsible deployment of large language models should be guided by a hierarchy of invasiveness. Each successive layer of intervention introduces deeper system coupling, increased behavioral unpredictability, and escalating regulatory, operational, and reputational risk. Accordingly, risk management should not begin at the level of model alteration, but rather at the least invasive layers of interaction and configuration.
