Should Chatbots Refuse to Give High Risk Advice?

Chatbots are becoming increasingly popular. ChatGPT, for example, has nearly a billion weekly users. These LLM based services are used for all kinds of things, including many things their initial developers never dreamed of: planning, brainstorming, writing, translation, companionship, functional play, humor, studying, reformatting text, creating code. People also ask chatbots for all kinds of advice and facts. Chatbots have become the goto answer engines for questions ranging from “What is the capital of Chad?” to “How long should I boil a lobster for?”

However, there is a problem with LLMs and that is that they “hallucinate.” The term hallucination is a bit of a misnomer, because what is actually happening has less to do with figments of the imagination and more to do with patterns and probabilities. What actually happens is that the LLM confabulates a response that fits the patterns of a valid response but is not factually accurate. This often happens due to the model lacking information on a topic, but it can happen even when the model does have the knowledge in its training data.

No, it is not this kind of hallucination…

Hallucinations are impossible to completely eliminate from large language models. They are as much a feature as a bug, because the ability to create false information is inseparable from the model’s ability to generate fiction and hypotheticals or engage in role playing. It’s the nature of LLMs as stochastic probability engines. The only real way to eliminate hallucinations is to have some sort of output pipeline that involves checking and verification of outputs. That’s not something that chatbots currently do.

This is well understood and documented, but that does not change the fact that hallucinations continue to slip past people and be believed. A number of high profile events have included false citations in scientific journals, fake caselaw presented in court and medical advice for diseases that don’t even exist. One of the problems here is that people tend to believe the results a computer gives them, because in the past computers have been reliable and deterministic.

Continue reading

Where We Really Stand In AI Capabilities

The recent talk of AGI, as if it is some kind of impending certainty, and now talk about “Superintelligence” is really causing a great deal of confusion. The reality is that we are nowhere near the point of human level intelligence in all domains, the idea of artificial super intelligence, is entirely speculative and nowhere near foreseeable capabilities, and you can’t scale past the limits of current AI systems. The truth has been lost in a sea of sensational rhetoric.

The modern public discourse around artificial intelligence began with a fundamental shift in frame of reference. For decades, AI systems were narrow, technical, and largely invisible to the general public. Then, quite suddenly, natural language processing systems emerged with startling fluency. For the first time, people could interact with a machine through conversational language that resembled human dialogue.

This single development reset public intuition overnight.

Instead of being understood as statistical systems operating within defined computational constraints, large language models were immediately interpreted through the lens of science fiction archetypes: conversational minds, digital assistants, synthetic intellects. The resemblance in surface behavior was compelling enough to override the underlying reality of how these systems actually function.

But fluency is not cognition. Simulation of reasoning is not reasoning itself.

Continue reading

A Risk-Oriented Hierarchy of Intervention in the Deployment and Customization of Large Language Models

A practical and pragmatic discussion of the levels of risk and complexity in the customization of large language models. Many organizations are using LLM technology to build customized chatbots, RAG tools and content generators. However, many organizations do not have a full understanding of the options and levels of risk and development complexity that come from LLM customization and deployment.

In the contemporary landscape of artificial intelligence deployment, a structural shift is occurring: base models are becoming increasingly capable out of the box. Instruction-following performance, contextual reasoning, retrieval integration, and domain adaptability have improved to such a degree that many historical justifications for invasive model modification are steadily eroding. This evolution necessitates a corresponding philosophical and governance framework—one grounded in the principle that greater customization introduces greater uncertainty, greater liability, and a proportionally greater need for validation and risk controls.

At its core, the responsible deployment of large language models should be guided by a hierarchy of invasiveness. Each successive layer of intervention introduces deeper system coupling, increased behavioral unpredictability, and escalating regulatory, operational, and reputational risk. Accordingly, risk management should not begin at the level of model alteration, but rather at the least invasive layers of interaction and configuration.

Continue reading