On Reasoning, Explainability and LLMs

Duncan Anderson
11 min read · Feb 3, 2024

What follows is an essay on a topic that I’ve been thinking about for a while now. I hope you find my words thought-provoking and a counterbalance to some of the more hysterical AI commentary. Reasoning and explainability are topics full of nuance — I hope this comes across in this essay.

When GPT-4 was released last year I noticed one particular conversation develop: “explainable AI”. GPT-4 was the first AI model to show real advancement in the field of reasoning. To some of us that was exciting, but it also threatened some of those who’d been making a living from more traditional decision-making technologies. Explainability has been held up as a barrier to the adoption of models like GPT-4.

In some fields, e.g. healthcare or financial services, it can be especially important to explain why a particular decision has been taken. It therefore follows that we need to understand why an AI has taken a particular decision, hence explainable AI.

Before I respond to this challenge, it’s worth taking a moment to consider how an LLM works and how it comes to be able to make decisions.

An LLM works by predicting the most statistically probable next token in a sequence. Thus when I ask it “who is the president of the USA?”, the model does not perform some form of structured reasoning and database lookup to find Joe Biden’s name. Instead, it knows from its training data that Joe Biden is a statistically probable sequence of tokens that could be produced to complete the input “who is the president of the USA?”. The fact that an LLM has read a very, very large (hence the first L in LLM) amount of text means that it’s able to perform this trick for a very wide variety of inputs.
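To make the idea of “most statistically probable next token” concrete, here’s a minimal sketch. It assumes the small, open GPT-2 model from the Hugging Face transformers library as a stand-in, since GPT-4’s weights aren’t public; the principle of scoring every candidate next token and picking from the most probable is the same in spirit.

```python
# A minimal sketch of next-token prediction, using the small open GPT-2 model
# from the Hugging Face transformers library as a stand-in for a larger LLM.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The president of the USA is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits          # a score for every token in the vocabulary
probs = torch.softmax(logits[0, -1], dim=-1)  # probabilities for the next token only

# Show the five most statistically probable continuations of the prompt.
top = torch.topk(probs, k=5)
for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode([int(token_id)])!r}: {p.item():.3f}")
```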

Critics of LLMs might finish here and point out that such a model is not reasoning in any meaningful sense. They’d also say that it’s not “explainable”, because its answers come from a giant statistical machine. To explain why the model came up with Joe Biden would entail an understanding of the many billions of parameters in the model — something that’s clearly impractical and impossible for any human.

However, to finish the discussion at this point would be a mistake and a wilful disregard for what LLMs actually represent.

Let’s take a detour into the world of science…

In the world of science there are two broad approaches to examining and understanding the properties of a system. The first, reductionism, interprets a complex system by examining its constituent parts. Thus, a reductionist sees the world as simply an extension of the behaviour of building blocks like atoms, molecules, chemical reactions and physical interactions. If you understand the basics, everything else is just a bigger version of that.

Reductionism is how most of us tend to think about most things — it’s a logical way to approach complex systems and is largely the default mode of human thinking.

A reductionist’s analysis of an LLM sees it simply in terms of its ability to predict the most statistically probable next token. On that view, LLMs by definition cannot reason, and any evidence that they can is just an illusion brought on by the large training set. It’s a fancy party trick.

However, I’m not sure I buy the reductionist angle when looking at LLMs. For me, it doesn’t fully explain some of what we see happening.

Reductionism is not, however, the only way to analyse complex systems. In fact, reductionism alone cannot explain much of science, nor how we actually experience the world.

Take salt, which we all know to be a compound of sodium and chlorine atoms (NaCl). Sodium is a metal that reacts explosively with water, whilst chlorine is a poisonous gas. And yet, when we combine them, we get an edible crystalline structure with a distinctive taste. To my knowledge, salt is not known for its explosive properties in the presence of water, nor is it especially poisonous. Reductionism cannot explain why salt is so dramatically different from its constituent parts. Nothing about studying the properties of sodium or chlorine tells us anything about salt.

To understand salt we need a different way of thinking. That different way is known as emergence.

Emergence predicts that as systems become more complex, they frequently take on properties and behaviours that we cannot predict by looking at their constituent parts — as nicely explained by the Wikipedia article on the topic.

“In philosophy, systems theory, science, and art, emergence occurs when a complex entity has properties or behaviors that its parts do not have on their own, and emerge only when they interact in a wider whole. Emergence plays a central role in theories of integrative levels and of complex systems. For instance, the phenomenon of life as studied in biology is an emergent property of chemistry and quantum physics.”

That last sentence appears to me to be significant. “the phenomenon of life… is an emergent property of chemistry and quantum physics.” Indeed. If we only study the chemistry of the human body, it’s unlikely that we would predict intelligent life.

The concept of “emergence” was brought to prominence by a paper published back in 1972 titled “More Is Different”, by the Nobel prize-winning physicist Philip Anderson (no relation).

“The behaviour of large and complex aggregates of elementary particles, it turns out, is not to be understood in terms of a simple extrapolation of the properties of a few particles. Instead, at each level of complexity entirely new properties appear…”

A good example of this is included in a recent New Scientist article, Emergence: The mysterious concept that holds the key to consciousness.

“The next time you get caught in a downpour, don’t think about how wet you are getting — but how you are getting wet. Rain is, after all, just molecules composed of hydrogen and oxygen atoms, and there is nothing wet about hydrogen or oxygen on their own. There isn’t even anything wet about a single water molecule. Put lots of them together in the right conditions, however, and you will get wet. The wetness of water is an example of an “emergent” property: a phenomenon that can’t be explained by the fundamental properties of something’s constituent parts, but rather manifests only when those parts are extremely numerous.”

If we apply emergence to LLMs, the ability of a model to start to reason becomes less surprising. It’s simply an emergent ability that we cannot predict by thinking only about next token prediction.

The concept of emergence in LLMs first came to my attention when I read the research paper “Emergent Abilities of Large Language Models”.

“This paper… discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.”

The paper highlighted four emergent abilities that the authors had detected: multi-step reasoning, instruction following, program execution and model calibration.

Let’s take one of those emergent abilities: multi-step reasoning.

One GPT-4 user entered their dog’s symptoms and blood test results into the model, which then correctly identified an ailment that the vet had been unable to diagnose.

Part of that prompt consisted of the dog’s raw blood test results.

That GPT-4 was able to understand those results, compare them with a subsequent test and identify the possible issue does not appear to be explained by “it’s just outputting the most statistically probable sequence of tokens”. It feels like there’s more going on — an emergent ability.

The sick dog is far from an isolated example. I have multiple real client situations where I’ve used LLMs as reasoners. It feels weird, but I’m comforted that science gives us a reference point that helps to explain it. Perhaps it shouldn’t surprise us as much as it does.

Let’s consider and contrast how an LLM performs reasoning and how other technologies do it.

If I wanted to get a rules-based non-AI system to perform some form of reasoning, I’d have to define a set of hard-coded rules to embody how the system should behave. If the situation is complex, those rule definitions can quickly become labyrinthine and so subject to human error. In other words, the resultant system might sometimes produce the wrong answer not because it itself is fallible, but because its human programmers struggled to encode the rules to be followed in the “language” of the system.
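To illustrate, here’s a deliberately simplified, entirely hypothetical sketch of that kind of hard-coded rule set. Even at this toy scale, every exception breeds a sub-rule, and each sub-rule is another opportunity for human error.

```python
# A hypothetical, deliberately simplified rules-based lending check.
# Every situation has to be anticipated and hand-coded; real rule sets
# grow far larger and far harder to keep correct.
def loan_decision(age: int, income: float, credit_score: int, existing_debt: float) -> str:
    if age < 18:
        return "decline: applicant under 18"
    if credit_score < 550:
        return "decline: credit score too low"
    if income <= 0:
        return "decline: no verifiable income"
    debt_ratio = existing_debt / income
    if debt_ratio > 0.4:
        # Exceptions breed sub-rules, which breed further exceptions...
        if credit_score > 750 and income > 80_000:
            return "refer: high debt ratio but otherwise strong profile"
        return "decline: debt ratio too high"
    if age > 70 and debt_ratio > 0.2:
        return "refer: manual review required"
    return "approve"


print(loan_decision(age=34, income=52_000, credit_score=690, existing_debt=9_000))
```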

In contrast, an LLM does not require any such formal rules and is instead given direction through a combination of natural language training material and prompting. The definitions for the reasoning are often less precise because they are expressed in natural human language, which frequently embodies ambiguities. However, it’s quite possible to prompt an LLM in a more formal way — they are remarkably adept at understanding computer code, for example.
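By way of contrast, here’s a minimal sketch of the same decision expressed as natural-language direction rather than code. It assumes the OpenAI Python client; the policy wording and model name are illustrative, not a real lending policy.

```python
# The same decision expressed as natural-language direction to an LLM rather
# than hand-coded rules. Assumes the OpenAI Python client; the policy wording
# and model name are illustrative, not a real lending policy.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

policy = (
    "You assess loan applications. Decline applicants under 18 or with very "
    "poor credit. Be cautious where existing debt is high relative to income, "
    "but use judgement for otherwise strong applicants. Answer 'approve', "
    "'decline' or 'refer', followed by a one-sentence justification."
)

application = "Age 34, income £52,000, credit score 690, existing debt £9,000."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": policy},
        {"role": "user", "content": application},
    ],
)
print(response.choices[0].message.content)
```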

LLMs reason about things in ways that rules-based systems cannot — they are able to understand and take account of the randomness of real life. Version one of the chatbot revolution was roundly criticised precisely because its rules-based definitions were incapable of bending to the unique real-life situations it was inevitably presented with. “Computer says no” wasn’t what people expected of AI.

Today, LLMs are often more fallible than we’d like, but they are also better able to bend to real life. That’s not a set of attributes you want when running a nuclear power station, but there are plenty of other situations where it’s acceptable.

Both AI and rules-based systems are subject to fallibility for different reasons. One is inherently fallible, the other frequently embodies enough complexity that humans make mistakes when building it. To give some context, my friend Chris Williams at Databricks recently wrote:

“we already have unpredictable actors in our business processes that we have to manage. They’re called people.”

An LLM’s ability to reason feels weird, but perhaps it’s not so distant from our own — both can only really be explained through emergence.

Today, LLMs reason about relatively simple things, but multi-step reasoning is already here — I built a broadband troubleshooter app that followed a process defined in natural language across multiple steps. If the user deviated, it was able to maintain the conversation but gently usher them back to the intended process.
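That app isn’t something I can reproduce here, but the general pattern is simple enough to sketch hypothetically: the multi-step process lives in a plain-English system prompt, and the model is asked to steer wandering users back to it.

```python
# Hypothetical sketch of a multi-step troubleshooter: the process is defined
# in plain English in the system prompt, and the model is asked to steer
# wandering users back to the current step. Model name is illustrative.
from openai import OpenAI

client = OpenAI()

process = (
    "You are a broadband troubleshooting assistant. Follow these steps in "
    "order: 1) ask whether the router's power light is on; 2) ask the customer "
    "to restart the router and wait two minutes; 3) ask whether any status "
    "lights are red; 4) if the fault persists, offer to book an engineer. "
    "Answer unrelated questions politely, then gently bring the customer back "
    "to the current step."
)

history = [{"role": "system", "content": process}]

while True:
    user_msg = input("Customer: ")
    if user_msg.lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user_msg})
    reply = client.chat.completions.create(model="gpt-4", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Assistant:", answer)
```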

LLM technology is improving at a breakneck pace and one day soon we’ll see LLMs being used in more complex scenarios — executing full business processes and advising on complex topics like tax calculations. When trained on niche topics like medicine, LLMs are already starting to challenge human levels of accuracy. A recent paper revealed that an LLM, when used alone, exceeded the performance of unassisted clinicians.

Another paper comes to a startling conclusion about the role of today’s LLMs in clinical decision making.

“No real benchmarks exist, but we estimate this performance to be at the level of someone who has just graduated from medical school, such as an intern or resident. This tells us that LLMs in general have the potential to be an augmenting tool for the practice of medicine and support clinical decision making with impressive accuracy.”

Of course medicine is an especially sensitive topic and there are many ethical dilemmas, together with a spread of opinions. Nevertheless, that a possible clinical role for LLMs is being seriously considered so early in their evolution is remarkable.

So, if reasoning in LLMs is a thing, how do we address explainability, or the lack of it?

Explainability in a traditional system occurs because we can go back to the encoded rules, follow a path through, and determine why a particular answer resulted. It might not be much fun, but to someone with an eye for detail it’s eminently possible. This ability provides a level of confidence that “the system is working as designed”. Of course the design may be wrong, or the team may not even have correctly understood the requirements of the design, but “working as designed” is something that’s familiar in the business world and lends itself to contracts and legal liability. It’s how the world of IT has worked since its inception.
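For the sake of illustration, here’s a minimal and entirely hypothetical sketch of that kind of traceability: the rules function records every rule it applies, so a reviewer can replay exactly how a given answer was reached.

```python
# A hypothetical sketch of rules-based explainability: the function records
# every rule it applies, so a reviewer can replay exactly why an answer
# was produced.
def discount_decision(order_total: float, is_member: bool) -> tuple[str, list[str]]:
    trail: list[str] = []
    if is_member:
        trail.append("rule 1: customer is a member, so discounts apply")
        if order_total >= 100:
            trail.append("rule 2: order total >= 100, so 10% discount")
            return "10% discount", trail
        trail.append("rule 3: order total < 100, so 5% member discount")
        return "5% discount", trail
    trail.append("rule 4: non-member, so no discount")
    return "no discount", trail


decision, audit_trail = discount_decision(order_total=120.0, is_member=True)
print(decision)                 # "10% discount"
print("\n".join(audit_trail))   # the explanation: the exact path the rules took
```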

In contrast, explainability in an LLM takes a very different form. There are no neat rules to trace through, and the giant statistical model that makes up an LLM isn’t something a human can open up to see how a particular answer was arrived at. However, that doesn’t mean explainability doesn’t exist. It very much does, just not in the form we’re used to.

This is how explainability in LLMs works: I simply ask the machine to make a decision and explain its reasoning. That’s exactly what I would do if I asked a human. Humans are considered explainable, so I contend that LLMs are as well.
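In code, that pattern is as simple as it sounds. Here’s a minimal sketch, again assuming the OpenAI Python client and the same hypothetical loan application used earlier: the prompt requests the decision and the reasoning in the same breath.

```python
# Asking for the decision and the reasoning in the same request. Assumes the
# OpenAI Python client; the loan application is the same hypothetical one
# used earlier, and the model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Loan application: age 34, income £52,000, credit score 690, existing "
    "debt £9,000. Decide whether to approve, decline or refer it, then "
    "explain your reasoning step by step."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # the decision, plus its explanation
```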

As an example, consider the following interaction with GPT-4:

Q: What weighs more — 1 kg of feathers, or 0.5 kg of steel? Please explain your reasoning and how this relates to the nature of the two materials.

A: 1 kg of feathers weighs more than 0.5 kg of steel.

The reason is straightforward: the weight of an object is a measure of the gravitational force acting on it, and this force is directly proportional to the object’s mass.

Since 1 kilogram is twice as much mass as 0.5 kilograms, the 1 kg of feathers will weigh more than the 0.5 kg of steel, regardless of the material’s nature.

However, the difference in weight between these two materials also highlights an interesting aspect of their nature: density. Density is the mass of a substance per unit volume.

Steel is much denser than feathers, meaning that a given volume of steel will weigh much more than the same volume of feathers. Therefore, while 1 kg of feathers will weigh more due to its greater mass, it will occupy a much larger volume than 0.5 kg of steel. This difference in density is why materials like steel are used for construction and machinery for their strength and compactness, while feathers find uses in applications requiring lightness and insulation.

GPT-4’s explanation of its reasoning is detailed and methodical. In fact, it’s more detailed and methodical than most humans would likely provide. It’s also more understandable than tracing through a complex set of rules that attempted to encode the laws of physics.

LLMs might well be “black boxy” because we struggle to understand how they work when we’re in reductionist thinking mode. However, when we accept emergence, they are paradoxically the most explainable IT systems we’ve ever had.

OpenAI are rumoured to have a project named Q* that has supposedly made a breakthrough in teaching language models how to do maths. We are talking rumour and conjecture here, so bear with me… but I think this is an important avenue, whether or not OpenAI have achieved the alleged breakthrough.

We tend to think of maths as the archetype of a subject that’s based entirely on theory. Few would suggest that mathematicians gain their skill in any way other than through a solid grounding in the fundamentals, on which everything else builds. It’s very reductionist.

The logic follows that teaching a machine to do maths requires it to understand mathematical principles. And yet most of us learn our times tables by repetition. We learn that 7 x 6 = 42 by repeating it until we know it off by heart. When, as schoolchildren, we put our hands up in class and answered 42, we were not performing a calculation in our heads but recalling our training data.

Doesn’t that sound a little familiar?
