What is an LLM — and why does it sometimes hallucinate?
How an AI language model is built, why it sometimes makes things up – and what that means for your day-to-day work.

Claude, ChatGPT, Gemini: They all work on the same principle. If you understand how a Large Language Model (LLM) is built, you can use it more deliberately – and know when you should check the output.
How an LLM "learns"
An LLM is trained on billions of texts: websites, academic papers, books, code, YouTube transcripts. Google had a historic advantage here through Google Books – 22 million digitised books as a training base. Academic papers fed in as well, and a particularly large amount of code.
This training is not memorisation. The model learns patterns – statistical relationships between words, concepts and structures. This makes it strong in areas that are well documented: language, contract logic, structured texts, code.
After training comes a safety step: testing, setting guardrails, restricting behaviour. Only then is a new version made public.
Important: Every model has a cut-off date. It only knows events after that if it can look them up live via tools (e.g. web search).
Predicting words, not thinking
LLMs generate answers token by token. A token corresponds to roughly 0.75 words or 4 characters. At each step, the model asks: which word is most likely to come next, given this context?
This is not intelligence in the human sense, but pattern recognition at a very high level. Language is handled well, and so are calculations. Much of what the human brain does, however, remains unexplained. AGI – artificial general intelligence – is unrealistic in the near future and is used by large providers above all as a marketing term to convince investors.
Practical consequence: if you want a text to be exactly 30 characters long, that works poorly. The model counts tokens internally, not characters. Better: "Write about 50 words" – then you end up somewhere between 45 and 55.
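To make the mechanics concrete, here is a purely illustrative Python sketch. The probability table is made up and stands in for a real model (which weighs up roughly 100,000 possible tokens at every step), and the characters-divided-by-four line is only the rough rule of thumb mentioned above, not an exact count.

```python
# Illustrative only: a hand-made probability table standing in for a real model.
next_token_probs = {
    ("The", "contract"): {"is": 0.55, "was": 0.25, "ends": 0.10},
    ("contract", "is"):  {"valid": 0.40, "signed": 0.30, "void": 0.10},
}

def pick_next(prev_two):
    """Greedy decoding: always take the single most likely continuation."""
    candidates = next_token_probs.get(prev_two, {})
    return max(candidates, key=candidates.get) if candidates else None

text = ["The", "contract"]
for _ in range(2):
    nxt = pick_next(tuple(text[-2:]))
    if nxt is None:
        break
    text.append(nxt)

print(" ".join(text))  # -> "The contract is valid"

# Rough token estimate via the ~4-characters-per-token rule of thumb:
prompt = "Summarise the attached minutes in about 50 words."
print(round(len(prompt) / 4))  # an estimate, not an exact count
```

The point of the toy loop is only this: at no stage does the model plan a whole answer or count characters; it repeatedly picks a likely next piece of text.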
When LLMs invent things
Hallucinations are the best-known problem. The model invents facts and states them confidently – as if they were true.
Why does this happen? LLMs are trained to provide an answer. If the knowledge is missing, the model fills the gaps. It does not recognise that it "does not know" – it simply generates the next most likely token.
The most common causes:
- Prompt formulated too vaguely
- Too little context provided
- Outdated training knowledge – the model does not know events after the cut-off date
What helps:
- A more precise request instead of a broad question
- Provide context: a quotation, minutes, planning documents as PDF
- State in the instructions: "If unsure, ask explicitly. Do not invent numbers" (a sketch of such an instruction follows below)
- For current topics, force the LLM to use web search – or use the Deep Research function for more complex research
- Always check important results
Hallucinations are not a sign of a bad model – they are a hint that the prompt or context was too thin.
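As a sketch of what such a standing instruction can look like in practice, here is a minimal example using the OpenAI Python SDK. The model name, the quotation placeholder and the wording are assumptions for illustration; the same idea works in any provider's custom-instructions field or in Claude's project instructions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "If you are unsure, ask an explicit follow-up question. "
                "Do not invent numbers, names or sources."
            ),
        },
        {
            "role": "user",
            # Context first, then a precise request instead of a broad question
            "content": (
                "Here is the quotation:\n<paste text>\n\n"
                "Check only the payment terms against our standard of 30 days net."
            ),
        },
    ],
)

print(response.choices[0].message.content)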
The context window
The context window is the chatbot's short-term memory. Everything visible in the current conversation belongs to it: your questions, the answers, uploaded documents, project instructions.
The size has grown massively over the years:
- 2022 (first ChatGPT version): approx. 4,096 tokens
- 2026 (Claude or Gemini): up to 1,000,000 tokens – that is about 5 novels
What that means:
- Images and screenshots use many tokens – use them sparingly
- Very long chats: the LLM eventually forgets the beginning. Repeat important points if needed.
- New chat = empty memory (except project context and instructions that are automatically reloaded)
And an often underestimated point: More context is not always better. Irrelevant PDFs, old chats, unrelated information confuse the model. Only upload what is relevant for this specific task.
Just as important is the format of the context files. A 1 MB PDF contains images, formatting and metadata – that eats tokens and can slow the model down. The same content as clean Markdown often needs only 3–5 KB. So it is worth converting PDFs once into Markdown and then uploading the Markdown for each chat request instead of the original PDF.
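A minimal sketch of this one-off conversion, assuming the pymupdf4llm package is installed; the filenames are examples, and other tools (pandoc, markitdown and similar) do the same job.

```python
# pip install pymupdf4llm   (assumption: this package is available in your environment)
import pymupdf4llm

# Convert the PDF once and keep the Markdown for all future chat requests.
md_text = pymupdf4llm.to_markdown("project_minutes.pdf")  # example filename

with open("project_minutes.md", "w", encoding="utf-8") as f:
    f.write(md_text)

# Rough comparison of what each version costs in upload size:
pdf_size_kb = 1024  # the original 1 MB PDF
md_size_kb = len(md_text.encode("utf-8")) / 1024
print(f"Markdown version: ~{md_size_kb:.0f} KB instead of {pdf_size_kb} KB")
```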
What you should never enter
Even with a Pro licence (no training on your chats), the rule applies: certain data do not belong in a cloud AI.
- NDA content
- Sensitive personal data
- Bank details and credit card information
- Login credentials and API keys
- Secret contracts, or patents before filing
For such data there are two alternatives: Mistral as a European provider (servers in France, closer to the GDPR) – or a local open-source LLM such as Llama or Gemma via Ollama / LM Studio. Full data control and no cloud upload, but weaker performance.
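As a sketch of the local route: Ollama exposes a small HTTP API on localhost, so the sensitive text never leaves your machine. The model name and prompt are examples, and the snippet assumes Ollama is running ("ollama serve") and the model has already been pulled.

```python
import requests

# Assumes Ollama is running locally and "llama3" has been pulled ("ollama pull llama3").
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # example: any locally pulled model
        "prompt": "Summarise this NDA clause in two sentences:\n<paste clause>",
        "stream": False,    # return one complete answer instead of a token stream
    },
    timeout=120,
)

print(resp.json()["response"])
```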
How an LLM request is built
When you ask a question, the model sees more than just your question. In the background, a large prompt is assembled:
| Component | What it is |
|---|---|
| User prompt | Your actual question |
| System prompt | The provider's specifications (guardrails, behaviour, tool use, output style) |
| Instructions | Your personal standing instructions |
| Context | Uploaded documents, project files, chat history |
| General knowledge | What the model knew up to the training cut-off |
| Tools | Web search, calculator, connectors, image generation |
This structure explains why two people using the same model can get very different results. Instructions and context make the difference – and those are exactly the parts you can influence.
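To see how these components end up in a single request, here is a schematic Python sketch. The field values mirror the table above and are invented placeholders; the wiring to an actual provider API is deliberately left out.

```python
# Schematic only: how the pieces from the table are stacked into one request.
system_prompt = "Provider guardrails, behaviour, tool use, output style ..."
instructions  = "My standing instructions: answer in British English, be concise."
context       = "Minutes, 12 March: milestone shifted to October; budget unchanged."  # uploaded file
user_prompt   = "Which deadlines from the minutes affect the October milestone?"

messages = [
    {"role": "system", "content": system_prompt + "\n\n" + instructions},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{user_prompt}"},
]

# General knowledge lives inside the model's weights; tools (web search, calculator,
# connectors) are attached by the provider on top of this message list.
for m in messages:
    print(m["role"], "->", m["content"][:60], "...")
```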
What comes next
Now you know what an LLM is and where its limits are. In the next article, we will look at which model is the better choice for which task: Claude, ChatGPT, Gemini or Mistral – and when a local model makes sense.