How AI models work

The few ideas that explain everything else: what a language model does, the context window, compaction, why a model can't do everything, and how Multilo automatically picks another model when yours can't handle a request.

Behind every agent and every chat reply is a language model: a system that reads text and produces text. You don’t need to know how one is built to use Multilo well, but a few ideas explain why results vary, why context matters so much, and why Multilo sometimes picks a different model than the one you selected.

What a model does

A model takes everything it has been shown (your message, your document, your sources, the conversation so far) and predicts the text that should come next, one small piece (a token) at a time. That single mechanism produces explanations, edits to your draft, a drafted section, or a request to use a tool. Three consequences are worth keeping in mind:

Results aren’t identical every time. The same prompt can produce slightly different wording on two runs. That’s expected, not a bug: re-running a step is a legitimate way to get a better result.
Quality follows context. A model can only reason about what it has been shown. A specific request that points at the right section and the right sources almost always beats a vague one.
Models have a knowledge cutoff. A model is trained on data up to a point in time and doesn’t know what happened after it. For anything current, or anything specific to your work, Multilo grounds the model in your library and files rather than trusting the model’s memory.
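For the curious, here is a toy illustration of why wording varies between runs. It's a minimal sketch, not how Multilo or any particular model is built: the candidate words and their scores are invented, and `temperature` is a common sampling control for language models in general, not a documented Multilo setting. The model scores each possible next word, the scores become probabilities, and one word is drawn at random in proportion to them, so the same prompt can continue differently.

```python
import math
import random

# Invented scores a model might give to candidate next words after a
# prompt such as "The results were". Real models score every token in a
# large vocabulary; these four words and numbers are made up.
CANDIDATE_SCORES = {"significant": 2.0, "mixed": 1.5, "clear": 1.2, "surprising": 0.8}


def next_word_probabilities(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert raw scores into probabilities with a temperature-scaled softmax."""
    scaled = {word: s / temperature for word, s in scores.items()}
    top = max(scaled.values())  # subtract the max so exp() can't overflow
    weights = {word: math.exp(v - top) for word, v in scaled.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_next_word(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one word at random, weighted by its probability."""
    probs = next_word_probabilities(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run of this script can differ
    for run in range(1, 4):
        print(f"run {run}:", sample_next_word(CANDIDATE_SCORES, temperature=1.0, rng=rng))
```

Lower temperatures push almost all of the probability onto the top-scoring word, which makes output more repeatable; higher ones spread it across the alternatives.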

Models don’t act — the agent does

A model never edits your file or searches your project by itself. It can only produce text. When it needs to act, it writes a request — “read this section,” “search the library,” “apply this edit” — and Multilo’s agent loop carries it out with a real tool, shows you the result, and feeds it back to the model for the next step. This is why agent actions are reviewable and why an agent can be stopped at any point: the model proposes, the agent executes, and you stay in control.
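The propose-then-execute loop can be sketched in a few lines. Everything here is hypothetical: `fake_model` stands in for a real model and simply follows a script, `search_library` is an invented tool, and the step limit and stop check are illustrative rather than Multilo's actual agent code. Notice that the model function only returns data describing what it wants; only `run_agent` ever calls a tool.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolRequest:
    """A model asking for an action; it is only a description, not the action."""
    tool: str
    args: dict


@dataclass
class FinalAnswer:
    text: str


def fake_model(history: list[str]) -> Union[ToolRequest, FinalAnswer]:
    """Hypothetical stand-in for a model: asks for one search, then answers."""
    if not any(line.startswith("tool result: ") for line in history):
        return ToolRequest(tool="search_library", args={"query": "sample size"})
    return FinalAnswer(text="Based on the search: " + history[-1].removeprefix("tool result: "))


# Hypothetical tools. Only the agent loop below calls these; the model never does.
TOOLS: dict[str, Callable[[dict], str]] = {
    "search_library": lambda args: f"2 passages mention '{args['query']}'",
}


def run_agent(
    user_message: str,
    model: Callable[[list[str]], Union[ToolRequest, FinalAnswer]] = fake_model,
    stop_requested: Callable[[], bool] = lambda: False,
    max_steps: int = 5,
) -> str:
    history = [f"user: {user_message}"]
    for _ in range(max_steps):
        if stop_requested():  # you can halt the run between any two steps
            return "Stopped by user."
        reply = model(history)
        if isinstance(reply, FinalAnswer):
            return reply.text
        # The model proposed an action: execute it with a real tool, record it
        # (a UI could show this for review), and feed the result back.
        result = TOOLS[reply.tool](reply.args)
        history.append(f"tool request: {reply.tool} {reply.args}")
        history.append(f"tool result: {result}")
    return "Stopped: step limit reached."


if __name__ == "__main__":
    print(run_agent("How large was the sample?"))
```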

The context window

The context window is the total amount of text a model can read in a single request — everything outside it is invisible to the model. In Multilo, the window is assembled for you from:

the system instructions that define how the agent behaves;
your voice and integrity rules and any per-project instructions in your .multilo/ folder;
your current message and the conversation so far;
the active document and whatever you have selected;
files and sources you @-mention or attach;
the passages an agent retrieves from your library and the outputs of any tools it runs.

Different models have different window sizes. A model with a long context can hold a whole long document and a large library at once; a model with a smaller window can’t. The model picker flags this (see below).
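To make the window concrete, here is a rough sketch of assembling the pieces listed above into one request and checking it against a window size. It assumes a word count approximates tokens (real models use their own tokenizers, so actual counts differ), and the example pieces and window sizes are made up.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def assemble_context(parts: list[tuple[str, str]]) -> str:
    """Join labelled pieces, in order, into the text of a single request."""
    return "\n\n".join(f"[{label}]\n{text}" for label, text in parts)


def fits(context: str, window_tokens: int) -> bool:
    """Anything beyond the window would be invisible to the model."""
    return approx_tokens(context) <= window_tokens


# Illustrative pieces, in the same order as the list above.
PARTS = [
    ("system instructions", "Act as a careful academic writing agent."),
    ("project rules", "Use British spelling. Never invent citations."),
    ("conversation", "user: Tighten the methods section."),
    ("active document", "word " * 3000),  # a long draft, roughly 3,000 words
    ("mentioned files", "notes.md: sampling plan and pilot results"),
    ("retrieved passages", "Passage 1 on sample size. Passage 2 on attrition."),
]

if __name__ == "__main__":
    context = assemble_context(PARTS)
    for window in (2_000, 8_000):  # hypothetical window sizes, in tokens
        print(f"{window:>5}-token window:", "fits" if fits(context, window) else "too large")
```

The same request overflows the smaller hypothetical window and fits the larger one, which is the practical difference between a standard and a long-context model.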

Point at the right thing

Because the model only sees what’s in the window, naming the exact section or source you mean gives noticeably better results than a broad ask. Adding relevant context helps; padding the window with everything does not.

Compaction & the token budget

A long agent run can generate a lot of text — searches, file reads, tool outputs — and that text competes for room in the window. Multilo handles this for you with compaction: as a run goes on, it automatically condenses stale tool results and older context that the model has already used, so the session keeps its thread without overflowing the window. You don’t have to start over to keep going.
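Compaction can be pictured with a simplified sketch: once the history grows past a budget, older tool results are condensed, oldest first, while the most recent ones stay word-for-word. This is an illustration under stated assumptions, not Multilo's algorithm. The `summarize` placeholder just keeps the first few words, whereas a real system would more plausibly ask a model to write the summary, and this version only condenses tool results, leaving messages untouched.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str  # "message" or "tool_result"
    text: str


def approx_tokens(text: str) -> int:
    return len(text.split())  # rough word-based estimate, an assumption


def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder: keep the first few words. A real system would likely use a model."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [condensed]"


def total_tokens(history: list[Entry]) -> int:
    return sum(approx_tokens(e.text) for e in history)


def compact(history: list[Entry], budget_tokens: int, keep_recent: int = 2) -> list[Entry]:
    """Condense older tool results, oldest first, until the history fits the budget.

    The newest `keep_recent` tool results and all messages are left untouched.
    Returns a new list; the caller's history is not modified.
    """
    compacted = list(history)
    tool_indexes = [i for i, e in enumerate(compacted) if e.kind == "tool_result"]
    stale = tool_indexes[:-keep_recent] if keep_recent > 0 else tool_indexes
    for i in stale:
        if total_tokens(compacted) <= budget_tokens:
            break
        compacted[i] = Entry("tool_result", summarize(compacted[i].text))
    return compacted


if __name__ == "__main__":
    run = [Entry("message", "user: find the sample size")]
    run += [Entry("tool_result", f"read file {n} " + "word " * 100) for n in range(5)]
    print(total_tokens(run), "->", total_tokens(compact(run, budget_tokens=250)))
```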

You also have a per-task token budget in Settings → AI. It caps how many tokens an agent can spend on a single task; when a run reaches the cap, the agent pauses and asks whether to continue rather than silently spending more. It’s a simple way to keep long autonomous runs predictable.
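The pause-and-ask behaviour can be sketched as a loop that checks, before each step, whether that step would go past the cap. The per-step token costs are invented, and what happens after you approve (here, another budget-sized allowance is granted) is an assumption made for illustration.

```python
from typing import Callable


def run_with_budget(
    step_costs: list[int],
    budget: int,
    ask_to_continue: Callable[[int], bool],
) -> tuple[int, str]:
    """Run steps in order, pausing to ask before any step would pass the cap.

    step_costs: invented token cost of each agent step.
    ask_to_continue: receives tokens spent so far; returns True to keep going.
    Assumption: each approval grants one more budget-sized allowance.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    spent, limit = 0, budget
    for cost in step_costs:
        while spent + cost > limit:
            if not ask_to_continue(spent):
                return spent, "paused at budget"
            limit += budget
        spent += cost
    return spent, "finished"


if __name__ == "__main__":
    print(run_with_budget([400, 400, 400], budget=1_000, ask_to_continue=lambda spent: False))
```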

Not every model can do everything

Models differ in what they can do, not just in how good they are. The model picker shows a short badge on each one so you can choose at a glance:

Badge	What it means
Agent	Can use tools and run full multi-step agent sessions — read, search, and edit your project.
Tools	Can use tools for individual chat actions, but isn’t a fit for the longest autonomous runs.
Chat only	Answers and writes, but can’t take actions; agent work falls back to a text-only reply.
Long ctx	Has a large context window — good for whole long documents and big libraries.
Fast · Flagship	The capability tier — fast and economical, or top-tier reasoning. See Choosing a model.

Automatic model selection

Sometimes the model you’ve picked can’t handle a particular request — it needs a tool that model doesn’t support, or the prompt is larger than its context window. Rather than fail or quietly produce a worse answer, Multilo routes that turn to a capable model you already have access to, and tells you it did. Your selection stays put for the next turn; only the turn that didn’t fit is re-routed.

Auto selection always respects your plan: it only ever chooses from the models available to you.
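The routing rule described above can be sketched as a small function. The model names, capability fields, and the order in which candidates are tried are all hypothetical; how Multilo actually ranks alternatives isn't described here. The sketch keeps the properties the text does describe: your selected model is used whenever it can handle the request, a substitute is drawn only from models on your plan, and a notice is returned so the swap is visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    supports_tools: bool
    supports_agent: bool   # full multi-step agent sessions
    context_tokens: int    # size of the context window
    on_your_plan: bool


@dataclass(frozen=True)
class Request:
    needs_tools: bool
    needs_agent: bool
    prompt_tokens: int


def can_handle(model: Model, req: Request) -> bool:
    """True if the model has every capability this turn needs and the prompt fits."""
    return (
        (model.supports_tools or not req.needs_tools)
        and (model.supports_agent or not req.needs_agent)
        and req.prompt_tokens <= model.context_tokens
    )


def route_turn(
    selected: Model, req: Request, catalogue: list[Model]
) -> tuple[Optional[Model], Optional[str]]:
    """Pick the model for this turn only; `selected` itself is never changed."""
    if can_handle(selected, req):
        return selected, None  # your own pick wins whenever it can do the work
    for candidate in catalogue:  # hypothetical preference order
        if candidate.on_your_plan and can_handle(candidate, req):
            return candidate, f"Used {candidate.name} for this turn; {selected.name} can't handle it."
    return None, "No model on your plan can handle this request."


if __name__ == "__main__":
    fast = Model("fast-chat", False, False, 8_000, True)
    premium = Model("premium-agent", True, True, 500_000, False)
    long_agent = Model("long-agent", True, True, 200_000, True)
    model, notice = route_turn(fast, Request(True, True, 2_000), [premium, long_agent, fast])
    print(model.name if model else None, "|", notice)
```

Because `route_turn` never changes `selected`, the next turn starts from your own pick again.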

You’re always in control

Picking a model yourself takes priority. Auto selection only steps in when your chosen model genuinely can’t do the requested work, and it surfaces the swap so there are no surprises.

Reasoning & how deep a model goes

Some models can reason — work through a problem in steps before answering — which helps on hard tasks like critiquing a methodology, planning a long draft, or untangling a tricky argument. Deeper reasoning costs more credits and takes a little longer, so it’s worth saving for the work that needs it.

You control depth with two independent levers:

The model tier — a more capable tier reasons better. See Choosing a model.
An agent’s MODE — how thorough a given agent is on a run, from a light pass to a deep one, independent of the model. See Agents overview.

They multiply: pair a flagship model with a deep MODE for the hardest work, or a fast model with a light MODE for a quick pass. Most everyday writing sits comfortably in between.