ChatGPT doesn't just use its training data. Here's how it actually decides what to look up

One of the most common questions we get when talking about GEO is this: “How can it work if OpenAI trains its models only once every few months?”

The objection makes sense. If you’re not part of the training data, ChatGPT can’t reference you, and any effort you put in can only show results months later, when the model updates.

Here’s the thing though. That’s only partially true.

ChatGPT doesn’t only use its knowledge base. For each query, it decides in real time whether to search the web, and it makes that decision in a specific way.

From what we can see in the network responses, it calculates three probabilities in order:

Can the response be given without any search?
Does the response need a complex, multi-step search?
Does the response need a simple search?
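
To make the ordering concrete, here is a minimal sketch of how three probabilities checked in sequence could turn into a routing decision. It is an illustration, not OpenAI’s actual logic: the field names, the 0.5 threshold, and the fallback rule are all assumptions, since the network responses only show that three probabilities exist and the order they appear in.

```python
from dataclasses import dataclass


@dataclass
class SearchProbabilities:
    # Hypothetical field names; the real payload may label these differently.
    no_search: float
    complex_search: float
    simple_search: float


def choose_route(p: SearchProbabilities, threshold: float = 0.5) -> str:
    """Pick a route by checking the probabilities in the order listed above.

    The threshold is an assumed value, chosen only for illustration.
    """
    if p.no_search >= threshold:
        return "no_search"
    if p.complex_search >= threshold:
        return "complex_search"
    if p.simple_search >= threshold:
        return "simple_search"
    # Assumed fallback: nothing clears the threshold, so take the highest score.
    scores = {
        "no_search": p.no_search,
        "complex_search": p.complex_search,
        "simple_search": p.simple_search,
    }
    return max(scores, key=scores.get)


# Made-up numbers for a stable-knowledge question.
route = choose_route(SearchProbabilities(0.9, 0.05, 0.05))
```

With the made-up numbers above, the first check passes and no search is triggered. Lowering the first value and raising one of the others moves the query to a search route.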

Here are real examples of how this plays out.

If you ask “What is faux wool?” it doesn’t trigger a search at all. That’s stable information. It just answers from its knowledge base.

If you ask “What’s the best winter coat for men under 300 dollars for both formal and casual use?” it triggers a simple search. Prices and product availability change, so it goes and looks.

If you ask “Get me the top coats under 100 dollars, compare their features in detail and find any coupons and discounts active at those merchants,” it triggers the thinking model with a complex multi-step search, pulling from pages updated in the last 30 days.

Here’s the part that I find really interesting. For that third type of query, it pre-selects brands from its knowledge base before it even starts searching. It does this to minimize the steps to get to a good response. In the coat example, it pre-selected Nordstrom, Macy’s, and DKNY before it ran a single search.
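
A rough sketch of why pre-selection saves steps: with a shortlist already in hand, the model can go straight to one targeted search per brand instead of first running a broad discovery search. The function name and the query template below are hypothetical. We saw which brands were pre-selected, not how the individual search queries were worded.

```python
def plan_brand_searches(need: str, preselected_brands: list[str]) -> list[str]:
    """Build one targeted search query per pre-selected brand.

    The "<brand> <need>" template is an assumption made for illustration.
    """
    return [f"{brand} {need}" for brand in preselected_brands]


# Brands taken from the coat example above; the wording of the need is made up.
queries = plan_brand_searches(
    "coats under 100 dollars coupons",
    ["Nordstrom", "Macy's", "DKNY"],
)
```

Three brands produce three targeted queries. A brand that is not on the shortlist never gets a query in this sketch, which is the practical cost of not being pre-selected.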

So what does this mean for GEO?

For pure knowledge queries like “tell me about Brand X,” you do need to wait for model updates. That part of the objection is correct.

But for purchase intent queries and anything where recency matters, you don’t have to wait. That content can be surfaced in real time.

To get into the research-style responses where complex searches happen, you need brand-level content that covers a broad range of user needs: budget, trends, occasions, comparisons. Brands that address many dimensions of the user’s world get pre-selected before the search even starts.

The training data matters. But it’s not the whole game.
