Mar 26, 2026

ChatGPT doesn't just use its training data. Here's how it actually decides what to look up

Mehul Jain

AI Expert & Founder

One of the most common questions we get when talking about GEO (generative engine optimization) is this: “How can it work if OpenAI trains models only once every few months?”

The objection makes sense on its face. If you’re not part of the training data, ChatGPT can’t reference you. And any effort you put in can only show results months later, when the model updates.

Here’s the thing though. That’s only partially true.

ChatGPT doesn’t only use its knowledge base. It makes a real-time decision about whether to search the web for each query. And it makes that decision in a specific way.

From what we can see in the network responses, it calculates three probabilities in order:

  1. Can the response be given without any search?
  2. Does the response need a complex, multi-step search?
  3. Does the response need a simple search?
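One way to picture that ordering is as a small router. The sketch below is only an illustration of the idea. The field names, the 0.5 threshold, the fallback, and the scores in the usage lines are all assumptions made up for this example, not values seen in the network responses.

```python
from dataclasses import dataclass


@dataclass
class SearchProbabilities:
    # Hypothetical names for the three scores described above.
    no_search: float
    complex_search: float
    simple_search: float


def route(p: SearchProbabilities, threshold: float = 0.5) -> str:
    """Check the three scores in the listed order; return the first that clears the threshold."""
    if p.no_search >= threshold:
        return "answer_from_knowledge"
    if p.complex_search >= threshold:
        return "complex_multi_step_search"
    if p.simple_search >= threshold:
        return "simple_search"
    # Assumed fallback when no score clears the threshold.
    return "answer_from_knowledge"


# Illustrative scores only, not measured values.
stable_fact = route(SearchProbabilities(0.9, 0.05, 0.05))
research_task = route(SearchProbabilities(0.1, 0.8, 0.1))
price_lookup = route(SearchProbabilities(0.1, 0.2, 0.7))
```

The point of the sketch is the order of the checks: a query only reaches the search branches if the model is not confident it can answer from what it already knows.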

Here are real examples of how this plays out.

If you ask “What is faux wool?” it doesn’t trigger a search at all. That’s stable information. It just answers from its knowledge base.

If you ask “What’s the best winter coat for men under 300 dollars for both formal and casual use?” it triggers a simple search. Prices and product availability change, so it goes and looks.

If you ask “Get me the top coats under 100 dollars, compare their features in detail and find any coupons and discounts active on those merchants” it triggers the thinking model with a complex multi-step search, pulling from pages updated in the last 30 days.

Here’s the part that I find really interesting. For that third type of query, it pre-selects brands from its knowledge base before it even starts searching. It does this to minimize the steps to get to a good response. In the coat example, it pre-selected Nordstrom, Macy’s, and DKNY before it ran a single search.
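To make the pre-selection idea concrete, here is a rough sketch of how a short list of known brands could shrink a multi-step search into a handful of targeted lookups. The function name `build_search_plan` and the query format are hypothetical. We can't see how ChatGPT actually builds its queries, so treat this as a mental model only.

```python
def build_search_plan(user_need: str, preselected_brands: list[str]) -> list[str]:
    """Return one targeted query per pre-selected brand (hypothetical query format)."""
    return [f"{brand} {user_need}" for brand in preselected_brands]


# Brands taken from the coat example above; the query text is an assumption.
plan = build_search_plan(
    "coats under 100 dollars coupons",
    ["Nordstrom", "Macy's", "DKNY"],
)
```

With three brands chosen up front, the plan is three focused searches instead of an open-ended crawl. A brand that is not on the starting list never gets a query at all in this picture.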

So what does this mean for GEO?

For pure knowledge queries like “tell me about Brand X” you do need to wait for model updates. That part of the objection is correct.

But for purchase intent queries and anything where recency matters, you don’t have to wait. That content can be surfaced in real time.

To get into the research-style responses where complex searches happen, you need brand-level content that covers a broad range of user needs: budget, trends, occasions, comparisons. Brands that address many dimensions of the user’s world are the ones that get pre-selected before the search even starts.

The training data matters. But it’s not the whole game.

Thanks for reading!
