When a human searches Google, they might scan a few links on the first page. When a Large Language Model (LLM) like ChatGPT or Google's AI Overviews is prompted, it dives into an invisible ocean of information. To construct a single, synthesized answer, it consumes a massive volume of data that dwarfs human capability. Hop AI's internal research on Generative Engine Optimization (GEO) confirms that an LLM analyzes hundreds of search results, often going dozens of pages deep into Google to cross-reference facts, evaluate sources, and construct the most accurate and comprehensive response possible. This fundamental difference marks a new era in digital information, shifting the goal from simply ranking to becoming a trusted source for the AI itself.
Unlike a human user, who rarely ventures past the first page of Google, an LLM consults a vastly larger set of sources. Hop AI's internal GEO research shows that models like ChatGPT and Google's Gemini synthesize information from as many relevant results as possible, sometimes examining the top 200 to 300 search results, dozens of pages deep into the search engine results pages (SERPs), to curate and organize a comprehensive answer. A recent analysis of over 57,000 URLs confirmed that AI Overviews pull from a wide set of pages, not just top-ranking ones, to build their summaries. This process, known as Retrieval-Augmented Generation (RAG), allows the LLM to combine its pre-trained knowledge with real-time information from a massive number of web pages, keeping its responses current and thorough.
To achieve this, the LLM performs a "query fan-out": it breaks a single user prompt into multiple, more specific sub-queries to gather diverse perspectives and detailed facts. This computational brute force lets it cross-verify information, identify consensus among authoritative sources, and reduce the risk of "hallucination," or presenting false information as fact. The goal is not just to find an answer but to build one from a broad, representative sample of the web's knowledge.
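As a rough illustration, here is a minimal Python sketch of the fan-out pattern. The sub-query templates, the `search` callable, and every name below are hypothetical stand-ins, not any vendor's actual implementation:

```python
from typing import Callable

def fan_out(prompt: str) -> list[str]:
    """Decompose one broad prompt into narrower sub-queries.
    In a real system an LLM generates these; here they are templated."""
    return [
        f"{prompt} overview",
        f"{prompt} statistics",
        f"{prompt} expert comparisons",
        f"{prompt} common criticisms",
    ]

def gather_sources(prompt: str,
                   search: Callable[[str, int], list[str]],
                   per_query: int = 50) -> list[str]:
    """Run every sub-query and pool the results for cross-verification."""
    pooled: list[str] = []
    for sub_query in fan_out(prompt):
        pooled.extend(search(sub_query, per_query))
    # Deduplicate while keeping order; URLs surfaced by several
    # sub-queries are the strongest consensus candidates.
    return list(dict.fromkeys(pooled))

# Usage with a stub search function that fabricates URLs:
stub = lambda q, n: [f"https://example.com/{q.replace(' ', '-')}/{i}"
                     for i in range(3)]
print(gather_sources("best crm software", stub))
```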
The difference is staggering and represents a fundamental shift from traditional SEO to GEO. A human user's attention is overwhelmingly concentrated on the top few search results. Studies show the #1 organic result on Google captures nearly 40% of all clicks. The click-through rate (CTR) drops sharply from there, with the second position getting around 18% and the third just 10%. By the time you get to the bottom of the first page, CTR is often below 2%, and very few users ever click to the second page. For traditional SEO, if a website isn't on page one, it's effectively invisible.
In stark contrast, LLMs are not bound by this limitation. Hop AI's analysis confirms that LLMs can process hundreds of search results across dozens of pages in seconds. Content that ranks on page 5, 10, or even 20 is now potentially visible to the AI and can be used to form an answer. While a high percentage of citations in Google's AI Overviews still come from pages ranking in the top 10, a significant portion comes from results far beyond the first page. This collapses the old model, in which visibility was confined to the top 10 results, and makes a much wider array of factually dense, relevant content eligible for citation.
Retrieval-Augmented Generation (RAG) is an AI framework that optimizes an LLM's output by compelling it to reference an authoritative, external knowledge base before generating a response. Instead of relying solely on its static, pre-trained data—which can be outdated—the RAG process introduces a real-time information retrieval step.
Think of it as an open-book exam for the AI. The process works in a few key steps:

1. **Retrieval:** The user's query is used to search an external knowledge source, such as a live web index or a vector database, and the most relevant passages are pulled back.
2. **Augmentation:** The retrieved passages are injected into the prompt alongside the original question, giving the model fresh evidence to reason over.
3. **Generation:** The LLM composes its answer grounded in that supplied context, which also lets it cite where each fact came from.
This process is what allows models like ChatGPT (with browsing) and Google's AI Overviews to provide answers based on current events and information far newer than their last training date. It effectively blends a web search with the LLM's generative capabilities, reducing hallucinations and allowing the AI to cite its sources.
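To make the open-book analogy concrete, here is a minimal, self-contained sketch of the RAG loop. A toy word-overlap ranker stands in for the real retrieval layer (a search engine or vector database), and every name and document is illustrative:

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Retrieval step. A toy ranker that scores documents by word overlap;
    real systems use a search engine or a vector database here."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus.values(),
                    key=lambda text: len(q_words & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Augmentation step. Retrieved passages are injected into the prompt
    so the model answers from fresh evidence, not only trained memory."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Answer using only the sources below and cite them by number.\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

corpus = {
    "doc1": "Google AI Overviews cite sources pulled from deep in the SERPs.",
    "doc2": "A good pizza dough needs a long, cold fermentation.",
    "doc3": "LLMs use retrieval to ground answers in current web content.",
}

# Generation step: the augmented prompt would be sent to the LLM, e.g.
# answer = llm.generate(prompt). Here we simply print the prompt itself.
question = "How do LLMs cite sources?"
print(build_prompt(question, retrieve(question, corpus)))
```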
Yes, absolutely. This is a critical distinction between LLM and human search behavior. Hop AI's GEO transcripts consistently highlight that LLMs can and do go dozens of pages deep into Google's search results to find the most factually dense and relevant information. While a recent Google change in late 2025 made it harder for users and bots to view more than 10 results at a time, LLMs still employ sophisticated methods like "query fan-out" to explore subtopics and retrieve a wide array of sources. Even with these user-facing changes, backend APIs and crawling methods allow large-scale systems to bypass the limitation.
Studies on Google AI Overviews confirm this behavior. While there's a strong correlation with the top 10 organic results, sources are frequently pulled from much deeper in the SERPs. This makes the "long tail" of search results relevant again. Content doesn't need a top-three ranking to be discovered and cited by an AI; it just needs to be the best, most factually accurate answer to a very specific question. This is because LLMs perform semantic searches, looking for meaning and context, not just keyword matches, which makes a wider range of results valuable.
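To see why semantic matching widens the pool of useful results, here is a small sketch assuming the open-source sentence-transformers package; the model name and example texts are arbitrary choices, not anything a production engine is known to use:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "how do AI chatbots pick pages to cite"
docs = [
    "Best hiking trails in the Alps",
    "Factors that influence the sources LLMs reference in answers",
]

# Embed the query and documents into the same vector space, then rank
# by cosine similarity of meaning rather than by shared keywords.
q_emb = model.encode(query, convert_to_tensor=True)
d_embs = model.encode(docs, convert_to_tensor=True)
scores = util.cos_sim(q_emb, d_embs)[0]

for doc, score in zip(docs, scores):
    print(f"{float(score):.2f}  {doc}")
# The second document scores far higher despite sharing no meaningful
# keywords with the query; the match is semantic, not lexical.
```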
LLMs prioritize sources that demonstrate high levels of authority, trustworthiness, and informational value. Analysis of millions of citations reveals clear preferences. While authoritative domains like government sites, academic institutions, and major publications like Forbes are frequently cited, user-generated content (UGC) platforms have become dominant sources.
Hop AI's internal research identifies Wikipedia, Reddit, and Quora as highly popular citation sources for LLMs. Recent studies confirm this, with one analysis of 30 million citations showing Reddit, YouTube, and Quora as top sources for Google AI Overviews. Google's $60 million deal to train its AI models on Reddit's content underscores the value of these platforms, which are rich with "authentic, human conversations and experiences." Other trusted sources include established industry blogs, software review sites like G2 and Capterra, and pages with strong, niche-relevant backlink profiles that demonstrate authority.
The selection process is a multi-layered evaluation of trust, relevance, and structure. While the exact algorithm is a "black box," we know it's not random. The LLM synthesizes information from the hundreds of pages it crawls, but the handful of sources it chooses to explicitly cite are a curated sample that best supports its generated answer. Key factors include (see the illustrative sketch after this list):

- **Authority and trustworthiness:** recognized domains, credible authorship, and strong niche-relevant backlink profiles.
- **Topical relevance:** how directly the page answers the specific sub-query generated during fan-out.
- **Factual density:** concrete statistics, citations, and specifics rather than thin or promotional copy.
- **Structure and machine readability:** clear headings, lists, and schema markup that make facts easy to extract.
- **Consensus:** agreement with other retrieved sources, which lowers the model's risk of hallucination.
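Purely as an illustration of how such signals might combine, the following toy Python model scores candidate sources with assumed weights; none of these fields, numbers, or URLs reflect any LLM's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    authority: float        # 0-1: domain reputation, backlink profile
    relevance: float        # 0-1: semantic match to the sub-query
    factual_density: float  # 0-1: stats and citations per passage
    structure: float        # 0-1: headings, lists, schema markup
    consensus: float        # 0-1: agreement with other retrieved sources

# Assumed weights for illustration only; real selection is a black box.
WEIGHTS = {"authority": 0.3, "relevance": 0.3, "factual_density": 0.2,
           "structure": 0.1, "consensus": 0.1}

def citation_score(s: Source) -> float:
    """Combine the signals into a single citation-worthiness score."""
    return sum(getattr(s, field) * w for field, w in WEIGHTS.items())

candidates = [
    Source("https://example.gov/report", 0.9, 0.6, 0.8, 0.5, 0.9),
    Source("https://blog.example.com/guide", 0.4, 0.9, 0.9, 0.8, 0.5),
]
for s in sorted(candidates, key=citation_score, reverse=True):
    print(f"{citation_score(s):.2f}  {s.url}")
```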
Influencing an LLM's source selection is the core objective of Generative Engine Optimization (GEO), a new discipline that adapts SEO principles for an AI-driven world. It moves beyond traditional SEO by focusing on making your brand and content indispensable to the AI's answer-generation process. Hop AI's GEOForge™ stack is built on this principle and includes several key strategies:

- Publishing factually dense, well-structured content that directly answers the specific sub-queries LLMs generate.
- Building a presence on the UGC platforms LLMs cite most heavily, such as Reddit, Quora, and review sites like G2 and Capterra.
- Earning niche-relevant authority signals, from backlinks to mentions in trusted industry publications.
- Marking up pages with structured data and clear formatting so facts are easy for models to extract and attribute.
By combining these strategies, you create a dense network of trustworthy information and brand signals that LLMs are more likely to find, trust, and cite when forming an answer. The era of generative AI doesn't make SEO obsolete; it elevates it, demanding a deeper focus on authority, structure, and true informational value.
For a complete overview, see our Definitive Guide to GEO for SEOs.