Google Search is a fully-automated search engine that uses software known as web crawlers to explore the web regularly, finding pages to add to our index. In fact, most pages listed in our results aren’t manually submitted for inclusion but are found and added automatically as our web crawlers explore the internet. This document explains the stages of how search works in the context of your website. Having this foundational knowledge can help you address crawling issues, get your pages indexed, and learn how to optimize how your site appears in Google Search.
Looking for something less technical? Check out our How Search Works site, which explains how Search works from a searcher’s perspective.
A Few Notes Before We Get Started
Before diving into the details, it’s essential to note that Google doesn’t accept payment to crawl a site more frequently or to rank it higher. If anyone claims otherwise, they’re mistaken.
Google doesn’t guarantee that it will crawl, index, or serve your page, even if your page follows the Google Search Essentials.
Introducing the Three Stages of Google Search
Google Search operates in three stages, and not all pages make it through each stage:
- Crawling: Google downloads text, images, and videos from pages it found on the internet with automated programs called crawlers.
- Indexing: Google analyzes the text, images, and video files on the page and stores the information in the Google index, which is a vast database.
- Serving Search Results: When a user searches on Google, the search engine returns information relevant to the user’s query.
Crawling
The first stage involves discovering what pages exist on the web. Since there’s no central registry of all web pages, Google must constantly look for new and updated pages to add to its list of known pages. This process is referred to as “URL discovery.” Some pages are known because Google has already visited them, while others are discovered when Google follows links from known pages—such as a hub page linking to a new blog post. Additionally, pages can be discovered when you submit a list of URLs (a sitemap) for Google to crawl.
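A sitemap is simply an XML file that lists the URLs you want crawlers to discover. As a minimal sketch (the URLs here are placeholders, not a real site), you could generate one with Python’s standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical URLs, used purely for illustration.
urls = [
    "https://example.com/",
    "https://example.com/blog/new-post",
]

# Build a minimal sitemap following the sitemaps.org protocol.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for url in urls:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = url

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

You can then submit the resulting file through Search Console or reference it from your robots.txt file.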
Once Google discovers a page’s URL, it may visit (or “crawl”) the page to learn what’s on it. We use a massive network of computers to crawl billions of pages on the web. The program that performs this task is called Googlebot (also known as a crawler, robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. The crawlers are programmed to avoid overwhelming a site, adjusting based on server responses (for example, HTTP 500 errors indicate “slow down”).
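Googlebot’s actual scheduling logic isn’t public, but the general principle of backing off when a server signals distress can be sketched in a few lines of Python (the retry counts and delays are arbitrary assumptions):

```python
import time
import urllib.error
import urllib.request

def polite_fetch(url, max_attempts=3, base_delay=2.0):
    """Fetch a URL, backing off when the server signals overload."""
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Server errors such as HTTP 500 mean "slow down": wait
            # longer before each retry instead of hammering the host.
            if err.code >= 500:
                time.sleep(delay)
                delay *= 2
            else:
                raise
    return None  # give up for now; a real crawler would retry later
```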
However, Googlebot doesn’t crawl every page it discovers. Some pages may be blocked from crawling by the site owner, while others might require login credentials.
During the crawl, Google renders the page and executes any JavaScript it finds using a recent version of Chrome, similar to how a browser renders pages you visit. Rendering is crucial because many websites rely on JavaScript to display content, and without rendering, Google might miss that content.
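To see why rendering matters, consider a toy page (invented here, not from any real site) whose main content is loaded by a script. A crawler that reads only the raw HTML never sees the article text:

```python
# Raw HTML as fetched over HTTP. The article body is loaded by a script
# from a hypothetical API endpoint, so it never appears in this source.
raw_html = """
<html>
  <body>
    <div id="content"></div>
    <script>
      fetch("/api/article")
        .then(r => r.text())
        .then(t => { document.getElementById("content").textContent = t; });
    </script>
  </body>
</html>
"""

# Suppose the API returns "Bicycle maintenance tips". A non-rendering
# crawler sees only the empty <div>; a rendering crawler executes the
# script first and sees the loaded text.
print('"Bicycle maintenance tips" in raw HTML:',
      "Bicycle maintenance tips" in raw_html)  # False
```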
Crawling depends on whether Google’s crawlers can access the site. Common issues include:
- Problems with the server handling the site
- Network issues
- robots.txt rules that prevent Googlebot from accessing the page (a quick check is sketched below)
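You can check the last of these yourself with Python’s built-in robots.txt parser; the rules and URLs below are placeholders:

```python
import urllib.robotparser

# Hypothetical robots.txt rules for illustration.
rules = """
User-agent: Googlebot
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() reports whether the named user agent may crawl a URL.
print(parser.can_fetch("Googlebot", "https://example.com/private/report"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))       # True
```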
Indexing
After crawling a page, Google attempts to understand its content during the indexing stage. This includes processing and analyzing the textual content and key content tags and attributes, such as <title> elements and alt attributes, as well as images, videos, and more.
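As a simplified illustration of that kind of analysis (this is not Google’s parser), Python’s standard html.parser can pull out a page’s <title> text and image alt attributes:

```python
from html.parser import HTMLParser

class ContentTagExtractor(HTMLParser):
    """Collect the <title> text and the alt text of images."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            attrs = dict(attrs)
            if attrs.get("alt"):
                self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# A toy document with the kinds of tags indexing looks at.
extractor = ContentTagExtractor()
extractor.feed('<title>Bicycle Repair Guide</title>'
               '<img src="wrench.jpg" alt="Adjusting a derailleur">')
print(extractor.title)      # Bicycle Repair Guide
print(extractor.alt_texts)  # ['Adjusting a derailleur']
```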
During indexing, Google determines whether a page is a duplicate of another page on the internet or the canonical version. The canonical page is the one that may be shown in search results. To determine the canonical, Google groups together (also known as clustering) pages with similar content and selects the most representative one. The other pages in the group are alternate versions that may be served in different contexts, such as when a user searches from a mobile device or looks for a specific page within that cluster.
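Google’s clustering and canonical selection are far more sophisticated, but the basic idea can be sketched as grouping pages whose normalized content matches and picking one representative per group; the tie-breaking rule here (shortest URL wins) is an arbitrary assumption for illustration:

```python
import hashlib
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text content.
pages = {
    "https://example.com/shoes": "Running shoes for every budget.",
    "https://example.com/shoes?ref=ad": "Running shoes for every budget.",
    "https://m.example.com/shoes": "Running shoes for every budget.",
    "https://example.com/hats": "Hats and caps on sale.",
}

# Cluster pages by a hash of their normalized content.
clusters = defaultdict(list)
for url, text in pages.items():
    key = hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()
    clusters[key].append(url)

# Pick one canonical per cluster; here, simply the shortest URL.
for urls in clusters.values():
    canonical = min(urls, key=len)
    alternates = [u for u in urls if u != canonical]
    print("canonical:", canonical, "alternates:", alternates)
```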
Google also collects signals about the canonical page and its contents, which may be useful in the next stage—serving the page in search results. Signals include the page’s language, the country the content is targeted to, and the page’s usability.
The collected information about the canonical page and its cluster is stored in the Google index, a vast database hosted on thousands of computers. Indexing isn’t guaranteed; not every page processed by Google will be indexed.
Indexing also depends on the content and metadata of the page. Common indexing issues can include:
- Low-quality content
- Robots meta rules preventing indexing (a quick check is sketched after this list)
- Website design making indexing difficult
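The second issue is the easiest to verify: a noindex rule in a robots meta tag tells Google not to index the page. A minimal detector (not Google’s implementation) might look like this:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flag pages carrying a robots meta tag with a noindex rule."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta"
                and attrs.get("name", "").lower() == "robots"
                and "noindex" in attrs.get("content", "").lower()):
            self.noindex = True

detector = NoindexDetector()
detector.feed('<head><meta name="robots" content="noindex, nofollow"></head>')
print(detector.noindex)  # True
```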
Serving Search Results
Google does not accept payment to rank pages higher; ranking is done programmatically. When a user enters a query, our machines search the index for matching pages and return the results that we believe are the highest quality and most relevant to the user’s query. Relevancy is determined by hundreds of factors, which may include information such as the user’s location, language, and device type (desktop or mobile). For instance, searching for “bicycle repair shops” would yield different results for a user in Paris compared to a user in Hong Kong.
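Google’s ranking systems weigh hundreds of signals and are not public, but the general shape of query-time scoring, matching terms plus contextual signals such as location, can be sketched with a toy example (all data and weights below are invented for illustration):

```python
# Toy index: each page has extracted terms and a target location.
index = [
    {"url": "https://paris-bikes.example/repair",
     "terms": {"bicycle", "repair", "shop"}, "location": "Paris"},
    {"url": "https://hk-cycles.example/fix",
     "terms": {"bicycle", "repair", "shop"}, "location": "Hong Kong"},
    {"url": "https://example.com/bike-history",
     "terms": {"bicycle", "history"}, "location": None},
]

def score(page, query_terms, user_location):
    """Term overlap plus an invented bonus for matching the user's location."""
    s = len(query_terms & page["terms"])
    if page["location"] == user_location:
        s += 2  # arbitrary location weight, for illustration only
    return s

# The same query ranks differently depending on where the user is.
query = {"bicycle", "repair", "shop"}
for user_location in ("Paris", "Hong Kong"):
    ranked = sorted(index, key=lambda p: score(p, query, user_location),
                    reverse=True)
    print(user_location, "->", ranked[0]["url"])
```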
The search features appearing on the results page also change based on the user’s query. For example, searching for “bicycle repair shops” is likely to show local results without images, while searching for “modern bicycle” may show image results but not local ones. You can explore the most common UI elements of Google web search in our Visual Element gallery.
Search Console might inform you that a page is indexed, but if you don’t see it in search results, it may be due to:
- Irrelevant content for users’ queries
- Low content quality
- Robots meta rules preventing serving
While this guide explains how Search works, we are continually improving our algorithms. Stay updated on these changes by following the Google Search Central blog.
For more on the details of how indexing works, check out our How Search Works page.