Google Indexing Study: Insights from 16 Million Pages

Written by Joel Cariño | Last updated: 21 February 2025

Google’s indexing process has always been critical for a website’s online visibility. 

However, with recent developments, like the surge in AI-generated content and Google’s evolving efforts to maintain content quality, indexing has become more unpredictable than ever. 

From indexing glitches to the recent spike in de-indexing (remember March Madness?), many website owners wonder: is their content being seen?

At IndexCheckr, we monitor the indexing status of millions of pages, helping users track which URLs Google includes in its search results. 

For the first time, we’ve analyzed data from 16 million pages to uncover key statistics about how often pages and links are indexed, de-indexed, or left unseen. 

This analysis explores how Google adapts to the overwhelming influx of new content and how that affects you. Let’s take a look:


Chapter 1: The Current State of Indexing


From our analysis of 16 million pages, we uncovered the following breakdown of Google indexing:

Graph showing the distribution of page indexed, page not indexed, and domain not indexed
  • Domain Not Indexed: 143,068 pages (0.98%) fall into this category, meaning their entire domain is not indexed. This critical issue often indicates penalties or severe quality concerns that forced Google to exclude the entire domain from SERPs.
  • Page Not Indexed: 9,036,446 pages (61.94%) are in this status, where the domain is indexed, but the specific pages are not served in search results. This suggests that a large majority of pages fail to meet Google’s standards for inclusion.
  • Page Indexed: 5,409,096 pages (37.08%) have achieved full indexing, which means they appear in search results and are accessible to users.
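
These shares can be rebuilt from the raw counts above. A minimal Python sketch (our own; note that the three counts sum to roughly 14.59 million, i.e. the pages with a recorded status within the 16-million-page sample):

```python
# Reproduce the indexing-status shares from the raw page counts above.
counts = {
    "Domain Not Indexed": 143_068,
    "Page Not Indexed": 9_036_446,
    "Page Indexed": 5_409_096,
}

total = sum(counts.values())  # pages with a recorded status (~14.59M)
shares = {status: round(100 * n / total, 2) for status, n in counts.items()}
print(shares)
```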

This distribution highlights significant challenges for website owners. While over a third of pages are successfully indexed, the vast majority are either not served in search results or excluded entirely due to domain-level issues.

The data reveals a stark reality: most content published online struggles to meet Google’s increasingly strict standards, especially now. 

Looking back at 2024, Google rolled out several core updates that raised the bar for indexing and removed low-quality, unhelpful, and unoriginal content (often AI-generated) from SERPs. According to Google, its combined efforts across late 2023 and 2024 reduced such content in search results by 45%.

As Google tightens its grip on low-value and AI-generated content, ensuring pages are indexed has become a critical hurdle for many site owners and SEOs.

In the next chapter, we’ll explore how indexing status evolves over time, uncovering key patterns that explain how pages move from obscurity to visibility—or fall back into the shadows.


Chapter 2: Indexing Rate Over Time


Google’s indexing behavior has evolved significantly over the years. 

By analyzing the indexing rate of pages from 2021 to 2025, we can see how the proportion of indexed vs. non-indexed pages has changed.

Graph showing the rate of indexing per year from 2021-2025


Detailed Breakdown

Year | Domain Not Indexed | Page Not Indexed | Page Indexed
2021 | 834 | 30,168 | 21,534
2022 | 36,828 | 825,514 | 356,269
2023 | 35,705 | 5,621,244 | 2,753,473
2024 | 66,193 | 2,464,277 | 2,138,912
2025 | 5,645 | 142,117 | 356,269


Caveat

The sample for 2021 may be too small to draw conclusions for this year.
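
To make the trend concrete, here is a short sketch (ours, using the table’s figures) that computes each year’s indexed share of all checked pages:

```python
# Yearly indexed share = indexed / (all checked pages that year),
# using the counts from the detailed breakdown table above.
rows = {  # year: (domain_not_indexed, page_not_indexed, page_indexed)
    2021: (834, 30_168, 21_534),
    2022: (36_828, 825_514, 356_269),
    2023: (35_705, 5_621_244, 2_753_473),
    2024: (66_193, 2_464_277, 2_138_912),
    2025: (5_645, 142_117, 356_269),
}

rates = {year: round(100 * idx / (dni + pni + idx), 1)
         for year, (dni, pni, idx) in rows.items()}
print(rates)
```

The dip in 2022–2023 and the recovery in 2024–2025 discussed below fall straight out of these ratios.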


Analysis

The year 2022 marked a paradigm shift toward artificial intelligence since the public release of ChatGPT, which triggered a massive wave of AI-generated content. 

It is highly likely that the surge in low-quality, mass-produced content caused a dramatic increase in unindexed pages and domains in 2022 and 2023. 

After the May 2022 core update, many speculated that Google was targeting AI-generated content.

Moreover, the search engine behemoth released two Helpful Content Updates (HCU) in December 2022 and September 2023, which heavily focused on helpful, reliable, people-first content. In other words, Google prioritizes content that provides genuine value to readers instead of regurgitating already-known information the way generative AI does.

The nail in the coffin was probably the April 2023 Reviews Update, where Google leaned heavily on E-E-A-T (experience, expertise, authoritativeness, and trustworthiness). This means content that cannot demonstrate real-life experience (something generative AI cannot do) is given lower priority on Google Search and may even be de-indexed.

In 2024 and 2025, we detected a relative increase in indexed pages, which might be explained by a few things:

First, the increase in Google’s indexing rate may suggest the search engine is catching up after previously reported indexing struggles.

Secondly, during the September 2023 HCU, Google seemingly loosened its grip on AI-generated content. Its earlier statement that the HCU was designed to “ensure people see original, helpful content written by people, for people, in search results” was replaced by a more accepting stance toward AI-generated content, provided it is not used to manipulate rankings or violate Google’s spam guidelines.

Thirdly, SEOs and site owners are smart people. They are constantly adapting to Google’s ever-unpredictable indexing and ranking system, which means they got the hint and produced more valuable and experiential content during the 2024-2025 period.


Chapter 3: Indexing Time


This section examines how quickly Google adds newly published pages to its index:

Graph showing the indexing time


Caveat

The analyzed pages may have been submitted or visible to Google for longer than the recorded data. The time measured reflects only the period since they were tracked in IndexCheckr.


Key Findings

Pages take an average of 27.4 days to be indexed.

  • Fast Indexing (0–7 days): 14.00% were indexed within the first week.
  • One Month: 64.86% were indexed within the first 30 days.
  • Three Months: 76.81% were indexed within the first three months.


Detailed Breakdown

Days to Index | Pages Indexed | Percentage of Tracked Pages | Cumulative Pages | Cumulative Percentage
0–7 | 43,518 | 14.00% | 43,518 | 14.00%
8–30 | 157,997 | 50.86% | 201,515 | 64.86%
31–90 | 37,127 | 11.95% | 238,642 | 76.81%
91+ | 25,797 | 8.31% | 264,439 | 85.12%
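
The cumulative columns follow directly from the per-bucket figures; a quick sketch of the arithmetic:

```python
# Rebuild the cumulative columns of the indexing-time table above.
buckets = [  # (days to index, pages indexed, % of tracked pages)
    ("0-7", 43_518, 14.00),
    ("8-30", 157_997, 50.86),
    ("31-90", 37_127, 11.95),
    ("91+", 25_797, 8.31),
]

cum_pages, cum_pct = 0, 0.0
for days, pages, pct in buckets:
    cum_pages += pages
    cum_pct = round(cum_pct + pct, 2)
    print(f"{days:>6}: {cum_pages:>7,} pages indexed ({cum_pct:.2f}%)")
```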


Analysis

The data shows us that indexing speed varies across the board. Some pages get indexed within the first week, while others take more time. 

It’s not until after the third month that the cumulative total reaches 85.12% of pages indexed. Furthermore, our analysis found that, of the pages that do get indexed, Google indexes 93.2% within the first six months, while the remaining 6.8% get indexed after day 180. 

The variation in page indexing speed can be attributed to two factors, per Google’s crawl budget guide:

  1. Crawl demand: a website’s size, popularity, update frequency, page quality, and relevance.
  2. Crawl capacity limit: how quickly and reliably a website’s server responds to crawl requests, along with limits Google itself sets.

These two elements primarily influence the crawl budget or the time and resources allotted by Google to crawl a website. 

Crawl budget and indexing rate are positively correlated, meaning an increased crawl budget generally leads to faster and more frequent indexing. Of course, other factors still influence whether and when a page gets indexed.

Pages indexed within the first 7 days (14%) likely came from high-authority and well-established websites. This includes industry household names and news sites that regularly publish new content. 

The bulk of indexing happens at 8-30 days (50.86%). Most pages fall into this range because Google follows its standard crawling cycle.

Indexed pages toward the end of this 30-day range might experience some delays to allow Google to assess content quality and relevance, especially considering the rise of AI-generated content.

Slower indexing at 31-90 days (11.95%) might come from lower-priority pages, such as those with fewer ranking signals. This includes low-quality backlinks, deep site architecture, and low overall user engagement.

Often, these pages go through algorithmic hesitation where Google may hold back indexing if it detects potential duplicate, thin, or low-quality content.

Finally, delayed indexing at 91 days and beyond (8.31%) happens to pages from websites with low crawl priority, poor content quality, or duplicate and near-duplicate content; the remaining 14.88% of tracked pages were never indexed at all during tracking. Even if these late pages do get indexed by Google, they are unlikely to perform well on SERPs.

This analysis highlights the importance of proactive optimization within the first three months to ensure faster indexing, especially for critical content.


Chapter 4: Deindexing Time


Deindexing is a critical process that directly impacts the visibility of a website’s pages on Google. 

We analyzed 310,705 unique pages and uncovered how quickly pages lose their indexed status after being tracked in IndexCheckr.

Graph showing the de-indexing time


Caveat

These pages may have been indexed for longer than recorded. The measured time reflects only the period from when they were tracked in IndexCheckr to when they were deindexed.


Key Findings

21.29% of pages get deindexed.

  • Rapid Deindexing (0–7 days): 1.97% were deindexed within the first week.
  • One Month: 7.97% were deindexed within 30 days.
  • Three Months: 13.70% were deindexed within the first three months.


Detailed Breakdown

Days to Deindex | Pages Deindexed | Percentage of Tracked Pages | Cumulative Pages | Cumulative Percentage
0–7 | 6,120 | 1.97% | 6,120 | 1.97%
8–30 | 18,638 | 6.00% | 24,758 | 7.97%
31–90 | 17,804 | 5.73% | 42,562 | 13.70%
91+ | 23,577 | 7.59% | 66,139 | 21.29%
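
The headline 21.29% figure can be checked against the bucket counts and the 310,705-page sample mentioned earlier in this chapter:

```python
# Total deindexed pages and their share of the tracked sample.
TRACKED = 310_705  # unique pages analyzed in this chapter

deindexed_by_bucket = {
    "0-7": 6_120,
    "8-30": 18_638,
    "31-90": 17_804,
    "91+": 23_577,
}

total_deindexed = sum(deindexed_by_bucket.values())
share = round(100 * total_deindexed / TRACKED, 2)
print(total_deindexed, share)
```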


Analysis

The results show that it can take time for a page to be removed from Google’s index. Most deindexing occurs within the first 90 days, accounting for 13.70% of the total dataset. After this period, deindexing slows, with the remaining 7.59% of pages losing their indexed status beyond Day 90.

In total, 21.29% of pages in the dataset were deindexed, while the remaining 78.71% remained in the Google index. 

This timeline highlights the importance of early monitoring and optimization to address potential issues that could lead to deindexing. 

Beyond three months, the risk of deindexing diminishes but persists, emphasizing the need for long-term tracking of indexing status, staying current with Google’s algorithm updates, and running periodic audits to maintain content visibility.


Chapter 5: The Impact of Indexing Submission Tools


For pages that remain unindexed, submitting them to indexing tools can provide a potential solution to gain visibility on Google. 

To evaluate the effectiveness of this approach, we analyzed the success rate of indexing submissions for a dataset of 33,930 pages.

Graph showing the distribution of successfully indexed and not indexed pages after submission to indexing tools


Key Findings

  • Indexed after Submission: 9,965 pages (29.37%) successfully transitioned to an indexed state following submission.
  • Still Not Indexed: 23,965 pages (70.63%) remained unindexed after submission.


Detailed Breakdown

Submission Outcome | Pages | Percentage
Successfully Indexed | 9,965 | 29.37%
Not Indexed | 23,965 | 70.63%


Analysis

The report shows that Google indexing tools helped 9,965 previously unindexed pages (29.37%) become indexed. However, a significant 23,965 pages (70.63%) remained unindexed even after submission.

So, why do some pages get indexed while others don’t? 

It boils down to two things:

  1. The process used by the indexer tools (minor reason)
  2. Google’s indexing policy (primary reason)

Many Google indexing tools keep their working mechanisms under wraps, but most use one of two strategies, which we discussed in detail in our top indexing tools resource:

First, using Google’s Indexing API. This involves pinging Google to draw its immediate attention to crawl a page and, hopefully, index it. 

This process is less effective: even if you get Google’s attention, the search engine still evaluates a page’s ranking signals before indexing it.
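
For illustration, this is roughly what a “ping” through Google’s Indexing API looks like. The endpoint and the two notification types (URL_UPDATED, URL_DELETED) are Google’s; actually sending the request also requires an OAuth2 token for the https://www.googleapis.com/auth/indexing scope, which we omit here, and the page URL is a placeholder:

```python
import json

# Google's Indexing API publish endpoint (authentication omitted here).
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(page_url: str, removed: bool = False) -> str:
    """Build the JSON body for a urlNotifications:publish request.

    Google accepts two notification types: URL_UPDATED (page added or
    changed) and URL_DELETED (page removed).
    """
    return json.dumps({
        "url": page_url,
        "type": "URL_DELETED" if removed else "URL_UPDATED",
    })

# Placeholder URL for illustration only.
print(build_notification("https://example.com/new-page"))
```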

Secondly, building temporary backlinks to the target page. Backlinks are among the most effective ranking signals since Google treats them as endorsements from external sources. Moreover, backlinks are also pathways for discovery. 

A high number of backlinks changes a page’s perceived value in Google’s eyes, making the search engine more inclined to index the page. This often works, but the page might also get de-indexed once the backlinks are removed.

While indexing submission tools work, Google’s selective indexing policy will always take precedence.

Google will never index every page it crawls, especially if the content’s relevance and quality fall short of Google’s standards. 

This explains why nearly 71% of submitted pages remained unindexed. Coinciding with the helpful content and core algorithm updates in the past years, Google is becoming increasingly cautious with indexing to maintain quality search results.

Ultimately, we can confidently say that indexing submission tools can expedite the crawling and indexing of index-worthy pages, but they never guarantee indexation.