Crawl budget: definition and optimisation for Google

Updated on February 22, 2026
Quick definition
Crawl budget is the number of URLs Google's crawler (Googlebot) can and wants to crawl on a website within a given timeframe. It is determined by two factors: the crawl rate limit (to avoid overloading the server) and crawl demand (based on content popularity and freshness). Crawl budget covers crawling, not indexing, but a page that is not crawled cannot be indexed, so this limited resource must be managed so that priority pages are crawled first.
How it works
Crawl budget rests on two complementary dimensions. The crawl rate limit defines how fast Googlebot can crawl without overloading the server, and it is directly tied to server response times and error rates. Crawl demand reflects Google's interest in the site's URLs, based on how popular they are on the web (for example, how widely they are linked) and on how fresh the content is.
Concrete example: an e-commerce site with 500,000 URLs, 40% of which are filter pages (size, colour, price), wastes its crawl budget on low-value pages. The result: important new product pages take weeks to be indexed.
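To put a number on this kind of waste, the share of filter URLs can be estimated from any URL export (crawl, sitemap or logs). The following is a minimal Python sketch, assuming filters are expressed as query parameters. The parameter names `size`, `colour` and `price`, the `shop.example` domain and the helpers `is_filter_url` and `filter_share` are hypothetical and only serve the illustration:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical filter parameters; adapt them to the site's real URL scheme.
FILTER_PARAMS = {"size", "colour", "price"}

def is_filter_url(url: str) -> bool:
    """Return True if the query string uses at least one filter parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in FILTER_PARAMS for name in query)

def filter_share(urls: list[str]) -> float:
    """Fraction of URLs that are filter pages (0.0 for an empty list)."""
    if not urls:
        return 0.0
    return sum(is_filter_url(u) for u in urls) / len(urls)

# Made-up sample; in practice, load the URL list from a crawl or log export.
sample_urls = [
    "https://shop.example/shirts",
    "https://shop.example/shirts?colour=red",
    "https://shop.example/shirts?size=m&price=10-20",
    "https://shop.example/shirts/blue-oxford",
    "https://shop.example/shirts?page=2",
]
print(f"{filter_share(sample_urls):.0%} of sampled URLs are filter pages")
```

On the site in the example, a 40% share of 500,000 URLs corresponds to 200,000 filter pages competing with product pages for Googlebot's attention.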
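By blocking filter URLs via robots.txt and submitting a clean XML sitemap, the site directs Googlebot to priority pages and accelerates their indexing. The `noindex` tag keeps a page out of the index but does not save crawl budget, because Googlebot has to crawl the page to see the tag; it also has no effect on a URL that robots.txt blocks.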
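Before deploying blocking rules, it can help to check them against a few sample URLs. The following is a minimal sketch using Python's standard `urllib.robotparser`, assuming filter and internal-search pages live under the hypothetical path prefixes `/filter/` and `/search` on `shop.example`. The standard-library parser matches rules by simple path prefix and does not implement the `*` and `$` wildcards that Googlebot understands, so the sketch sticks to prefix rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the paths are assumptions, not taken from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
Disallow: /search
Sitemap: https://shop.example/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

for url in (
    "https://shop.example/products/blue-shirt",
    "https://shop.example/filter/colour-red/",
    "https://shop.example/search?q=shirt",
):
    status = "allowed" if is_crawlable(url) else "blocked"
    print(f"{status:8} {url}")
```

A check like this only reflects the prefix logic of the Python parser; rules that rely on wildcards should be validated with a tool that follows Google's matching behaviour.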
For small sites (fewer than 1,000 pages), crawl budget is generally not an issue. It becomes critical for large sites (e-commerce, portals, news sites).
Why it matters
A poorly managed crawl budget means important pages remain unindexed for weeks, reducing their organic traffic potential.
- Unnecessary filter pages → budget wasted on zero-value content
- Unresolved duplicate pages → diluted indexing signals
- Poor server performance → Google reducing the crawl rate limit
This is particularly problematic for sites with high content turnover (news, new products), where every missed indexing day means lost traffic.
How to improve or use it
1. Clean up your URL architecture by removing or consolidating duplicate and parameterised pages.
2. Block useless URLs (filter combinations, internal search, low-value pagination) via robots.txt, without cutting off pages that Googlebot needs in order to discover priority content.
3. Submit an up-to-date XML sitemap in Google Search Console.
4. Improve server performance to raise your crawl rate limit.
5. Redirect moved pages to their new URL with a 301, return a 404 or 410 for permanently removed pages, and avoid long redirect chains.
6. Use the canonical tag to consolidate signals on reference pages.
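The sitemap step can be sketched as follows. This is a minimal example that builds a sitemap with Python's standard `xml.etree.ElementTree`, following the sitemaps.org protocol (`urlset`, `url`, `loc`, `lastmod`). The `build_sitemap` helper, the `shop.example` URLs and the dates are hypothetical; a real site would pull the list of priority pages from its catalogue or CMS:

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages: list[tuple[str, date]]) -> bytes:
    """Serialise (URL, last-modified date) pairs as a sitemap document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the W3C date format, e.g. 2026-02-22
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

# Made-up priority pages: canonical, indexable URLs only, no filter pages.
pages = [
    ("https://shop.example/shirts/blue-oxford", date(2026, 2, 20)),
    ("https://shop.example/shirts/white-linen", date(2026, 2, 22)),
]
print(build_sitemap(pages).decode("utf-8"))
```

A single sitemap file is limited to 50,000 URLs, so larger sites split their URLs across several files referenced from a sitemap index.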
With Sublim
Sublim shows you which pages on your site actually receive organic traffic, helping you identify those that deserve crawl priority. By visualising indexed pages with no traffic, you can easily spot low-value content that wastes your crawl budget. The data is collected without cookies and hosted in Europe.
Frequently asked questions
How can I see how Googlebot crawls my site?
Google Search Console provides a 'Crawl Stats' report that shows the number of crawl requests per day, the average response time and the types of files crawled. You can also analyse your server logs with tools such as Screaming Frog Log File Analyser to see exactly which URLs Googlebot visits and how often.
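For a quick look without a dedicated tool, Googlebot requests can be counted per URL directly from the access log. The following is a minimal Python sketch, assuming the server writes the Apache/Nginx "combined" log format. The `googlebot_hits` helper and the sample lines are made up for illustration, and the sketch trusts the user-agent string, which can be spoofed; a stricter check would verify the requesting IP address as well:

```python
import re
from collections import Counter

# Combined log format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines) -> Counter:
    """Count requests per path whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up log lines for illustration only.
sample_log = [
    f'66.249.66.1 - - [22/Feb/2026:10:00:00 +0000] "GET /shirts?colour=red HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [22/Feb/2026:10:00:05 +0000] "GET /shirts?colour=red HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [22/Feb/2026:10:00:09 +0000] "GET /shirts/blue-oxford HTTP/1.1" 200 7340 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [22/Feb/2026:10:00:12 +0000] "GET /shirts/blue-oxford HTTP/1.1" 200 7340 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for path, count in googlebot_hits(sample_log).most_common():
    print(f"{count:3} {path}")
```

Sorting by hit count makes it easy to see whether filter URLs attract more crawl requests than the product pages that matter.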
Is crawl budget important for a small site?
For a well-structured site of under 1,000 pages, crawl budget is generally not a concern: Googlebot can usually crawl a site of that size in full without difficulty. Crawl budget issues mainly affect large sites (e-commerce, portals) with thousands to millions of potential URLs.
Do pages blocked in robots.txt consume crawl budget?
Barely. Googlebot periodically re-fetches the robots.txt file itself, but it does not request the URLs that the file disallows, so blocked pages are not crawled and their content is not indexed. A blocked URL can still appear in the index without its content if other pages link to it. To limit this and keep Googlebot focused on pages with SEO value, combine robots.txt blocking with the removal of internal links pointing to those URLs.
Related terms
Indexing is the process by which a search engine such as Google analys…
An XML sitemap is a file in XML format that lists all the important UR…
SEO (Search Engine Optimization) is the set of practices aimed at impr…
The canonical tag is an HTML tag placed in the `<head>` section of a w…