The complete walkthrough for creating, validating, and submitting XML sitemaps to Google. Covers standard sitemaps, image sitemaps, video sitemaps, and sitemap indexes for sites with 50,000+ URLs.
Last updated: March 2026 · 10 min read
Log into Google Search Console, click “Sitemaps” in the left sidebar under Indexing, paste your sitemap URL (e.g., /sitemap.xml), and click Submit.
This guide covers everything from basic submission to advanced sitemap strategies for large sites. Whether you have a 20-page WordPress site or a 200,000-page e-commerce store, the principles below apply.

> “A sitemap isn’t a set-it-and-forget-it file. It’s a live document that tells Google what you consider important on your site. Every URL in your sitemap is a crawl request. If you’re sending Google to 404 pages, redirect chains, or noindexed content, you’re wasting crawl budget and signaling that your site isn’t well-maintained.”
Hardik Shah, Founder of ScaleGrowth.Digital
Google discovers pages in two ways: by following links from pages it already knows about, and by reading sitemaps you submit directly. For well-linked sites with strong internal architecture, Google can find most pages through crawling alone. But for large, new, or poorly linked sites, sitemaps become critical.

Definition: A sitemap is an XML file that lists the URLs on your website you want search engines to crawl and index. It includes metadata like the last modification date, change frequency, and priority for each URL.
| Format | File Type | Best For | URL Limit |
|---|---|---|---|
| XML Sitemap | .xml | Most websites (standard choice) | 50,000 URLs or 50MB |
| RSS/Atom Feed | .rss / .atom | Blogs, news sites with frequent updates | No fixed limit |
| Text Sitemap | .txt | Simple sites, quick setup | 50,000 URLs or 50MB |
Most CMS platforms and SEO plugins generate a sitemap automatically, typically at yoursite.com/sitemap_index.xml or yoursite.com/sitemap.xml.
For custom-built sites or when you need manual control, here’s the standard XML sitemap structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1/</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/page-2/</loc>
    <lastmod>2026-02-20</lastmod>
  </url>
</urlset>
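If you generate the file from a database rather than a plugin, a short script keeps the structure consistent. Here is a minimal sketch using Python's standard library; the page list is illustrative:

```python
# Sketch: build a standard XML sitemap from (url, lastmod) pairs.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (url, lastmod) tuples -> sitemap XML string."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:  # lastmod is optional; omit it rather than fake it
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

xml_out = build_sitemap([
    ("https://example.com/page-1/", "2026-03-15"),
    ("https://example.com/page-2/", "2026-02-20"),
])
```

Writing the result to a file and serving it at /sitemap.xml is all a custom setup needs.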
Rules for what to include:

- Use absolute URLs that match the live page exactly, including protocol and trailing slash (if the page resolves to https://example.com/page/ with a trailing slash, the sitemap URL must have the trailing slash too).
- The lastmod tag is optional but valuable. Google uses it to prioritize recrawling pages that have genuinely changed. Only update lastmod when the page content actually changes. Fake lastmod dates (updating daily when nothing changed) erode trust and can cause Google to ignore the tag entirely.
If your site is verified under both the http and https versions, submit the sitemap to the https property.

Method 1: Google Search Console. In the Sitemaps report, enter the sitemap path (sitemap.xml or sitemap_index.xml). The base URL is pre-filled. Click Submit. After submission, the report shows one of four statuses:

| Status | What It Means | Action Required |
|---|---|---|
| Success | Sitemap was fetched and parsed without errors | None. Monitor the Pages report for indexing progress. |
| Has errors | Sitemap was fetched but contains format issues | Click the sitemap to see specific errors. Fix and resubmit. |
| Couldn’t fetch | Google couldn’t access the sitemap URL | Check the URL is correct, the file exists, and it’s not blocked by robots.txt. |
| Pending | Sitemap was submitted but not yet processed | Wait 24-48 hours. If still pending, resubmit. |
Method 2: robots.txt. Add a Sitemap: directive to your robots.txt file. Google reads robots.txt before crawling your site and will find the sitemap reference automatically.
# robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
This method works without Search Console access. It’s useful when you’re managing a site you don’t own or when you want redundancy alongside a Search Console submission. The Sitemap: line can appear anywhere in the robots.txt file.
Method 3: Search Console API. For enterprise sites or automated workflows, use the Search Console API to submit sitemaps programmatically. The API endpoint is PUT https://www.googleapis.com/webmasters/v3/sites/{siteUrl}/sitemaps/{feedpath}. This is useful for sites that regenerate sitemaps frequently (e-commerce with daily inventory changes, news publishers).
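Assuming you already have an OAuth 2.0 access token with the Search Console (webmasters) scope, a minimal submission sketch in Python looks like this; token acquisition is out of scope here, and `submit_sitemap` performs a live network call:

```python
# Sketch: submit a sitemap via the Search Console API.
# Both path parameters must be URL-encoded, including the slashes.
import urllib.parse
import urllib.request

def sitemap_submit_url(site_url, feedpath):
    """Build the PUT endpoint for sitemaps.submit."""
    base = "https://www.googleapis.com/webmasters/v3/sites"
    return "{}/{}/sitemaps/{}".format(
        base,
        urllib.parse.quote(site_url, safe=""),
        urllib.parse.quote(feedpath, safe=""),
    )

def submit_sitemap(site_url, feedpath, access_token):
    """PUT the sitemap registration; returns the HTTP status code."""
    req = urllib.request.Request(
        sitemap_submit_url(site_url, feedpath),
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:  # live network call
        return resp.status
```

A cron job calling `submit_sitemap` after each sitemap regeneration keeps Search Console in sync without manual resubmission.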
We recommend using both Method 1 (Search Console) and Method 2 (robots.txt) together. There’s no downside to redundancy, and it ensures Google finds your sitemap even if one discovery method fails.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-03-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-categories.xml</loc>
    <lastmod>2026-02-28</lastmod>
  </sitemap>
</sitemapindex>
Segment your sitemaps logically. Don’t just split alphabetically. Group URLs by content type: products, blog posts, categories, location pages. This lets Google process each segment independently and makes debugging easier when indexing issues arise.
For e-commerce sites with frequent inventory changes, consider updating product sitemaps daily while keeping static sitemaps (about pages, policies) on a monthly cadence. The lastmod date on each sitemap entry in the index file tells Google which segments have changed and need recrawling.
Gzip compression is supported and recommended for large sitemaps. A 50MB uncompressed sitemap compresses to roughly 2-5MB, reducing server bandwidth and speeding up Google’s fetch time.
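A sketch of the compression step, using Python's standard gzip module; note the 50MB limit applies to the uncompressed file:

```python
# Sketch: gzip-compress a generated sitemap before serving it as sitemap.xml.gz.
import gzip

def compress_sitemap(xml_text: str) -> bytes:
    """Return the gzipped bytes of a sitemap XML string."""
    return gzip.compress(xml_text.encode("utf-8"))
```

Most web servers can also gzip on the fly (Content-Encoding), which achieves the same bandwidth savings without a separate .gz file.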
Download your sitemap, run every URL through a status code checker, and verify each URL returns 200. We’ve found sitemaps containing 10-30% non-200 URLs on roughly half the audits we run. That’s wasted crawl budget on every Googlebot visit.
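This audit is easy to script. Below is a sketch where the HTTP fetcher is injectable so the logic runs offline; in practice you would pass a function that issues a HEAD or GET request and returns the status code:

```python
# Sketch: flag every sitemap URL that does not return HTTP 200.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(sitemap_xml, fetch_status):
    """fetch_status: url -> HTTP status code. Returns {url: status} for non-200 URLs."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text for loc in root.findall(".//sm:loc", NS)]
    return {u: s for u in urls if (s := fetch_status(u)) != 200}
```

Anything the audit returns (redirects, 404s, server errors) should be fixed at the source or removed from the sitemap.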
Every URL in your sitemap should match its own canonical tag exactly. If a page’s canonical points to a different URL, remove the non-canonical version from the sitemap. Mixed signals confuse Google’s indexing decisions.
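A simplified sketch of that check, extracting the canonical tag with Python's built-in HTML parser; real pages may need case-insensitive rel matching and URL normalization:

```python
# Sketch: verify a sitemap URL matches the page's own canonical tag.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical" and self.canonical is None:
            self.canonical = a.get("href")

def canonical_matches(sitemap_url, page_html):
    parser = CanonicalFinder()
    parser.feed(page_html)
    return parser.canonical == sitemap_url
```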
If your sitemap has 5,000 URLs but Google only indexes 3,200, that 36% gap demands investigation. Use the Pages report to find reasons: crawled but not indexed, duplicate content, soft 404s. Each reason has a different fix.
Only update the lastmod date when page content genuinely changes. Some CMS themes update lastmod on every server restart or plugin update. Google has stated it will ignore lastmod if it detects the dates aren’t reliable. Honest lastmod data accelerates re-crawling of truly updated content.
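One way to keep lastmod honest is to derive it from a content hash rather than the build timestamp. A sketch, assuming a simple key-value store mapping each URL to its last content hash and date:

```python
# Sketch: only bump lastmod when the page content hash actually changes.
import hashlib
from datetime import date

def updated_lastmod(url, content, store, today=None):
    """store maps url -> (content_hash, lastmod). Returns the lastmod to publish."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    prev = store.get(url)
    if prev and prev[0] == digest:
        return prev[1]  # content unchanged: keep the old lastmod
    lastmod = (today or date.today()).isoformat()
    store[url] = (digest, lastmod)
    return lastmod
```

Hash the rendered body only, not the full template, so a theme update does not bump every page's lastmod.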
Mistake 3: Blocking the sitemap in robots.txt. A Disallow: /sitemap rule or an overly broad Disallow: /*.xml can block Google from reading your sitemap. Always test your robots.txt rules against your sitemap URL.
Mistake 4: Submitting one massive sitemap instead of a sitemap index. A single sitemap with 45,000 URLs is technically valid (under the 50,000 limit), but it’s harder to debug, slower to process, and gives you no segmentation. Break it into logical groups of 5,000-10,000 URLs each, referenced from a sitemap index. This mirrors how Google processes them internally.
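A sketch of the splitting step; the 10,000-URL chunk size and filename pattern here are illustrative choices, not requirements:

```python
# Sketch: split a large URL list into sitemap segments for an index file.
def plan_sitemap_index(urls, chunk_size=10_000, base="https://example.com"):
    """Return {segment_filename_url: [urls]} ready to render as individual sitemaps."""
    chunks = [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]
    return {f"{base}/sitemap-{n}.xml": chunk for n, chunk in enumerate(chunks, 1)}

plan = plan_sitemap_index([f"https://example.com/p/{i}" for i in range(45_000)])
```

In practice you would key the split on content type (products, blog, categories) first and only chunk within a type when it exceeds the size you have chosen.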
Mistake 5: Ignoring sitemap errors in Search Console. Google reports sitemap errors explicitly. If your Sitemaps report shows “Has errors” or “Couldn’t fetch,” that needs immediate attention. Yet we’ve seen sites go months with broken sitemaps because nobody checks the report. Add a monthly calendar reminder or set up Search Console email notifications.
Once your pages are indexed, make sure they’re optimized. Our 47-point checklist covers every on-page factor that moves rankings. Get Checklist →
Before building pages worth indexing, you need the right keywords. Our Keyword Planner guide covers discovery, filtering, and export workflows. Read Guide →
See how we diagnose technical SEO issues (including sitemap problems) in a full audit report with prioritized recommendations. View Sample →
We audit sitemaps, crawlability, indexing, Core Web Vitals, and 30+ technical dimensions. Free diagnostic for qualified brands. Get Your Free Technical Audit →
Google fetches the sitemap file within minutes of submission. However, crawling and indexing the URLs inside it can take anywhere from a few hours to several weeks, depending on site size, crawl budget, and content quality. New sites with no crawl history take the longest.
No. If your CMS automatically updates the sitemap when new pages are published, Google will discover the changes on its next crawl of the sitemap file. Google recrawls submitted sitemaps periodically. You only need to resubmit if you change the sitemap URL itself or create an entirely new sitemap file.
Yes. You can submit multiple sitemap files to the same Search Console property. This is common for large sites that segment sitemaps by content type (products, blog, categories). You can also submit a single sitemap index file that references all individual sitemaps.
No. A sitemap is a request, not a directive. Google decides whether to index each URL based on content quality, technical health, and site authority. Google’s own documentation states that “it is possible that not all URLs in a sitemap will be crawled” and that crawling depends on site size, activity, and traffic.
A single sitemap file can contain up to 50,000 URLs and must not exceed 50MB uncompressed. If your site has more than 50,000 indexable URLs, use a sitemap index file that references multiple individual sitemaps. There’s no limit to how many sitemaps a sitemap index can reference.
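Those limits can be checked mechanically before upload. A minimal sketch:

```python
# Sketch: enforce the 50,000-URL / 50MB-uncompressed limits before publishing.
def validate_sitemap_limits(xml_text, url_count):
    """Return a list of limit violations (empty list means the file is within limits)."""
    MAX_URLS, MAX_BYTES = 50_000, 50 * 1024 * 1024
    size = len(xml_text.encode("utf-8"))  # the limit applies to the uncompressed file
    problems = []
    if url_count > MAX_URLS:
        problems.append(f"too many URLs: {url_count} > {MAX_URLS}")
    if size > MAX_BYTES:
        problems.append(f"file too large: {size} bytes > {MAX_BYTES}")
    return problems
```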