Mumbai, India
March 20, 2026

The Technical SEO Audit That Matters: 20 Checks, Zero Noise


Most technical SEO checklists run 80+ items and bury the 20 that actually move rankings. This is the short list: 5 categories, 20 checks, clear pass/fail criteria, and honest notes on which popular recommendations are wasting your sprint cycles.

Why Do Most Technical SEO Audits Fail to Drive Action?

The short answer: volume without priority. A typical crawl tool exports 200+ issue types. An eager SEO analyst pastes them into a spreadsheet, groups them by severity (according to the tool’s own scoring), and delivers a 47-page PDF. The engineering team reads the first 3 pages, fixes the easiest items, and the document dies in a shared drive. We’ve reviewed technical audits from 38 brands across financial services, SaaS, D2C, and healthcare over the past 18 months. The pattern is consistent:
  • 72% of flagged issues had no measurable ranking impact when fixed
  • 8 of the top 10 ranking-impacting fixes were the same across every audit
  • Under 15% of audit recommendations included pass/fail criteria an engineer could validate without asking follow-up questions
The problem is not that technical SEO is unimportant. It is foundational. The problem is that most audits treat every crawl warning as equal. A missing H1 on a 404 page gets the same red flag as a canonical loop that deindexes 3,000 product pages. Those are not the same problem. This post gives you 20 checks across 5 categories. Each check includes the tool to run it, the pass/fail threshold, and the priority level. We also call out which “standard” checks are noise so you can cut them from your workflow.

How Should You Structure a Technical SEO Audit?

Group checks by what they affect, not by what tool finds them. Crawl tools, indexation tools, performance tools, and structured data validators all overlap. If you organize by tool, you get duplicate issues counted differently. If you organize by function, you get a clear picture of where the site is broken. The 5 categories that cover every meaningful technical SEO surface:
  1. Crawling: Can search engines discover and traverse your pages?
  2. Indexation: Are the right pages in the index, and only the right pages?
  3. Performance: Do pages load fast enough to rank and convert?
  4. Structure: Can search engines parse your content and relationships?
  5. AI Readiness: Can LLMs and AI-powered search extract, attribute, and cite your content?
Four checks per category. Twenty total. Run them quarterly for established sites, monthly during migrations or major launches, and immediately after any CMS update or infrastructure change. Below is the full reference table. We break down each check in the sections that follow.
#  | Check                        | Category     | Tool                                                    | Pass Criteria                                                     | Priority
1  | Robots.txt validation        | Crawling     | Google Search Console                                   | No critical paths blocked; file returns 200                       | P0
2  | XML sitemap health           | Crawling     | Screaming Frog                                          | All listed URLs return 200; <50,000 per file                      | P0
3  | Crawl depth distribution     | Crawling     | Screaming Frog                                          | 90%+ of indexable pages within 3 clicks                           | P1
4  | Internal link coverage       | Crawling     | Screaming Frog / Sitebulb                               | 0 orphan pages; every indexable page has 3+ internal links        | P1
5  | Canonical tag consistency    | Indexation   | Screaming Frog                                          | Every indexable page self-canonicals; no loops or chains          | P0
6  | Index coverage gaps          | Indexation   | Google Search Console                                   | Indexed page count within 10% of intended indexable count         | P0
7  | Redirect chain audit         | Indexation   | Screaming Frog                                          | 0 chains >2 hops; all 301s resolve to 200                         | P1
8  | Duplicate content signals    | Indexation   | Siteliner / Screaming Frog                              | No 2 indexable URLs with >85% content similarity                  | P1
9  | Largest Contentful Paint     | Performance  | PageSpeed Insights (CrUX)                               | LCP ≤ 2.5s at 75th percentile (field data)                        | P0
10 | Interaction to Next Paint    | Performance  | PageSpeed Insights (CrUX)                               | INP ≤ 200ms at 75th percentile                                    | P0
11 | Cumulative Layout Shift      | Performance  | PageSpeed Insights (CrUX)                               | CLS ≤ 0.1 at 75th percentile                                      | P1
12 | Mobile usability             | Performance  | Google Search Console                                   | 0 pages with mobile usability errors                              | P1
13 | Schema markup validation     | Structure    | Rich Results Test                                       | 0 errors on all deployed schema types                             | P0
14 | Heading hierarchy            | Structure    | Screaming Frog                                          | 1 H1 per page; no skipped levels (H1→H3)                          | P1
15 | Hreflang implementation      | Structure    | Aleyda Solis’s Hreflang Tags Generator / Screaming Frog | Return tags on all alternates; x-default present                  | P1*
16 | Breadcrumb + nav structure   | Structure    | Manual review + Schema validator                        | Breadcrumbs on all pages below root; BreadcrumbList schema present | P2
17 | LLM crawl access             | AI Readiness | robots.txt + llms.txt review                            | Not blocking GPTBot, ClaudeBot, Bingbot; llms.txt present         | P0
18 | Content extractability       | AI Readiness | Jina Reader / Markdownify test                          | Core content renders in plain text; no JS-only sections           | P0
19 | Entity and authorship markup | AI Readiness | Schema validator + manual review                        | Author schema on articles; Organization schema on homepage        | P1
20 | Structured FAQ/How-to markup | AI Readiness | Rich Results Test                                       | FAQ/HowTo schema on relevant pages; 0 validation errors           | P1
Priority key: P0 = fix before anything else (ranking/indexation at stake). P1 = fix within 30 days (measurable impact). P2 = fix during next sprint (incremental gains). *P1 for hreflang only applies to multi-language or multi-region sites.

What Are the 4 Crawling Checks That Actually Matter?

If Google cannot discover your pages, nothing else in this audit matters. Crawling is the foundation layer. Get it wrong and your content, your schema, and your Core Web Vitals scores are all invisible.

Check 1: Robots.txt Validation

Open your robots.txt file. Read it. This takes 90 seconds and catches problems that persist for months because nobody looks. What you are looking for:
  • No Disallow: / on production (this still happens post-migration more than you would expect)
  • CSS and JS files are not blocked (Google needs them to render pages)
  • Your sitemap URL is declared and resolves to a 200
  • No wildcard rules accidentally blocking parameterized URLs you want indexed
In a 2023 Screaming Frog study of 10,000 sites, 14.8% had at least one critical resource blocked by robots.txt. That is 1 in 7 sites silently telling Google not to render their pages correctly.
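
The 90-second manual read is the real check, but the first two bullets are easy to script with Python’s standard-library robots.txt parser. A minimal sketch; the sample file, the probe paths, and the `audit_robots_txt` name are all hypothetical:

```python
from urllib import robotparser

def audit_robots_txt(robots_text, probe_paths=("/", "/assets/app.css", "/assets/app.js")):
    """Probe whether Googlebot may fetch key paths, and whether a sitemap is declared."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())
    findings = {path: rp.can_fetch("Googlebot", path) for path in probe_paths}
    findings["sitemap_declared"] = any(
        line.lower().startswith("sitemap:") for line in robots_text.splitlines()
    )
    return findings

# Hypothetical robots.txt: a blanket rule on /assets/ also blocks the CSS
# and JS that Google needs to render the page.
sample = """User-agent: *
Disallow: /admin/
Disallow: /assets/

Sitemap: https://example.com/sitemap.xml
"""
report = audit_robots_txt(sample)
```

In this sample, `/assets/app.css` comes back blocked, which is exactly the kind of silent rendering failure the Screaming Frog study is counting.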

Check 2: XML Sitemap Health

Your sitemap is a declaration of intent. It tells Google: these are the pages I consider important enough to index. When the sitemap includes 404s, redirects, or noindexed URLs, that declaration loses credibility. Pass criteria: every URL in your sitemap returns a 200 status code, is self-canonicalized, and is not blocked by robots.txt. Each sitemap file stays under the 50,000-URL / 50MB limit. Screaming Frog’s sitemap audit mode handles this in a single crawl.
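
If you want a quick stand-in for the sitemap portion of that crawl, a sitemaps.org XML file parses cleanly with the standard library. A sketch with hypothetical sample URLs; verifying that each URL actually returns a 200 and self-canonicalizes would still require a request per URL:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract every <loc> entry from a standard sitemap file."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

def sitemap_within_limit(urls, limit=50_000):
    """Enforce the 50,000-URL-per-file ceiling."""
    return len(urls) <= limit

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""
urls = sitemap_urls(sample)
```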

Check 3: Crawl Depth Distribution

Crawl depth measures how many clicks it takes to reach a page from the homepage. Pages at depth 4+ receive significantly less crawl frequency. For sites with over 10,000 pages, this is where rankings silently erode. Export your crawl data, filter to indexable pages, and check the distribution. Target: 90% or more of indexable pages reachable within 3 clicks. If important category or product pages are buried at depth 5 or 6, restructure your internal linking before spending time on content.
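
Crawl depth is just shortest path from the homepage, so a breadth-first search over your exported link graph reproduces the check. A sketch against a hypothetical five-page site:

```python
from collections import deque

def crawl_depths(links, root):
    """BFS from the homepage; depth = minimum clicks from root."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def share_within(depths, max_depth=3):
    """Fraction of discovered pages at or above the depth threshold."""
    within = sum(1 for d in depths.values() if d <= max_depth)
    return within / len(depths)

# Hypothetical link graph exported from a crawl
site = {
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/widget"],
    "/blog/post-1": [],
    "/products/widget": [],
}
depths = crawl_depths(site, "/")
```

The pass criterion is then `share_within(depths) >= 0.9` over indexable pages only.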

Check 4: Internal Link Coverage

Orphan pages (pages with zero internal links pointing to them) are effectively invisible to crawlers that follow links. Even if a page is in your sitemap, Google treats sitemap-only discovery as a weaker signal than link-based discovery. Run a full crawl and export pages with fewer than 3 unique internal links. Any indexable page with 0 internal links is a P0 fix. Pages with 1-2 links deserve review, especially if they target competitive keywords.
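
The same crawl export yields the inlink counts directly. A sketch, assuming you have the full page list (sitemap plus crawl) and the link graph; the URLs are hypothetical:

```python
from collections import Counter

def inlink_counts(links, all_pages):
    """Count unique internal links pointing at each known page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):   # unique links per source page
            if target != source:      # ignore self-links
                counts[target] += 1
    return {page: counts.get(page, 0) for page in all_pages}

pages = {"/", "/a", "/b", "/orphan"}   # /orphan is in the sitemap but unlinked
links = {"/": ["/a", "/b"], "/a": ["/b"], "/b": ["/a"]}

counts = inlink_counts(links, pages)
orphans = [p for p, n in counts.items() if n == 0 and p != "/"]   # P0 fixes
under_linked = [p for p, n in counts.items() if 0 < n < 3]        # review queue
```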

What Crawling “Checks” Are Noise?

Crawl budget optimization for sites under 50,000 pages. This is the most overhyped concept in technical SEO. Google has stated repeatedly that crawl budget is only a concern for very large sites. If your site has 500 pages, or even 5,000, Google will crawl it fully. Spending engineering time on crawl budget for a 2,000-page site is solving a problem that does not exist. Server log analysis for small/medium sites. Log analysis is valuable for sites with 100,000+ pages where crawl patterns reveal structural issues at scale. For a 3,000-page B2B site, your time is better spent on the 4 checks above.

Which Indexation Issues Cause the Most Ranking Damage?

Crawling gets your pages discovered. Indexation determines which ones Google keeps. The gap between “crawled” and “indexed” is where most technical SEO problems hide.

Check 5: Canonical Tag Consistency

Canonical tags are the single most misimplemented element in technical SEO. The failure modes are subtle and the consequences are severe.
  • Canonical loops: Page A canonicals to Page B, which canonicals back to Page A. Google ignores both and picks its own preferred URL.
  • Canonical chains: A points to B, B points to C. Google may follow the chain, or may not. Depends on the day.
  • Non-self-referencing canonicals on unique pages: Your blog post canonicals to your homepage because a developer left a template default in place. That blog post will not rank.
Run Screaming Frog with the “Canonicals” tab open. Filter for non-self-referencing canonicals and validate every single one. On a 5,000-page ecommerce site we audited in late 2025, 1,247 product pages had canonicals pointing to out-of-stock parent products. Those 1,247 pages had been effectively deindexed for 7 months. Fixing the canonicals recovered 34% of lost organic traffic within 6 weeks.
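
The loop and chain failure modes are mechanical enough to detect from a URL-to-canonical mapping, which any crawl tool can export. A minimal sketch with hypothetical URLs:

```python
def canonical_issues(canonicals):
    """Classify each non-self canonical as a loop (A->B->A) or a chain (A->B->C).

    A loop is reported once per direction; a target that self-canonicals is fine.
    """
    loops, chains = [], []
    for url, target in canonicals.items():
        if target == url:
            continue                              # self-canonical: the healthy case
        nxt = canonicals.get(target)
        if nxt == url:
            loops.append((url, target))           # A -> B -> A
        elif nxt is not None and nxt != target:
            chains.append((url, target, nxt))     # A -> B -> C
    return {"loops": loops, "chains": chains}

canonicals = {
    "/a": "/b", "/b": "/a",                # loop: Google ignores both
    "/x": "/y", "/y": "/z", "/z": "/z",    # chain: /x should point straight at /z
    "/ok": "/ok",
}
issues = canonical_issues(canonicals)
```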

Check 6: Index Coverage Gaps

Go to Google Search Console > Pages. Compare the number of indexed pages to the number of pages you intend to have indexed. If the gap exceeds 10%, you have a problem worth investigating. Common causes of index gaps:
  1. “Discovered, currently not indexed”: Google found the URL but decided not to index it. Usually a quality signal: thin content, near-duplicate, or low internal link equity.
  2. “Crawled, currently not indexed”: Google crawled the page and actively chose not to index it. This is a stronger negative signal than “discovered.”
  3. “Excluded by noindex tag”: Intentional or accidental. Verify every noindex is deliberate.
A 10% gap on a 500-page site means 50 pages. Manageable. A 10% gap on a 50,000-page site means 5,000 pages not working for you. The same percentage, very different urgency.

Check 7: Redirect Chain Audit

Every hop in a redirect chain dilutes PageRank and adds latency. A 3-hop chain (301 to 301 to 301 to 200) is not catastrophic, but it accumulates. After a site migration, chains of 4-5 hops are common because old redirects get layered on top of older redirects. The fix is straightforward: flatten every chain so the origin URL points directly to the final destination. On sites that have been through 2+ migrations, this single fix often cleans up hundreds of redirect rules.
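
Flattening is a pure graph operation on your redirect map: walk each chain to its end and rewrite the origin rule. A sketch with hypothetical URLs; the `seen` set guards against redirect loops:

```python
def flatten_redirects(redirects):
    """Point every origin URL directly at its final destination."""
    def resolve(url):
        seen = set()
        while url in redirects and url not in seen:
            seen.add(url)
            url = redirects[url]
        return url
    return {origin: resolve(origin) for origin in redirects}

# Two migrations layered on top of each other: /legacy -> /old -> /mid -> /new
redirects = {"/old": "/mid", "/mid": "/new", "/legacy": "/old"}
flat = flatten_redirects(redirects)
```

After flattening, every origin 301s in a single hop, which is what you ship back into the server config.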

Check 8: Duplicate Content Signals

True duplicate content (two indexable URLs serving identical or near-identical content) forces Google to pick one version and suppress the other. You lose control of which URL ranks. This is most common on ecommerce sites with parameterized URLs, faceted navigation, and print-friendly page versions. Screaming Frog’s “Near Duplicates” report flags pages with over 85% content similarity. Each pair needs either a canonical tag, a noindex, or content differentiation.
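
Screaming Frog’s similarity algorithm is its own; a word-shingle Jaccard similarity is a rough, commonly used approximation of the same idea if you want to spot-check pairs yourself. A sketch with hypothetical page text:

```python
def shingles(text, k=3):
    """Set of k-word sequences; order-sensitive, so boilerplate reshuffles score lower."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b):
    """Jaccard similarity of the two pages' shingle sets, 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

page_a = "blue widget with free shipping and a two year warranty included"
page_b = "blue widget with free shipping and a two year warranty included today"
page_c = "a completely different article about something else entirely here now"
```

Any indexable pair scoring above 0.85 goes into the canonical/noindex/differentiate queue.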

What Indexation “Checks” Are Noise?

Meta robots “nofollow” audits on internal links. Google has stated since 2009 that nofollow on internal links does not conserve crawl equity the way people think it does. The “PageRank sculpting” technique died over 15 years ago. If someone recommends adding nofollow to your login page links, they are working from outdated information.

“The audits that move rankings are short and specific. Twenty checks with clear pass/fail criteria will outperform 200 checks with vague recommendations every time. Engineers fix what they can validate. Give them a test, not a thesis.”

Hardik Shah, Founder of ScaleGrowth.Digital

How Do Core Web Vitals Fit Into a Technical SEO Audit?

Performance is a ranking signal, but context matters. Google’s own documentation describes Core Web Vitals as a tiebreaker, not a primary ranking factor. A page with excellent content and poor LCP will still outrank a page with poor content and perfect LCP. But when two pages are otherwise comparable, performance tips the scale. The more practical reason to care: conversion rate. A 2024 Portent study found that pages loading in 1 second convert at 3x the rate of pages loading in 5 seconds. You are not just optimizing for Google. You are optimizing for the person who bounces at the 3-second mark.

Check 9: Largest Contentful Paint (LCP)

LCP measures when the largest visible element finishes rendering. The 2.5-second threshold applies to the 75th percentile of real user data (CrUX), not lab data. Lab data from Lighthouse is useful for debugging but does not determine your ranking signal. Common LCP killers:
  • Hero images served without proper sizing, modern formats (WebP/AVIF), or preload hints
  • Render-blocking CSS or JavaScript delaying the first paint
  • Server response time (TTFB) above 800ms, often caused by uncached database queries

Check 10: Interaction to Next Paint (INP)

INP replaced First Input Delay as a Core Web Vital in March 2024. It measures the worst-case responsiveness of your page across all user interactions, not just the first one. The threshold is 200ms at the 75th percentile. INP failures typically come from heavy JavaScript execution on the main thread. Third-party scripts (analytics, chat widgets, ad tags) are the most common offenders. A single poorly timed analytics call can push INP above 400ms.

Check 11: Cumulative Layout Shift (CLS)

CLS measures visual stability. When elements shift after the page loads (an ad slot pushing content down, a font swap causing text reflow, an image loading without dimensions), that is a CLS event. The threshold is 0.1 at the 75th percentile. The fix for 80% of CLS issues: set explicit width and height attributes on all images and video elements, and reserve space for dynamic elements like ad slots before they load.
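
The three pass criteria above reduce to a threshold comparison on 75th-percentile field values. A minimal sketch; the metric keys and the sample values are hypothetical stand-ins for whatever shape your CrUX export uses:

```python
# Core Web Vitals pass criteria at the 75th percentile (field data)
THRESHOLDS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}

def cwv_pass(field_p75):
    """Map each metric to pass/fail against its Core Web Vitals threshold."""
    return {metric: field_p75[metric] <= limit for metric, limit in THRESHOLDS.items()}

# Hypothetical field data: LCP and CLS pass, INP fails at 240ms
result = cwv_pass({"lcp_ms": 2300, "inp_ms": 240, "cls": 0.05})
```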

Check 12: Mobile Usability

Google’s mobile-first indexing has been the default since 2023. If your mobile experience is broken, your desktop rankings suffer too. Check Google Search Console’s Mobile Usability report for:
  • Text too small to read (below 16px effective size)
  • Clickable elements too close together (below 48px touch targets)
  • Content wider than screen (horizontal scrolling)
  • Viewport not configured

What Performance “Checks” Are Noise?

Lighthouse performance scores as a ranking metric. Lighthouse runs lab tests from a simulated environment. Your Lighthouse score is not what Google uses for ranking. CrUX field data is. A site can score 45 on Lighthouse and still pass all Core Web Vitals in the field, or score 95 in Lighthouse and fail every field metric because real users are on slow connections. Optimizing for the Lighthouse number instead of CrUX data is optimizing for the wrong target. Sub-second TTFB obsession. Yes, faster is better. But spending 3 engineering sprints moving TTFB from 650ms to 350ms while your LCP element lacks a preload hint is misallocated effort. Fix the high-impact items first.

What Structural Issues Do Search Engines Still Struggle With?

Structure tells search engines what your content is, how it relates to other content, and where it sits in your site’s hierarchy. Poor structure does not always cause dramatic ranking drops. Instead, it creates a persistent ceiling: your pages rank, but never as well as they should.

Check 13: Schema Markup Validation

Structured data does not directly boost rankings (Google has said this explicitly). What it does is qualify your pages for rich results: review stars, FAQ dropdowns, product pricing, event dates, how-to steps. Rich results increase click-through rate by 20-40% depending on the SERP layout, according to a 2024 Milestone Research study. The validation check is binary: run your pages through Google’s Rich Results Test. Any errors mean the schema will be ignored. Warnings are worth reviewing but are not blockers. Focus on the schema types that match your content:
  • Organization on your homepage
  • Article / BlogPosting on editorial content
  • Product on product pages (with price, availability, reviews)
  • LocalBusiness on location pages
  • FAQPage on pages with Q&A format content

Check 14: Heading Hierarchy

One H1 per page. Headings in sequential order (H1, H2, H3, not H1, H3, H2). No skipped levels. This is basic document semantics that Google uses to understand content structure. The most common violation: multiple H1 tags caused by CMS templates that style the site name, breadcrumbs, or sidebar titles as H1s. Screaming Frog’s H1 report catches these in seconds.
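
Both rules (exactly one H1, no skipped levels) are checkable from raw HTML with the standard-library parser. A sketch; the sample markup is hypothetical:

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Record heading levels (1-6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.levels.append(int(tag[1]))

def heading_issues(html):
    parser = HeadingCollector()
    parser.feed(html)
    issues = []
    h1_count = parser.levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly 1 H1, found {h1_count}")
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur > prev + 1:   # going deeper by more than one level is a skip
            issues.append(f"skipped level: H{prev} -> H{cur}")
    return issues

bad = "<h1>Title</h1><h3>Jumped</h3><h1>Site name styled as H1</h1>"
good = "<h1>Title</h1><h2>Section</h2><h3>Sub</h3>"
```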

Check 15: Hreflang Implementation

This check only applies to multi-language or multi-region sites. If you operate in one language and one country, skip it. If you do serve multiple locales, hreflang errors cause serious problems: wrong language versions ranking in wrong countries, duplicate content flags across domains, and cannibalization between your .com and .co.uk. The validation rule: every hreflang tag must have a corresponding return tag on the target page. If page-en links to page-fr, then page-fr must link back to page-en. Missing return tags cause Google to ignore the entire hreflang set. An x-default tag should be present as the fallback.
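
The return-tag rule can be validated from a mapping of each page’s declared hreflang tags, which a crawl export gives you. A minimal sketch; the page URLs and the `hreflang_errors` name are hypothetical:

```python
def hreflang_errors(tags):
    """tags: page URL -> {hreflang code: alternate URL} as declared on that page.

    Flags a missing x-default and any alternate that does not link back.
    """
    errors = []
    for page, alternates in tags.items():
        if "x-default" not in alternates:
            errors.append(f"{page}: missing x-default")
        for lang, target in alternates.items():
            back = tags.get(target, {})
            if page not in back.values():   # target must declare a tag pointing back
                errors.append(f"{page} -> {target} ({lang}): no return tag")
    return errors

tags = {
    "/en/": {"en": "/en/", "fr": "/fr/", "x-default": "/en/"},
    "/fr/": {"fr": "/fr/"},   # missing the en return tag and x-default
}
errors = hreflang_errors(tags)
```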

Check 16: Breadcrumb and Navigation Structure

Breadcrumbs serve two purposes: they help users orient themselves, and they give Google an explicit hierarchy signal. BreadcrumbList schema makes that signal machine-readable. This is a P2 because the impact is incremental rather than binary. A site without breadcrumbs can still rank well. But breadcrumbs improve internal linking, provide SERP breadcrumb display, and reduce pogo-sticking by giving users clear paths to related content.

What Structural “Checks” Are Noise?

Missing alt text on decorative images. Alt text matters for images that carry meaning: product photos, infographics, charts. Decorative images (background patterns, spacer graphics, icons next to text that already conveys the meaning) should have empty alt attributes (alt=""), not descriptive ones. Auditing every image for alt text without distinguishing decorative from meaningful is busywork. Exact-match title tag length optimization. The “keep titles under 60 characters” rule is a guideline, not a ranking factor. Google rewrites 61% of title tags anyway, according to a 2023 Zyppy study of 80,000 titles. Write clear, keyword-relevant titles. Do not spend engineering time trimming 3 characters to hit an arbitrary limit.

Why Does AI Readiness Belong in a Technical SEO Audit?

Because search is no longer just Google’s blue links. As of early 2026, Google AI Overviews appear on over 30% of commercial queries. ChatGPT, Perplexity, and Claude all pull from web content to generate answers. If your site is technically invisible to these systems, you are losing a growing share of discovery traffic. At ScaleGrowth.Digital, a growth engineering firm, we added AI readiness checks to our SEO audit framework in mid-2025. Since then, it has become the category where we find the most overlooked issues. Traditional SEO tooling does not flag these problems because the tools were built before LLM-powered search existed.

Check 17: LLM Crawl Access

Large language models send their own crawlers to index web content: GPTBot (OpenAI), ClaudeBot (Anthropic), Bingbot (used by Copilot and ChatGPT’s browsing). Many sites block these bots in robots.txt, sometimes intentionally, sometimes because a security plugin added blanket bot-blocking rules. Check your robots.txt for any Disallow rules targeting these user agents. Then check whether you have an llms.txt file at your domain root. This emerging standard (similar in spirit to robots.txt) tells AI systems which content to prioritize and how to attribute it. It is not yet a ranking factor, but it is a signal of intent.
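
The robots.txt half of this check is scriptable with the same standard-library parser used for Check 1, probed once per AI user agent. A sketch; the sample robots.txt is a hypothetical file that singles out GPTBot:

```python
from urllib import robotparser

AI_AGENTS = ("GPTBot", "ClaudeBot", "Bingbot")

def ai_crawl_access(robots_text, path="/"):
    """Return per-agent fetch permission for the given path."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())
    return {agent: rp.can_fetch(agent, path) for agent in AI_AGENTS}

# Hypothetical robots.txt: a security plugin has blocked GPTBot specifically
blocked = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
access = ai_crawl_access(blocked)
```

Any `False` here means that crawler never sees your content; the llms.txt half of the check is a simple fetch of `/llms.txt` for a 200.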

Check 18: Content Extractability

LLMs do not render JavaScript the way Googlebot does. If your core content loads via client-side JavaScript, it may be invisible to AI crawlers. This is the single most damaging AI readiness gap we see: brands with React or Angular SPAs where the main content requires JS execution to appear in the DOM. Test this with Jina Reader (reader.jina.ai/your-url) or by running a simple fetch and markdown conversion. If the plain-text version of your page is missing key content, that content does not exist for AI systems.
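
You can approximate the non-rendering crawler’s view by stripping tags from the raw HTML (no JavaScript execution) and checking whether key content survives. A sketch with hypothetical markup; a real test would fetch the live page first:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())

# SPA: the fetched HTML contains only a mount point, so the content is
# invisible without JS execution. Static: the content is in the HTML itself.
spa_html = '<div id="root"></div><script>render("Pricing: $49/mo")</script>'
static_html = "<main><h1>Pricing</h1><p>$49/mo</p></main>"
```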

Check 19: Entity and Authorship Markup

AI systems use structured data to determine who is saying what. Organization schema on your homepage tells AI systems who you are. Author schema on articles tells them who wrote the content. Without these signals, your content is anonymous to the systems generating AI-powered answers. This matters for citation. When an LLM generates an answer and needs to attribute a source, it favors sources with clear, machine-readable entity information. Author schema with sameAs links to LinkedIn, Wikipedia, or other authority profiles strengthens this signal.

Check 20: Structured FAQ and How-to Markup

FAQ and HowTo schema serve a dual purpose in 2026. They qualify pages for Google’s rich results (the traditional benefit), and they make content extractable for AI systems that look for question-answer pairs and procedural steps. When an AI overview answers a question, it looks for content structured as a question and answer. Pages with FAQPage schema are pre-formatted for this extraction. The same applies to HowTo schema for procedural queries. Validate these with Google’s Rich Results Test. Any errors mean the schema is ignored entirely.
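
The Rich Results Test remains the authoritative validator, but a pre-deploy sanity check on the JSON-LD shape catches the common structural mistakes. A sketch covering a few of the properties Google documents for FAQPage; the sample content is hypothetical:

```python
import json

def faq_schema_errors(jsonld):
    """Minimal structural checks on a FAQPage JSON-LD string."""
    errors = []
    try:
        data = json.loads(jsonld)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON-LD: {exc}"]
    if data.get("@type") != "FAQPage":
        errors.append("@type must be FAQPage")
    for i, q in enumerate(data.get("mainEntity", [])):
        if q.get("@type") != "Question" or not q.get("name"):
            errors.append(f"mainEntity[{i}]: needs @type Question and a name")
        answer = q.get("acceptedAnswer", {})
        if answer.get("@type") != "Answer" or not answer.get("text"):
            errors.append(f"mainEntity[{i}]: needs acceptedAnswer with text")
    return errors

valid = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Do you ship internationally?",
        "acceptedAnswer": {"@type": "Answer", "text": "Yes, to 40 countries."},
    }],
})
```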

What AI Readiness “Checks” Are Noise?

Optimizing for every AI platform individually. You do not need separate strategies for ChatGPT, Perplexity, Gemini, and Claude. The fundamentals are the same: make your content accessible, structured, and attributable. The specific ranking algorithms differ, but the technical requirements overlap by over 90%. Build for extractability once. It works everywhere.

“We added AI readiness to our technical audits 10 months ago. In that time, not a single client has come to us already passing all 4 checks. It’s the blind spot in every audit we review. The sites that fix it now have a 12-18 month structural advantage.”

Hardik Shah, Founder of ScaleGrowth.Digital

What Is the Right Cadence for Running These Checks?

Not every check needs the same frequency. Running all 20 checks weekly is overhead that produces diminishing returns. Running them annually is not enough to catch regressions from CMS updates, plugin changes, and content deployments. The cadence that balances thoroughness with practicality:
  • Weekly (automated): Checks 1, 5, 6, 9, 10. These are the items that break silently. A bad deployment can introduce a noindex tag or a canonical error on a Tuesday, and nobody notices until the next monthly report. Automate these with Screaming Frog scheduled crawls, ContentKing, or custom monitoring scripts.
  • Monthly (manual review): Checks 2, 3, 4, 7, 8, 11, 12, 17, 18. These require human judgment. Is this orphan page intentional? Is this redirect chain from a deliberate architecture decision? Monthly review with a 30-minute time box is sufficient.
  • Quarterly (full audit): All 20 checks, including structural items (13-16, 19-20) that change less frequently. This is your comprehensive baseline reset.
  • Trigger-based (immediately after): Any CMS update, migration, domain change, or major content deployment. Run the full P0 set (checks 1, 2, 5, 6, 9, 10, 13, 17, 18) within 24 hours.
Total time investment: roughly 2 hours per week for automated monitoring review, 30 minutes per month for manual checks, and 4-6 hours per quarter for the full audit. For a site generating $100,000+ in monthly organic traffic value, that is a reasonable insurance premium.

How Do You Prioritize Fixes After the Audit?

The audit produces findings. Findings are not a roadmap. The step between “here are 11 issues” and “here is what we fix first” is where most technical SEO work stalls. Engineering teams work in sprints. They need tickets, not spreadsheets. Use this prioritization framework:
  1. P0 items with broad page impact go first. A canonical issue affecting 1,200 pages outranks a CLS issue on 3 pages. Multiply severity by scope.
  2. Group fixes by deployment type. Server config changes (robots.txt, redirects, headers) ship separately from template changes (schema, heading structure, image attributes). Mixing them in one sprint creates messy rollbacks if something breaks.
  3. Attach a measurable outcome to every fix. “Fix canonical tags on product pages” becomes “Fix canonical tags on 1,247 product pages; expected recovery of 400-600 indexed pages within 4 weeks.” This gives the engineering team a validation metric, not just a task.
  4. Set a regression test for each fix. After the canonical fix ships, add a weekly automated check (via the technical SEO monitoring layer) to confirm it stays fixed. Regressions from subsequent deployments are the number one reason technical SEO gains reverse.
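
Step 1 of the framework (severity multiplied by scope) is easy to make explicit so the backlog order is reproducible rather than argued over. A minimal sketch; the weights and the sample findings are hypothetical:

```python
# Hypothetical weights: each priority tier dominates the one below it
PRIORITY_WEIGHT = {"P0": 100, "P1": 10, "P2": 1}

def rank_findings(findings):
    """Order fixes by severity weight multiplied by affected-page scope."""
    return sorted(
        findings,
        key=lambda f: PRIORITY_WEIGHT[f["priority"]] * f["pages_affected"],
        reverse=True,
    )

findings = [
    {"issue": "CLS on landing pages", "priority": "P1", "pages_affected": 3},
    {"issue": "Canonical loop on products", "priority": "P0", "pages_affected": 1200},
    {"issue": "Missing breadcrumbs", "priority": "P2", "pages_affected": 5000},
]
ordered = rank_findings(findings)
```

Note how scope can promote a P2: 5,000 pages of missing breadcrumbs outranks 3 pages of CLS, which is the intended behavior of multiplying severity by scope.
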
The output of this prioritization is a backlog of 10-15 tickets. Not 47 pages. Not 200 rows. A focused list that an engineering lead can schedule into the next 2-3 sprints without needing a follow-up meeting to interpret the audit.

What Does a Complete Technical SEO Audit Workflow Look Like?

Here is the full workflow we run at ScaleGrowth.Digital, compressed into a repeatable process that takes 6-8 hours for a site under 50,000 pages.

Phase 1: Data Collection (2 hours)

  1. Full Screaming Frog crawl with JavaScript rendering enabled
  2. Export Google Search Console index coverage, mobile usability, and Core Web Vitals reports
  3. Pull CrUX data via PageSpeed Insights API for top 50 pages by traffic
  4. Run Rich Results Test on 10 representative page types
  5. Fetch key pages through Jina Reader for content extractability

Phase 2: Analysis Against the 20 Checks (2-3 hours)

Work through each check sequentially. Document the finding (pass/fail/partial), the page count affected, and the specific URLs involved. Do not write recommendations yet. Just document the state.

Phase 3: Prioritization and Ticket Creation (1-2 hours)

Apply the prioritization framework from the previous section. Create tickets with:
  • Issue description (what is wrong)
  • Affected URLs or URL patterns
  • Pass criteria (how to verify the fix)
  • Expected outcome (what changes in GSC, CrUX, or ranking)
  • Regression test definition (how to catch if it breaks again)

Phase 4: Monitoring Setup (1 hour)

Configure automated checks for every P0 fix. When the fix ships, the monitoring layer validates it stays fixed. This is the step most teams skip, and it is the step that determines whether your audit produces lasting results or temporary ones. The SEO audit tool we built automates phases 1 and 2, cutting total time from 6-8 hours to 2-3 hours. But the framework works with manual tools too. The value is in the checks and the process, not the tooling.

Get a Technical SEO Audit That Drives Action

We run the 20-check audit on your site, deliver prioritized tickets your engineering team can ship, and set up monitoring so fixes stay fixed. No 47-page PDFs. No noise. Request Your Audit
