The Technical SEO Audit That Matters: 20 Checks, Zero Noise
Most technical SEO checklists run 80+ items and bury the 20 that actually move rankings. This is the short list: 5 categories, 20 checks, clear pass/fail criteria, and honest notes on which popular recommendations are wasting your sprint cycles.
Why Do Most Technical SEO Audits Fail to Drive Action?
- 72% of flagged issues had no measurable ranking impact when fixed
- 8 of the top 10 ranking-impacting fixes were the same across every audit
- Under 15% of audit recommendations included pass/fail criteria an engineer could validate without asking follow-up questions
How Should You Structure a Technical SEO Audit?
- Crawling: Can search engines discover and traverse your pages?
- Indexation: Are the right pages in the index, and only the right pages?
- Performance: Do pages load fast enough to rank and convert?
- Structure: Can search engines parse your content and relationships?
- AI Readiness: Can LLMs and AI-powered search extract, attribute, and cite your content?
| # | Check | Category | Tool | Pass Criteria | Priority |
|---|---|---|---|---|---|
| 1 | Robots.txt validation | Crawling | Google Search Console | No critical paths blocked; file returns 200 | P0 |
| 2 | XML sitemap health | Crawling | Screaming Frog | All listed URLs return 200; <50,000 per file | P0 |
| 3 | Crawl depth distribution | Crawling | Screaming Frog | 90%+ of indexable pages within 3 clicks | P1 |
| 4 | Internal link coverage | Crawling | Screaming Frog / Sitebulb | 0 orphan pages; every indexable page has 3+ internal links | P1 |
| 5 | Canonical tag consistency | Indexation | Screaming Frog | Every indexable page self-canonicals; no loops or chains | P0 |
| 6 | Index coverage gaps | Indexation | Google Search Console | Indexed page count within 10% of intended indexable count | P0 |
| 7 | Redirect chain audit | Indexation | Screaming Frog | 0 chains >2 hops; all 301s resolve to 200 | P1 |
| 8 | Duplicate content signals | Indexation | Siteliner / Screaming Frog | No 2 indexable URLs with >85% content similarity | P1 |
| 9 | Largest Contentful Paint | Performance | PageSpeed Insights (CrUX) | LCP ≤ 2.5s at 75th percentile (field data) | P0 |
| 10 | Interaction to Next Paint | Performance | PageSpeed Insights (CrUX) | INP ≤ 200ms at 75th percentile | P0 |
| 11 | Cumulative Layout Shift | Performance | PageSpeed Insights (CrUX) | CLS ≤ 0.1 at 75th percentile | P1 |
| 12 | Mobile usability | Performance | Google Search Console | 0 pages with mobile usability errors | P1 |
| 13 | Schema markup validation | Structure | Rich Results Test | 0 errors on all deployed schema types | P0 |
| 14 | Heading hierarchy | Structure | Screaming Frog | 1 H1 per page; no skipped levels (H1→H3) | P1 |
| 15 | Hreflang implementation | Structure | Aleyda Solis’s Hreflang Tags Generator / Screaming Frog | Return tags on all alternates; x-default present | P1 (multi-locale sites only) |
| 16 | Breadcrumb + nav structure | Structure | Manual review + Schema validator | Breadcrumbs on all pages below root; BreadcrumbList schema present | P2 |
| 17 | LLM crawl access | AI Readiness | robots.txt + llms.txt review | Not blocking GPTBot, ClaudeBot, Bingbot; llms.txt present | P0 |
| 18 | Content extractability | AI Readiness | Jina Reader / Markdownify test | Core content renders in plain text; no JS-only sections | P0 |
| 19 | Entity and authorship markup | AI Readiness | Schema validator + manual review | Author schema on articles; Organization schema on homepage | P1 |
| 20 | Structured FAQ/How-to markup | AI Readiness | Rich Results Test | FAQ/HowTo schema on relevant pages; 0 validation errors | P1 |
What Are the 4 Crawling Checks That Actually Matter?
Check 1: Robots.txt Validation
Open your robots.txt file. Read it. This takes 90 seconds and catches problems that persist for months because nobody looks. What you are looking for:
- No `Disallow: /` on production (this still happens post-migration more than you would expect)
- CSS and JS files are not blocked (Google needs them to render pages)
- Your sitemap URL is declared and resolves to a 200
- No wildcard rules accidentally blocking parameterized URLs you want indexed
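This check is also easy to automate between manual reviews. A minimal sketch using Python's standard-library robots.txt parser; the sample file, paths, and user agent are illustrative:

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(robots_txt: str, critical_paths, user_agent: str = "Googlebot"):
    """Return the critical paths that this robots.txt blocks for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in critical_paths if not parser.can_fetch(user_agent, path)]

# A post-migration robots.txt that still carries a staging-era blanket block:
robots = "User-agent: *\nDisallow: /\n"
print(blocked_paths(robots, ["/", "/products/", "/blog/"]))  # everything is blocked
# An empty or permissive file blocks nothing:
print(blocked_paths("", ["/", "/products/"]))
```

Run this against a fixed list of must-be-crawlable paths in CI and the check becomes a test an engineer can validate, not a reminder.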
Check 2: XML Sitemap Health
Your sitemap is a declaration of intent. It tells Google: these are the pages I consider important enough to index. When the sitemap includes 404s, redirects, or noindexed URLs, that declaration loses credibility. Pass criteria: every URL in your sitemap returns a 200 status code, is self-canonicalized, and is not blocked by robots.txt. Each sitemap file stays under the 50,000-URL / 50MB limit. Screaming Frog’s sitemap audit mode handles this in a single crawl.
Check 3: Crawl Depth Distribution
Crawl depth measures how many clicks it takes to reach a page from the homepage. Pages at depth 4+ receive significantly less crawl frequency. For sites with over 10,000 pages, this is where rankings silently erode. Export your crawl data, filter to indexable pages, and check the distribution. Target: 90% or more of indexable pages reachable within 3 clicks. If important category or product pages are buried at depth 5 or 6, restructure your internal linking before spending time on content.
Check 4: Internal Link Coverage
Orphan pages (pages with zero internal links pointing to them) are effectively invisible to crawlers that follow links. Even if a page is in your sitemap, Google treats sitemap-only discovery as a weaker signal than link-based discovery. Run a full crawl and export pages with fewer than 3 unique internal links. Any indexable page with 0 internal links is a P0 fix. Pages with 1-2 links deserve review, especially if they target competitive keywords.
What Crawling “Checks” Are Noise?
Crawl budget optimization for sites under 50,000 pages. This is the most overhyped concept in technical SEO. Google has stated repeatedly that crawl budget is only a concern for very large sites. If your site has 500 pages, or even 5,000, Google will crawl it fully. Spending engineering time on crawl budget for a 2,000-page site is solving a problem that does not exist.
Server log analysis for small/medium sites. Log analysis is valuable for sites with 100,000+ pages where crawl patterns reveal structural issues at scale. For a 3,000-page B2B site, your time is better spent on the 4 checks above.
Which Indexation Issues Cause the Most Ranking Damage?
Check 5: Canonical Tag Consistency
Canonical tags are the single most misimplemented element in technical SEO. The failure modes are subtle and the consequences are severe.
- Canonical loops: Page A canonicals to Page B, which canonicals back to Page A. Google ignores both and picks its own preferred URL.
- Canonical chains: A points to B, B points to C. Google may follow the chain, or may not. Depends on the day.
- Non-self-referencing canonicals on unique pages: Your blog post canonicals to your homepage because a developer left a template default in place. That blog post will not rank.
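Screaming Frog handles this at scale; for spot checks, a small script can classify a page's canonical state. A sketch with a deliberately simple regex (a production crawler should use a real HTML parser; the URLs are illustrative):

```python
import re

# Note: assumes rel appears before href in the tag; real markup can vary.
CANONICAL_RE = re.compile(
    r'<link\s[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']',
    re.IGNORECASE)

def canonical_status(html: str, page_url: str) -> str:
    """'self' for a self-canonical, 'missing' if no tag, else the foreign target URL."""
    match = CANONICAL_RE.search(html)
    if not match:
        return "missing"
    target = match.group(1).rstrip("/")
    return "self" if target == page_url.rstrip("/") else target
```

Anything other than `self` on a unique indexable page is a finding: `missing` is a P1 gap, a foreign target like the template-default homepage canonical described above is a P0.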
Check 6: Index Coverage Gaps
Go to Google Search Console > Pages. Compare the number of indexed pages to the number of pages you intend to have indexed. If the gap exceeds 10%, you have a problem worth investigating. Common causes of index gaps:
- “Discovered, currently not indexed”: Google found the URL but has not crawled it yet. Usually a crawl prioritization or quality signal: thin content, near-duplicate, or low internal link equity.
- “Crawled, currently not indexed”: Google crawled the page and actively chose not to index it. This is a stronger negative signal than “discovered.”
- “Excluded by noindex tag”: Intentional or accidental. Verify every noindex is deliberate.
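The 10% threshold is simple enough to encode as an explicit pass/fail, which keeps the check objective from audit to audit. A trivial sketch (the counts are illustrative):

```python
def index_gap_pct(intended: int, indexed: int) -> float:
    """Percentage gap between intended indexable pages and pages actually indexed."""
    return abs(intended - indexed) / intended * 100

def coverage_verdict(intended: int, indexed: int, threshold_pct: float = 10.0) -> str:
    gap = index_gap_pct(intended, indexed)
    return f"pass ({gap:.1f}% gap)" if gap <= threshold_pct else f"fail ({gap:.1f}% gap)"

print(coverage_verdict(1200, 1150))  # small gap: pass
print(coverage_verdict(1200, 950))   # large gap: investigate the excluded reports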
Check 7: Redirect Chain Audit
Every hop in a redirect chain dilutes PageRank and adds latency. A 3-hop chain (301 to 301 to 301 to 200) is not catastrophic, but it accumulates. After a site migration, chains of 4-5 hops are common because old redirects get layered on top of older redirects. The fix is straightforward: flatten every chain so the origin URL points directly to the final destination. On sites that have been through 2+ migrations, this single fix often cleans up hundreds of redirect rules.
Check 8: Duplicate Content Signals
True duplicate content (two indexable URLs serving identical or near-identical content) forces Google to pick one version and suppress the other. You lose control of which URL ranks. This is most common on ecommerce sites with parameterized URLs, faceted navigation, and print-friendly page versions. Screaming Frog’s “Near Duplicates” report flags pages with over 85% content similarity. Each pair needs either a canonical tag, a noindex, or content differentiation.
What Indexation “Checks” Are Noise?
Meta robots “nofollow” audits on internal links. Google has stated since 2009 that nofollow on internal links does not conserve crawl equity the way people think it does. The “PageRank sculpting” technique died over 15 years ago. If someone recommends adding nofollow to your login page links, they are working from outdated information.
“The audits that move rankings are short and specific. Twenty checks with clear pass/fail criteria will outperform 200 checks with vague recommendations every time. Engineers fix what they can validate. Give them a test, not a thesis.”
Hardik Shah, Founder of ScaleGrowth.Digital
How Do Core Web Vitals Fit Into a Technical SEO Audit?
Check 9: Largest Contentful Paint (LCP)
LCP measures when the largest visible element finishes rendering. The 2.5-second threshold applies to the 75th percentile of real user data (CrUX), not lab data. Lab data from Lighthouse is useful for debugging but does not determine your ranking signal. Common LCP killers:
- Hero images served without proper sizing, modern formats (WebP/AVIF), or preload hints
- Render-blocking CSS or JavaScript delaying the first paint
- Server response time (TTFB) above 800ms, often caused by uncached database queries
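TTFB is cheap to measure without a full Lighthouse run. A rough standard-library sketch; the 800ms budget mirrors the bullet above and is a working budget, not an official Core Web Vitals threshold, and the measured value includes DNS/TLS setup, so it slightly overstates pure server time:

```python
import time
import urllib.request

TTFB_BUDGET_MS = 800  # working budget from the check above, not an official CWV limit

def measure_ttfb_ms(url: str) -> float:
    """Elapsed time from request start until the first response byte, in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read(1)  # force the first byte to arrive
    return (time.perf_counter() - start) * 1000

def ttfb_verdict(ttfb_ms: float) -> str:
    return "pass" if ttfb_ms <= TTFB_BUDGET_MS else "investigate caching / server response"
```

Measure from the region your users are actually in; a fast TTFB from your office says nothing about the 75th percentile in the field.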
Check 10: Interaction to Next Paint (INP)
INP replaced First Input Delay as a Core Web Vital in March 2024. It measures the worst-case responsiveness of your page across all user interactions, not just the first one. The threshold is 200ms at the 75th percentile. INP failures typically come from heavy JavaScript execution on the main thread. Third-party scripts (analytics, chat widgets, ad tags) are the most common offenders. A single poorly-timed analytics call can push INP above 400ms.
Check 11: Cumulative Layout Shift (CLS)
CLS measures visual stability. When elements shift after the page loads (an ad slot pushing content down, a font swap causing text reflow, an image loading without dimensions), that is a CLS event. The threshold is 0.1 at the 75th percentile. The fix for 80% of CLS issues: set explicit width and height attributes on all images and video elements, and reserve space for dynamic elements like ad slots before they load.
Check 12: Mobile Usability
Google’s mobile-first indexing has been the default since 2023. If your mobile experience is broken, your desktop rankings suffer too. Check Google Search Console’s Mobile Usability report for:
- Text too small to read (below 16px effective size)
- Clickable elements too close together (below 48px touch targets)
- Content wider than screen (horizontal scrolling)
- Viewport not configured
What Performance “Checks” Are Noise?
Lighthouse performance scores as a ranking metric. Lighthouse runs lab tests from a simulated environment. Your Lighthouse score is not what Google uses for ranking. CrUX field data is. A site can score 45 on Lighthouse and still pass all Core Web Vitals in the field, or score 95 in Lighthouse and fail every field metric because real users are on slow connections. Optimizing for the Lighthouse number instead of CrUX data is optimizing for the wrong target.
Sub-second TTFB obsession. Yes, faster is better. But spending 3 engineering sprints moving TTFB from 650ms to 350ms while your LCP element lacks a preload hint is misallocated effort. Fix the high-impact items first.
What Structural Issues Do Search Engines Still Struggle With?
Check 13: Schema Markup Validation
Structured data does not directly boost rankings (Google has said this explicitly). What it does is qualify your pages for rich results: review stars, FAQ dropdowns, product pricing, event dates, how-to steps. Rich results increase click-through rate by 20-40% depending on the SERP layout, according to a 2024 Milestone Research study. The validation check is binary: run your pages through Google’s Rich Results Test. Any errors mean the schema will be ignored. Warnings are worth reviewing but are not blockers. Focus on the schema types that match your content:
- Organization on your homepage
- Article / BlogPosting on editorial content
- Product on product pages (with price, availability, reviews)
- LocalBusiness on location pages
- FAQPage on pages with Q&A format content
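Schema is easier to keep valid when it is generated rather than hand-edited, because generation can be covered by a CI test. A sketch that emits Organization markup as the JSON-LD script tag validators expect; every value here is a placeholder:

```python
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",  # placeholder values throughout; pull from your CMS
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "sameAs": ["https://www.linkedin.com/company/example-co"],
}

def jsonld_tag(schema: dict) -> str:
    """Wrap a schema.org dict in the script tag Google's Rich Results Test reads."""
    return f'<script type="application/ld+json">{json.dumps(schema)}</script>'

print(jsonld_tag(organization))
```

The same wrapper works for Article, Product, LocalBusiness, and FAQPage dicts; only the payload changes.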
Check 14: Heading Hierarchy
One H1 per page. Headings in sequential order (H1, H2, H3, not H1, H3, H2). No skipped levels. This is basic document semantics that Google uses to understand content structure. The most common violation: multiple H1 tags caused by CMS templates that style the site name, breadcrumbs, or sidebar titles as H1s. Screaming Frog’s H1 report catches these in seconds.
Check 15: Hreflang Implementation
This check only applies to multi-language or multi-region sites. If you operate in one language and one country, skip it. If you do serve multiple locales, hreflang errors cause serious problems: wrong language versions ranking in wrong countries, duplicate content flags across domains, and cannibalization between your .com and .co.uk. The validation rule: every hreflang tag must have a corresponding return tag on the target page. If page-en links to page-fr, then page-fr must link back to page-en. Missing return tags cause Google to ignore the entire hreflang set. An x-default tag should be present as the fallback.
Check 16: Breadcrumb and Navigation Structure
Breadcrumbs serve two purposes: they help users orient themselves, and they give Google an explicit hierarchy signal. BreadcrumbList schema makes that signal machine-readable. This is a P2 because the impact is incremental rather than binary. A site without breadcrumbs can still rank well. But breadcrumbs improve internal linking, provide SERP breadcrumb display, and reduce pogo-sticking by giving users clear paths to related content.
What Structural “Checks” Are Noise?
Missing alt text on decorative images. Alt text matters for images that carry meaning: product photos, infographics, charts. Decorative images (background patterns, spacer graphics, icons next to text that already conveys the meaning) should have empty alt attributes (alt=""), not descriptive ones. Auditing every image for alt text without distinguishing decorative from meaningful is busywork.
Exact-match title tag length optimization. The “keep titles under 60 characters” rule is a guideline, not a ranking factor. Google rewrites 61% of title tags anyway, according to a 2023 Zyppy study of 80,000 titles. Write clear, keyword-relevant titles. Do not spend engineering time trimming 3 characters to hit an arbitrary limit.
Why Does AI Readiness Belong in a Technical SEO Audit?
Check 17: LLM Crawl Access
Large language models send their own crawlers to index web content: GPTBot (OpenAI), ClaudeBot (Anthropic), Bingbot (used by Copilot and ChatGPT’s browsing). Many sites block these bots in robots.txt, sometimes intentionally, sometimes because a security plugin added blanket bot-blocking rules. Check your robots.txt for any Disallow rules targeting these user agents. Then check whether you have an llms.txt file at your domain root. This emerging standard (similar in spirit to robots.txt) tells AI systems which content to prioritize and how to attribute it. It is not yet a ranking factor, but it is a signal of intent.
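The robots.txt side of this check is scriptable with the same standard-library parser used for Check 1. A minimal sketch; the sample file is illustrative:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Bingbot"]

def blocked_ai_crawlers(robots_txt: str, probe_path: str = "/"):
    """Return the AI crawlers this robots.txt blocks from the probe path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, probe_path)]

# A security plugin's blanket rule often looks like this:
robots = "User-agent: GPTBot\nDisallow: /\n"
print(blocked_ai_crawlers(robots))  # GPTBot is shut out; the others default to allowed
```

Probe a few representative content paths, not just the root, since plugins sometimes block only sections like /blog/.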
Check 18: Content Extractability
LLMs do not render JavaScript the way Googlebot does. If your core content loads via client-side JavaScript, it may be invisible to AI crawlers. This is the single most damaging AI readiness gap we see: brands with React or Angular SPAs where the main content requires JS execution to appear in the DOM. Test this with Jina Reader (reader.jina.ai/your-url) or by running a simple fetch and markdown conversion. If the plain-text version of your page is missing key content, that content does not exist for AI systems.
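You can approximate the Jina Reader test locally: parse the raw HTML your server returns (no JavaScript execution) and check whether key phrases survive. A sketch with the standard-library HTML parser; the sample page and phrases are illustrative:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text roughly the way a non-JS-rendering crawler sees it."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def missing_phrases(raw_html: str, key_phrases):
    """Key phrases absent from server-rendered HTML exist only after JS runs."""
    extractor = TextExtractor()
    extractor.feed(raw_html)
    text = " ".join(extractor.parts)
    return [phrase for phrase in key_phrases if phrase not in text]
```

Feed it the raw response body (not the browser-rendered DOM); any phrase it reports missing is content that does not exist for AI crawlers.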
Check 19: Entity and Authorship Markup
AI systems use structured data to determine who is saying what. Organization schema on your homepage tells AI systems who you are. Author schema on articles tells them who wrote the content. Without these signals, your content is anonymous to the systems generating AI-powered answers. This matters for citation. When an LLM generates an answer and needs to attribute a source, it favors sources with clear, machine-readable entity information. Author schema with sameAs links to LinkedIn, Wikipedia, or other authority profiles strengthens this signal.
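As with Organization markup, author schema can be generated from CMS data so every article carries it consistently. A sketch of Article markup with an author entity and sameAs links; the headline, name, and URL are placeholders:

```python
import json

def article_schema(headline: str, author_name: str, profile_urls: list) -> str:
    """JSON-LD for an article with a machine-readable author entity."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": profile_urls,  # LinkedIn, Wikipedia, or other authority profiles
        },
    }, indent=2)

print(article_schema("Example headline", "Jane Doe",
                     ["https://www.linkedin.com/in/janedoe"]))
```

Embed the output in the same `<script type="application/ld+json">` wrapper used for Organization markup.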
Check 20: Structured FAQ and How-to Markup
FAQ and HowTo schema serve a dual purpose in 2026. They qualify pages for Google’s rich results (the traditional benefit), and they make content extractable for AI systems that look for question-answer pairs and procedural steps. When an AI overview answers a question, it looks for content structured as a question and answer. Pages with FAQPage schema are pre-formatted for this extraction. The same applies to HowTo schema for procedural queries. Validate these with Google’s Rich Results Test. Any errors mean the schema is ignored entirely.
What AI Readiness “Checks” Are Noise?
Optimizing for every AI platform individually. You do not need separate strategies for ChatGPT, Perplexity, Gemini, and Claude. The fundamentals are the same: make your content accessible, structured, and attributable. The specific ranking algorithms differ, but the technical requirements overlap by over 90%. Build for extractability once. It works everywhere.
“We added AI readiness to our technical audits 10 months ago. In that time, not a single client has come to us already passing all 4 checks. It’s the blind spot in every audit we review. The sites that fix it now have a 12-18 month structural advantage.”
Hardik Shah, Founder of ScaleGrowth.Digital
What Is the Right Cadence for Running These Checks?
- Weekly (automated): Checks 1, 5, 6, 9, 10. These are the items that break silently. A bad deployment can introduce a noindex tag or a canonical error on a Tuesday, and nobody notices until the next monthly report. Automate these with Screaming Frog scheduled crawls, ContentKing, or custom monitoring scripts.
- Monthly (manual review): Checks 2, 3, 4, 7, 8, 11, 12, 17, 18. These require human judgment. Is this orphan page intentional? Is this redirect chain from a deliberate architecture decision? Monthly review with a 30-minute time box is sufficient.
- Quarterly (full audit): All 20 checks, including structural items (13-16, 19-20) that change less frequently. This is your comprehensive baseline reset.
- Trigger-based (immediately after): Any CMS update, migration, domain change, or major content deployment. Run the full P0 set (checks 1, 2, 5, 6, 9, 10, 13, 17, 18) within 24 hours.
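The weekly automated tier can be as simple as fetching each P0 URL and asserting on the response. A sketch of the noindex regression check; the fetching step is left out, `pages` maps URL to fetched HTML, and the sample markup is illustrative:

```python
import re

# Note: assumes name appears before content in the tag; real markup can vary.
NOINDEX_RE = re.compile(r'<meta\s[^>]*name=["\']robots["\'][^>]*noindex', re.IGNORECASE)

def noindex_regressions(pages: dict) -> list:
    """pages maps URL -> fetched HTML; returns URLs that carry a noindex tag.

    Run this on a schedule (cron) against the pages that must stay indexable,
    so a Tuesday deployment that ships a noindex is caught the same week.
    """
    return [url for url, html in pages.items() if NOINDEX_RE.search(html)]
```

The same pattern extends to canonical tags and robots.txt: fetch, assert, alert on failure.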
How Do You Prioritize Fixes After the Audit?
- P0 items with broad page impact go first. A canonical issue affecting 1,200 pages outranks a CLS issue on 3 pages. Multiply severity by scope.
- Group fixes by deployment type. Server config changes (robots.txt, redirects, headers) ship separately from template changes (schema, heading structure, image attributes). Mixing them in one sprint creates messy rollbacks if something breaks.
- Attach a measurable outcome to every fix. “Fix canonical tags on product pages” becomes “Fix canonical tags on 1,247 product pages; expected recovery of 400-600 indexed pages within 4 weeks.” This gives the engineering team a validation metric, not just a task.
- Set a regression test for each fix. After the canonical fix ships, add a weekly automated check (via the technical SEO monitoring layer) to confirm it stays fixed. Regressions from subsequent deployments are the number one reason technical SEO gains reverse.
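The severity-times-scope rule is easy to make explicit, so ticket ordering stops being a debate. A minimal sketch; the weights, issue names, and counts are illustrative:

```python
SEVERITY_WEIGHT = {"P0": 3, "P1": 2, "P2": 1}  # illustrative weights

def fix_priority(severity: str, pages_affected: int) -> int:
    """Severity multiplied by scope, per the framework above; higher ships first."""
    return SEVERITY_WEIGHT[severity] * pages_affected

issues = [
    ("CLS on legacy landing pages", "P1", 3),
    ("canonical tags on product pages", "P0", 1200),
]
ranked = sorted(issues, key=lambda issue: fix_priority(issue[1], issue[2]), reverse=True)
print([name for name, *_ in ranked])  # the 1,200-page canonical issue ships first
```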
What Does a Complete Technical SEO Audit Workflow Look Like?
Phase 1: Data Collection (2 hours)
- Full Screaming Frog crawl with JavaScript rendering enabled
- Export Google Search Console index coverage, mobile usability, and Core Web Vitals reports
- Pull CrUX data via PageSpeed Insights API for top 50 pages by traffic
- Run Rich Results Test on 10 representative page types
- Fetch key pages through Jina Reader for content extractability
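The CrUX pull in step 3 can go through the PageSpeed Insights API v5, where field data sits under `loadingExperience`. A sketch of the request URL and response parsing; the network call itself is omitted, the metric key names should be verified against the current API response shape, and note that the API reports CLS multiplied by 100:

```python
import urllib.parse

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_request_url(page_url: str, strategy: str = "mobile") -> str:
    """Request URL for a page's PageSpeed Insights report, including CrUX field data."""
    return PSI_ENDPOINT + "?" + urllib.parse.urlencode(
        {"url": page_url, "strategy": strategy})

def crux_summary(psi_response: dict) -> dict:
    """75th-percentile Core Web Vitals from a PSI API response (CLS is reported x100)."""
    metrics = psi_response.get("loadingExperience", {}).get("metrics", {})

    def percentile(key):
        return metrics.get(key, {}).get("percentile")

    return {
        "lcp_ms": percentile("LARGEST_CONTENTFUL_PAINT_MS"),
        "inp_ms": percentile("INTERACTION_TO_NEXT_PAINT"),
        "cls_x100": percentile("CUMULATIVE_LAYOUT_SHIFT_SCORE"),
    }
```

Loop this over your top 50 pages by traffic and compare each summary against the Check 9-11 thresholds.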
Phase 2: Analysis Against the 20 Checks (2-3 hours)
Work through each check sequentially. Document the finding (pass/fail/partial), the page count affected, and the specific URLs involved. Do not write recommendations yet. Just document the state.
Phase 3: Prioritization and Ticket Creation (1-2 hours)
Apply the prioritization framework from the previous section. Create tickets with:
- Issue description (what is wrong)
- Affected URLs or URL patterns
- Pass criteria (how to verify the fix)
- Expected outcome (what changes in GSC, CrUX, or ranking)
- Regression test definition (how to catch if it breaks again)
Phase 4: Monitoring Setup (1 hour)
Configure automated checks for every P0 fix. When the fix ships, the monitoring layer validates it stays fixed. This is the step most teams skip, and it is the step that determines whether your audit produces lasting results or temporary ones. The SEO audit tool we built automates phases 1 and 2, cutting total time from 6-8 hours to 2-3 hours. But the framework works with manual tools too. The value is in the checks and the process, not the tooling.
Get a Technical SEO Audit That Drives Action
We run the 20-check audit on your site, deliver prioritized tickets your engineering team can ship, and set up monitoring so fixes stay fixed. No 47-page PDFs. No noise. Request Your Audit →