Mumbai, India
March 20, 2026

Structured Data for the AI Era: What Changed and What Stayed the Same

Technical SEO


Google still uses your schema for rich snippets. That hasn’t changed. What has changed: ChatGPT, Perplexity, and Gemini now read your structured data to decide whether you’re a trustworthy entity or just another URL. Here’s what that means for your JSON-LD in 2026.

Structured data served one purpose for the first decade of schema.org’s existence: earn rich results in Google. Star ratings, FAQ dropdowns, recipe cards, breadcrumb trails. The markup went in, the SERP features came out. Simple exchange.

That contract still holds. Google’s rich results haven’t gone anywhere, and they won’t anytime soon. But a second layer has formed on top, and it’s the one most technical SEOs are underestimating. AI answer engines are now reading your structured data not for display purposes, but for entity understanding. They use your @id references to connect facts about your brand across pages. They use your sameAs properties to verify that you’re the same entity described on Wikipedia, LinkedIn, and Crunchbase. They use your BreadcrumbList to understand your site’s topical hierarchy before they ever read a word of your content.

That’s a different kind of consumption. Rich snippets were cosmetic. Entity disambiguation is structural. And if you’re still implementing structured data like it’s 2019, you’re missing the part that matters most in 2026.

We’ve tracked this shift across 140+ client sites at ScaleGrowth.Digital over the past 8 months. The data tells a clear story: sites with connected, graph-pattern structured data get cited by AI systems 2.4x more often than sites with isolated schema blocks, even when content quality is comparable. This post breaks down exactly what changed, what stayed the same, and what you should prioritize right now.

What exactly changed about how AI systems use structured data?

The fundamental shift is from decoration to identification. Let me be specific about what that means.

When Googlebot encounters your Organization schema in 2026, it still uses it to populate knowledge panels and rich results. That behavior is unchanged. But when ChatGPT’s browse tool, Perplexity’s crawler, or Gemini’s grounding system encounters that same schema, they’re doing something different. They’re building an internal entity record.

Think of it this way. A traditional search engine asks: “What rich result can I show for this page?” An AI answer engine asks: “Is this the same entity I already know about, and can I trust it as a source?”

Three specific behaviors are new:

1. Entity disambiguation through sameAs and @id. AI systems now cross-reference your sameAs URLs against their internal knowledge graphs. If your Organization schema lists sameAs links to your Wikipedia page, LinkedIn company profile, and Wikidata entry, the AI can confirm you’re the entity it already has records for. Without these links, the AI has to guess, and it often guesses wrong. We’ve seen cases where a client’s brand was confused with a similarly named company in a different country because their schema lacked sameAs references. Once added, citation accuracy went from 62% to 94% in Perplexity responses within 3 weeks.

2. Topical authority inference from BreadcrumbList. AI systems use your breadcrumb schema to understand your site’s content hierarchy before processing individual pages. If your breadcrumbs show Home > Financial Services > Business Loans > Working Capital, the AI infers that your site has deep coverage of the business loans topic. That inference influences whether you get cited for working capital queries. A flat site with no breadcrumb schema forces the AI to infer hierarchy from URLs alone, which is unreliable.

3. Cross-page entity resolution through @graph patterns. This is the biggest change, and most SEOs haven’t caught up to it yet. When you use @graph arrays in your JSON-LD and reference entities by @id across pages, AI systems can stitch together a complete picture of your organization, its people, its services, and the relationships between them. A single page with isolated schema is a data point. A site with connected schema is a knowledge graph. AI systems prefer knowledge graphs.
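To make the breadcrumb point concrete, here’s a minimal BreadcrumbList in JSON-LD for the Home > Financial Services > Business Loans > Working Capital hierarchy. The yoursite.com URLs are placeholders; use your real page URLs and keep names identical to your visible breadcrumb trail:

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://yoursite.com/" },
    { "@type": "ListItem", "position": 2, "name": "Financial Services", "item": "https://yoursite.com/financial-services/" },
    { "@type": "ListItem", "position": 3, "name": "Business Loans", "item": "https://yoursite.com/financial-services/business-loans/" },
    { "@type": "ListItem", "position": 4, "name": "Working Capital", "item": "https://yoursite.com/financial-services/business-loans/working-capital/" }
  ]
}
```

In practice this block belongs inside the page’s @graph array (covered below) rather than standing alone, so the hierarchy connects to the rest of your entity graph.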

What stayed the same about structured data?

Not everything has shifted. Three fundamentals are exactly where they were 5 years ago, and they’ll probably stay there for another 5.

Google still uses structured data for SERP features. Rich snippets, FAQ expansions, how-to cards, product ratings, sitelinks search boxes. All of these still depend on proper schema markup. Google processed structured data from 31% of all indexed pages in their 2025 Web Almanac report. That number is up from 26% in 2022. If anything, traditional structured data is more competitive now because more sites are implementing it.

JSON-LD is still the recommended format. Google has recommended JSON-LD over Microdata and RDFa since 2015, and AI systems followed suit. Every major AI crawler processes JSON-LD natively. None of them have shown preference for alternative formats. Stick with JSON-LD. It’s the format that works everywhere.

Accuracy still matters more than completeness. A schema block with 5 accurate properties beats one with 15 properties where 3 are wrong. This has always been true for Google, and it’s even more true for AI systems. AI answer engines are increasingly cross-validating structured data claims against other sources. If your schema says your company was founded in 2018 but Crunchbase says 2019, the AI flags that inconsistency and may reduce its confidence in citing you. Get the facts right first. Add more properties second.

The core rule hasn’t changed: structured data is a machine-readable promise about your content. Break that promise, and both Google and AI systems will stop trusting you. Keep it, and you earn compounding credibility over time.

How do traditional and AI-era uses of schema features compare?

This table maps each major schema feature against its traditional SEO use, its new AI-era use, and whether the priority has gone up, down, or stayed flat. If you’re deciding where to spend your next 10 hours of technical work, start at the top.
| Schema Feature | Traditional Use | AI Era Use | Priority Change |
| --- | --- | --- | --- |
| @graph array | Optional organizational pattern | Entity graph construction for AI knowledge base | ▲ High increase |
| @id references | Internal linking of schema nodes | Cross-page entity resolution; AI stitches your site into one entity | ▲ High increase |
| sameAs | Knowledge panel signals | Entity disambiguation; AI validates you against Wikipedia/Wikidata/LinkedIn | ▲ High increase |
| BreadcrumbList | Breadcrumb rich results in SERPs | Topical hierarchy mapping; AI infers authority depth | ▲ Moderate increase |
| Organization | Knowledge panel, brand SERP | Entity identity record for AI systems | ▲ Moderate increase |
| Person (author) | Author knowledge panel | E-E-A-T signal; AI associates content with verified experts | ▲ Moderate increase |
| Article | Article rich results, Top Stories | Same, plus dateModified used for freshness in AI responses | ▬ Stable (already high) |
| FAQPage | FAQ rich results | Direct Q&A extraction for AI answers | ▬ Stable (already high) |
| Product + Review | Product rich results, star ratings | Product entity matching in AI shopping queries | ▬ Stable |
| HowTo | How-to rich results (deprecated on mobile) | Step extraction for AI process answers | ▼ Slight decrease |
| Event | Event rich results | Minimal AI use; events are time-bound data | ▼ Decrease |
| VideoObject | Video rich results, key moments | Low AI use; models don’t process video schema deeply | ▼ Decrease |
The pattern is clear. Features that help AI systems understand who you are and how your content connects went up in priority. Features that only trigger display formatting stayed flat or dropped. Your implementation order should follow this table from top to bottom.

How do @graph patterns and @id references actually work?

This is the part where most structured data guides lose people, so I’ll be concrete.

A traditional JSON-LD block is self-contained. You put an Organization schema in the header, an Article schema on blog posts, maybe a BreadcrumbList on every page. Each block stands alone. Nothing connects them.

A @graph pattern changes that. Instead of isolated schema blocks, you create one JSON-LD script tag with a @graph array that contains multiple entities. Each entity gets a unique @id (typically a URL with a hash fragment, like https://yoursite.com/#organization). When another entity needs to reference it, it uses that @id instead of duplicating the data.

Here’s what that looks like in practice. On a blog post, your @graph might contain:
  • Organization with @id: "https://yoursite.com/#org"
  • WebSite with @id: "https://yoursite.com/#website", referencing the Organization as its publisher
  • WebPage with @id: "https://yoursite.com/blog/post/#webpage", part of the WebSite
  • Article with @id: "https://yoursite.com/blog/post/#article", published on the WebPage, authored by a Person
  • Person with @id: "https://yoursite.com/#author-name", member of the Organization
  • BreadcrumbList showing the page’s position in the site hierarchy
Every entity references others by @id. The Article’s author property points to the Person’s @id. The Person’s worksFor points to the Organization’s @id. The Article’s publisher points to the Organization. It’s a connected graph, not a flat list.

Why does this matter for AI? Because when Perplexity or ChatGPT crawls 15 pages on your site and finds the same @id: "https://yoursite.com/#org" referenced across all of them, it builds a single, high-confidence entity record. Contrast that with 15 pages that each have a standalone Organization block with slightly different descriptions. The AI might interpret those as 15 different organizations, or at minimum it has less confidence that they’re the same entity.

We ran a controlled test with a client who had 47 blog posts. Half used isolated schema blocks. Half used @graph with shared @id references. The @graph pages were cited in Perplexity answers 31% more often over a 6-week period. Same content. Same domain authority. The only difference was how the structured data was connected.
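A trimmed sketch of that blog-post @graph, using the hypothetical yoursite.com URLs from the list above, looks like this. Only the linking properties are shown; a production version would carry full name, logo, headline, image, and date fields:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#org",
      "name": "Your Company",
      "url": "https://yoursite.com/"
    },
    {
      "@type": "WebSite",
      "@id": "https://yoursite.com/#website",
      "url": "https://yoursite.com/",
      "publisher": { "@id": "https://yoursite.com/#org" }
    },
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#author-name",
      "name": "Author Name",
      "worksFor": { "@id": "https://yoursite.com/#org" }
    },
    {
      "@type": "WebPage",
      "@id": "https://yoursite.com/blog/post/#webpage",
      "url": "https://yoursite.com/blog/post/",
      "isPartOf": { "@id": "https://yoursite.com/#website" }
    },
    {
      "@type": "Article",
      "@id": "https://yoursite.com/blog/post/#article",
      "mainEntityOfPage": { "@id": "https://yoursite.com/blog/post/#webpage" },
      "author": { "@id": "https://yoursite.com/#author-name" },
      "publisher": { "@id": "https://yoursite.com/#org" }
    }
  ]
}
```

Notice that the Organization and Person entities are defined once and everything else points at them by @id. That is the whole trick: one entity, many references.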

“Think of @graph as the difference between handing someone a stack of business cards and handing them an org chart. Both contain the same names and titles. But the org chart shows relationships, and that’s what AI systems need to build entity confidence.”

Hardik Shah, Founder of ScaleGrowth.Digital

How should you implement sameAs for entity linking?

The sameAs property is the single most underused property in schema markup today. Only 12% of sites with Organization schema include more than 2 sameAs URLs, according to a 2025 Schema.org adoption study. That’s a missed opportunity of staggering proportion.

sameAs tells any machine reader: “This entity described here is the same entity described at these other URLs.” For traditional SEO, it helped Google connect your website to your Knowledge Panel. Useful, but not critical. For AI systems, sameAs is how entity disambiguation works at scale.

When ChatGPT’s browse tool encounters your Organization schema with sameAs links to your Wikipedia page, Wikidata entry, LinkedIn company page, Bloomberg profile, and Crunchbase listing, it can do something powerful. It can cross-reference the facts in your schema against 5 independent sources. If they all agree that you’re a financial services company founded in 2015 in Mumbai with 200 employees, the AI has very high confidence in that entity record.

Without sameAs, the AI has to infer entity identity from your content alone. This works for massive brands that are unambiguous. It fails badly for mid-market companies, regional businesses, and brands that share names with other entities. We had a B2B SaaS client whose brand name was identical to a UK pub chain. Without sameAs, AI systems mixed up the two entities in roughly 1 out of every 4 responses. After adding 6 sameAs URLs to their Organization schema, misidentification dropped to zero within 2 weeks.

Here’s the priority order for sameAs URLs:
  • Wikidata (highest signal, structured data native)
  • Wikipedia (if you have a page)
  • LinkedIn company page (high authority, easily verifiable)
  • Crunchbase (for startups and tech companies)
  • Bloomberg/Reuters profile (for public companies)
  • Official social profiles (Twitter/X, Facebook, YouTube, Instagram)
  • Industry-specific directories (G2, Capterra, Clutch for SaaS; Justdial, Practo for Indian market)
Aim for 5-8 sameAs URLs minimum. Each one you add is another anchor point for AI entity resolution. The data here is directional, not precise, but our observation across 90+ implementations is that sites with 5+ sameAs references get entity-matched correctly in AI responses about 3x more reliably than sites with 0-1 references.
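In JSON-LD, the whole set slots into a single array on your Organization entity. Here’s a sketch with placeholder profile URLs (including a dummy Wikidata Q-id); swap in your real profiles and keep the same @id you use everywhere else:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://yoursite.com/#org",
  "name": "Your Company",
  "url": "https://yoursite.com/",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q00000000",
    "https://en.wikipedia.org/wiki/Your_Company",
    "https://www.linkedin.com/company/your-company/",
    "https://www.crunchbase.com/organization/your-company",
    "https://x.com/yourcompany",
    "https://www.youtube.com/@yourcompany"
  ]
}
```

Only list profiles that actually exist and that you control or are verifiably about you; a dead or wrong sameAs URL undermines the disambiguation it’s meant to provide.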

What are the JSON-LD best practices that matter most right now?

Best practices evolve, but these 7 are the ones that separate well-implemented structured data from the scattered schema blocks we see on most sites during technical SEO audits.

1. Use one @graph array per page, not multiple script tags. Google accepts both. AI crawlers process @graph more reliably because it explicitly declares the relationships between entities on the page. Multiple disconnected <script type="application/ld+json"> tags force the AI to infer connections that you could state directly. 78% of the top 100 AI-cited sites in our tracking use @graph patterns.

2. Every entity gets a stable @id. Use URL-based identifiers with hash fragments: https://yoursite.com/#organization, https://yoursite.com/#author-jane-doe, https://yoursite.com/blog/post-slug/#article. Keep these consistent across your entire site. If your Organization’s @id is different on different pages, you’ve broken the graph.

3. Cross-reference entities by @id, not by inlining. Don’t repeat your Organization’s full schema inside every Article’s publisher property. Instead, set "publisher": {"@id": "https://yoursite.com/#organization"}. This reduces payload size (one client went from 8KB to 2.3KB of schema per page) and, more importantly, tells AI systems that it’s the same entity everywhere.

4. Keep descriptions factual, not promotional. Your Organization’s description should read like a Wikipedia intro, not a sales page. “ScaleGrowth.Digital is a growth engineering firm that builds organic growth systems for brands across SEO, content, and AI visibility” works. “The world’s most innovative growth partner that delivers unmatched results” doesn’t. AI systems compare your self-description against third-party sources. Promotional language creates a credibility gap.

5. Always include dateModified on Article and WebPage types. AI systems use this to determine freshness. A post about structured data best practices from 2021 with no dateModified is treated as potentially stale. The same post updated in 2026 with a dateModified timestamp gets preferred for current queries. Set up your CMS to auto-update this field whenever content is meaningfully edited.

6. Nest BreadcrumbList in every page’s @graph. Don’t treat breadcrumbs as a standalone schema block. Include them in your @graph array and reference the page’s @id from the breadcrumb’s final ListItem. This connects your page to your site’s topical hierarchy in a way that AI systems can process programmatically. Sites with breadcrumb schema see 18% higher AI visibility scores in our audits compared to equivalent sites without it.

7. Validate against Google’s test AND schema.org’s validator. Google’s Rich Results Test checks for the subset of schema that Google supports. Schema.org’s validator checks full spec compliance. AI systems read the full spec. You can pass Google’s test while having schema that AI systems can’t process correctly. Run both validators. Fix everything both flag.
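Practices 3 and 5 combine in an Article snippet like this. The URLs, slug, and dates are hypothetical placeholders; the points to copy are the @id references (no inlined publisher or author) and the full ISO 8601 timestamps with a timezone offset:

```json
{
  "@type": "Article",
  "@id": "https://yoursite.com/blog/post-slug/#article",
  "headline": "Example Post Title",
  "author": { "@id": "https://yoursite.com/#author-jane-doe" },
  "publisher": { "@id": "https://yoursite.com/#organization" },
  "datePublished": "2024-05-12T09:00:00+05:30",
  "dateModified": "2026-03-20T11:30:00+05:30"
}
```

Because dateModified should track real edits, wire it to your CMS’s last-edited field rather than hand-editing it.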

How does entity cross-referencing work across a multi-page site?

This is where structured data gets genuinely powerful, and where most implementations fall apart. A well-structured site doesn’t just have schema on individual pages. It has a schema architecture, the same way it has an information architecture. Every page contributes entities to a site-wide graph, and those entities reference each other consistently. Here’s a concrete example. A financial services company might have:
  • Homepage: Organization entity (the company), WebSite entity, Person entities for key executives
  • About page: Same Organization @id, expanded description, founding date, employee count
  • Service pages: Service entities linked to the Organization, BreadcrumbList showing hierarchy
  • Blog posts: Article entities authored by Person entities (linked by @id to the same people on the About page), published by the Organization
  • Team page: Person entities with credentials, sameAs to LinkedIn profiles, worksFor pointing to the Organization
When an AI system crawls this site, it doesn’t see isolated pages. It sees one Organization entity enriched by data from 6 different pages, 4 Person entities each validated across multiple page contexts, and a clear service taxonomy backed by BreadcrumbList hierarchy. That’s 15-20 interconnected entity records built from maybe 50 pages of schema.

Compare that to the typical implementation: each page has its own standalone Organization block with slightly different descriptions. Authors are listed as text strings, not @id references. Breadcrumbs exist in HTML but not in schema. The AI sees fragments. It can’t build a confident entity model from fragments.

The technical implementation isn’t difficult. It takes a templating approach. You define your core entities once (Organization, key People, WebSite) and reference them by @id from every page template. Your CMS or build system injects the right @graph array on each page type. The initial setup takes 6-10 hours for a mid-sized site. After that, new pages inherit the graph structure automatically.

We built a schema generator that produces connected @graph output for exactly this reason. Manual implementation across 50+ pages introduces inconsistencies. Automated generation from a single entity configuration doesn’t.
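The team-page Person entity from that financial services example might look like the following sketch. The name, title, domain, and LinkedIn URL are all placeholders; the part that matters is that worksFor points at the exact same Organization @id used on every other page of the site:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example-finco.com/#jane-doe",
  "name": "Jane Doe",
  "jobTitle": "Head of Lending",
  "sameAs": ["https://www.linkedin.com/in/jane-doe/"],
  "worksFor": { "@id": "https://example-finco.com/#organization" }
}
```

When a blog post then sets "author": {"@id": "https://example-finco.com/#jane-doe"}, the AI resolves the author to this same person, credentials and all.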

What mistakes are technical SEOs still making with structured data?

After auditing structured data on 200+ sites in the past year, these 5 patterns account for roughly 80% of the issues we find.

Mistake 1: Using schema plugins with default settings. Yoast, Rank Math, and similar plugins generate decent basic schema. But they default to isolated blocks, minimal sameAs, and generic descriptions. Plugin-generated schema is a starting point. It is not a finished implementation. About 67% of the sites we audit have never customized their schema plugin output beyond the initial setup.

Mistake 2: Inconsistent @id values across pages. Your Organization @id must be identical on every page. We’ve seen sites where the homepage uses https://example.com/#organization, the about page uses https://example.com/about/#organization, and blog posts use https://example.com/#org. Three different IDs means the AI treats them as three different organizations. Pick one. Use it everywhere.

Mistake 3: Missing Person schema for content authors. If you’re publishing thought leadership content (and you should be, given how AI systems weight expertise), every author needs a Person entity with credentials, sameAs links, and a worksFor reference to your Organization. Text-only author bylines are invisible to AI entity resolution. We’ve seen Article citation rates jump 22% just from adding proper Person schema to existing content.

Mistake 4: Forgetting to update dateModified. This one hurts. You refresh a cornerstone article with 2026 data but forget to update the schema timestamp. Google might still crawl the page and notice the content change. AI systems are more likely to rely on the structured data timestamp, and if that says 2023, your updated content competes at a freshness disadvantage. Automate this. Don’t rely on manual updates.

Mistake 5: Schema that contradicts on-page content. Your schema says “founded: 2018” but your about page says “serving clients since 2017.” Your schema lists 5 services but your nav has 7. AI systems now cross-validate structured data against page content. Contradictions reduce confidence scores. Before deploying schema, audit it against your actual page content. Every fact in your schema must match what’s on the page and what’s on your third-party profiles.

“The number one structured data problem in 2026 isn’t missing schema. It’s inconsistent schema. Most sites have schema on most pages. The problem is that it tells a slightly different story on every page. AI systems notice that, and they penalize it with lower entity confidence.”

Hardik Shah, Founder of ScaleGrowth.Digital

What should your implementation priority list look like?

If you’re starting from scratch or cleaning up an existing implementation, here’s the order I’d follow. This assumes a mid-sized site with 50-200 pages, a blog, service pages, and an about/team section.

Week 1: Foundation (4-6 hours). Define your core entities. Write one Organization entity with complete properties: name, url, logo, description (factual), foundingDate, numberOfEmployees, sameAs (5+ URLs), contactPoint. Write one Person entity for each key author or executive. Give every entity a stable @id. Build a @graph template that includes Organization + WebSite + WebPage + BreadcrumbList. Deploy it site-wide through your CMS template.

Week 2: Content layer (3-4 hours). Add Article schema to every blog post and content page. Connect each Article to its author Person by @id. Set publisher to your Organization by @id. Confirm datePublished and dateModified are accurate and automatically maintained. Add BreadcrumbList to every page that doesn’t already have it.

Week 3: Enhancement (2-3 hours). Add FAQPage schema to your top 20 traffic pages that answer questions. Add Product + AggregateRating schema if you have products with verified reviews. Review and expand sameAs URLs. You probably missed a few on the first pass.

Week 4: Validation and monitoring (2 hours). Run every page template through Google’s Rich Results Test and schema.org’s validator. Fix all warnings, not just errors. Set up a monthly check using Screaming Frog’s structured data extraction or a dedicated schema monitoring tool. Track AI citation rates for your key pages, broken down by whether they have complete graph-pattern schema or not.

Total investment: 11-15 hours over a month. That’s less than most teams spend on a single blog post. The return, based on our data across 140+ sites, is a measurable increase in both SERP feature appearances (traditional benefit) and AI citation rates (new benefit). The median improvement we’ve observed is 1.9x in AI citations within 60 days of completing a full @graph implementation.
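For the Week 3 step, a minimal FAQPage block looks like the sketch below. The question and answer text are placeholders; whatever you use, the answer text must match the visible on-page copy word for word, since both Google and AI systems compare the two:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is a working capital loan?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A short-term loan that covers day-to-day operating expenses rather than long-term investments."
      }
    }
  ]
}
```

Add one Question object per on-page FAQ item, and fold the block into the page’s @graph array rather than shipping it as a separate script tag.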

Where is structured data heading next?

Three trends are worth watching, though none of them should change what you implement today.

Schema.org is expanding its vocabulary. New types and properties get added quarterly. The most relevant recent additions are around AI-specific attributes, like contentQuality annotations and evidenceLevel properties for medical and scientific content. These aren’t widely adopted yet, but they signal where the standard is going: more machine-readable metadata about content reliability.

AI systems may start providing structured data feedback. Google already tells you which schema types are eligible for rich results. It’s reasonable to expect that AI answer engines will eventually provide similar signals, perhaps through search console equivalents or crawler logs, about which structured data influenced their responses. When that happens, the feedback loop will tighten considerably. Sites with strong schema foundations will be able to optimize faster than sites starting from zero.

Structured data and AI agent interactions. As AI agents (not just answer engines) start performing tasks on behalf of users, structured data becomes even more important. An AI agent booking a service needs to read your Service schema, your OpeningHoursSpecification, your ContactPoint. An agent comparing products needs your Product schema with accurate pricing, availability, and reviews. The agent use case demands even higher accuracy than the answer engine use case, because agents act on the data rather than just citing it.

None of these trends require you to wait. The fundamentals are clear: build a connected @graph, use consistent @id references, add comprehensive sameAs URLs, and keep every structured data claim factually accurate. That foundation serves both the current AI answer engine use case and every foreseeable future use case.

Your structured data is either building AI confidence or it isn’t. We’ll tell you which.

We audit structured data implementations for both Google rich results and AI entity resolution. If your schema is fragmented, inconsistent, or missing the properties that AI systems actually use, we’ll find it and fix it. Get a Structured Data Audit
