robots.txt
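The robots.txt file tells web crawlers which parts of your site they may access. AI companies run their own crawlers, each identified by a user-agent token such as GPTBot or ClaudeBot, so you can allow or block them individually.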
How to fix
Create a robots.txt file in your website's root directory:
```txt
User-agent: *
Allow: /

# AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```

If you want to block specific AI crawlers, use Disallow: / instead.
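If you maintain rules for several crawlers, you can generate the file from one policy list to keep the groups consistent. The TypeScript below is a minimal sketch. CrawlerRule and buildRobotsTxt are hypothetical names, and every group uses the same site-wide Allow: / or Disallow: / rule as the example above.

```typescript
// Hypothetical policy entry: one User-agent group per crawler.
interface CrawlerRule {
  userAgent: string; // e.g. "*", "GPTBot", "ClaudeBot"
  allow: boolean; // true writes "Allow: /", false writes "Disallow: /"
}

// Builds a robots.txt body with one group per rule, followed by the Sitemap line.
function buildRobotsTxt(rules: CrawlerRule[], sitemapUrl: string): string {
  const groups = rules.map(
    (rule) => `User-agent: ${rule.userAgent}\n${rule.allow ? "Allow" : "Disallow"}: /`,
  );
  return [...groups, `Sitemap: ${sitemapUrl}`].join("\n\n") + "\n";
}

const robots = buildRobotsTxt(
  [
    { userAgent: "*", allow: true },
    { userAgent: "GPTBot", allow: true },
    { userAgent: "ClaudeBot", allow: true },
    { userAgent: "PerplexityBot", allow: false }, // example of blocking one crawler
  ],
  "https://yoursite.com/sitemap.xml",
);
console.log(robots);
```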
Content Signals
Content-Signal is a new robots.txt directive, championed by Cloudflare and published at contentsignals.org, that lets you declare how AI systems may use your content. It defines three signals:
- search: use in traditional search indexes (hyperlinks and excerpts).
- ai-input: use as grounding for AI answers (RAG, AI Overviews).
- ai-train: use for training or fine-tuning models.
How to fix
Add a Content-Signal line inside your User-agent block in robots.txt:
```txt
User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes

Sitemap: https://yoursite.com/sitemap.xml
```

Each signal takes yes or no. Set values that reflect your policy. The directive is advisory (like the rest of robots.txt), so compliance depends on the crawler, but declaring it publicly establishes your stated preference.
You can also scope signals per bot. For example, allow search universally but block ai-train for a specific crawler by repeating the directive under its User-agent block.
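The sketch below shows how per-bot scoping resolves by reading the Content-Signal values that apply to one crawler. It assumes the usual robots.txt group rule (RFC 9309): a crawler follows the group that names it and falls back to User-agent: * only when no group does. How each AI crawler actually interprets Content-Signal is up to that crawler. contentSignalsFor, parseSignals and ExampleBot are hypothetical names.

```typescript
type Signals = Record<string, boolean>;

// Parses a value such as "search=yes, ai-train=no" into { search: true, "ai-train": false }.
function parseSignals(value: string): Signals {
  const out: Signals = {};
  for (const part of value.split(",")) {
    const [key, val] = part.split("=").map((s) => s.trim().toLowerCase());
    if (key && (val === "yes" || val === "no")) out[key] = val === "yes";
  }
  return out;
}

// Returns the signals that apply to `agent`. Simplified matching: an exact
// User-agent group wins; otherwise the "*" group applies.
function contentSignalsFor(robotsTxt: string, agent: string): Signals {
  const groups = new Map<string, Signals>();
  let current: string[] = [];
  let readingAgents = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      if (!readingAgents) current = []; // a new group starts
      const ua = value.toLowerCase();
      current.push(ua);
      if (!groups.has(ua)) groups.set(ua, {});
      readingAgents = true;
    } else {
      readingAgents = false;
      if (field === "content-signal") {
        for (const ua of current) {
          groups.set(ua, { ...groups.get(ua), ...parseSignals(value) });
        }
      }
    }
  }
  return groups.get(agent.toLowerCase()) ?? groups.get("*") ?? {};
}

// ExampleBot is a placeholder for a crawler you want to scope separately.
const robotsTxt = `User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes

User-agent: ExampleBot
Allow: /
Content-Signal: search=yes, ai-train=no

Sitemap: https://yoursite.com/sitemap.xml`;

console.log(contentSignalsFor(robotsTxt, "ExampleBot"));
console.log(contentSignalsFor(robotsTxt, "SomeOtherBot"));
```

In this example, ExampleBot sees only its own group, so it gets search=yes and ai-train=no with no ai-input preference stated. Every other crawler inherits the * group.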
Sitemap
A sitemap helps crawlers discover all pages on your website efficiently.
How to fix
Create a sitemap.xml file in your root directory:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2025-01-01</lastmod>
  </url>
</urlset>
```

Most CMS platforms (WordPress, Shopify, etc.) generate sitemaps automatically.
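If your platform does not generate a sitemap, a small build step can. This sketch uses a hypothetical helper, buildSitemap, and emits the same structure as the file above. It also escapes XML special characters in URLs, which the sitemap protocol requires for characters such as &.

```typescript
// One entry per page; lastmod is an ISO date string (YYYY-MM-DD).
interface SitemapEntry {
  loc: string;
  lastmod?: string;
}

// Escapes the five XML special characters so URLs containing & or quotes stay valid.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(entries: SitemapEntry[]): string {
  const urls = entries.map((e) => {
    const lastmod = e.lastmod ? `\n    <lastmod>${e.lastmod}</lastmod>` : "";
    return `  <url>\n    <loc>${escapeXml(e.loc)}</loc>${lastmod}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

const xml = buildSitemap([
  { loc: "https://yoursite.com/", lastmod: "2025-01-01" },
  { loc: "https://yoursite.com/search?q=a&page=2" }, // & must become &amp;
]);
console.log(xml);
```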
llms.txt
The llms.txt file is an emerging, proposed standard: a Markdown file that helps AI systems understand your website's purpose and structure.
How to fix
Create an llms.txt file in your root directory:
```markdown
# Your Site Name

> Brief description of your website

## About

A paragraph explaining what your site does.

## Key Pages

- [Home](https://yoursite.com/): Main landing page
- [Products](https://yoursite.com/products): Product catalog
- [Documentation](https://yoursite.com/docs): Technical docs
```

Learn more at llmstxt.org.
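If your page list changes often, you can generate the file from data instead of editing it by hand. Below is a minimal sketch. It assumes a hypothetical LlmsTxtInput shape that maps one-to-one onto the template above, and buildLlmsTxt is likewise a hypothetical helper.

```typescript
// Hypothetical input shape mirroring the llms.txt template sections.
interface LlmsTxtInput {
  name: string; // H1
  summary: string; // blockquote line
  about: string; // paragraph under "## About"
  pages: { title: string; url: string; note: string }[]; // "## Key Pages" links
}

function buildLlmsTxt(site: LlmsTxtInput): string {
  const links = site.pages.map((p) => `- [${p.title}](${p.url}): ${p.note}`);
  return [
    `# ${site.name}`,
    "",
    `> ${site.summary}`,
    "",
    "## About",
    "",
    site.about,
    "",
    "## Key Pages",
    "",
    ...links,
    "", // trailing newline
  ].join("\n");
}

const llms = buildLlmsTxt({
  name: "Your Site Name",
  summary: "Brief description of your website",
  about: "A paragraph explaining what your site does.",
  pages: [
    { title: "Home", url: "https://yoursite.com/", note: "Main landing page" },
    { title: "Products", url: "https://yoursite.com/products", note: "Product catalog" },
  ],
});
console.log(llms);
```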
Markdown Content
AI agents often prefer Markdown to HTML because Markdown spares them the parsing, layout, and script-execution work. The usual convention is content negotiation: when an agent requests your page with Accept: text/markdown, your server returns a Markdown version instead of HTML.
How to fix
Detect the Accept header and return Markdown when requested. In Next.js, a root middleware.ts handles this cleanly:
```typescript
import { NextResponse, type NextRequest } from "next/server";

// Markdown version of the home page; replace with your own content.
const MARKDOWN = `# Your Site

> One-line description.

## Overview

A few paragraphs of Markdown describing your site.`;

export function middleware(request: NextRequest) {
  const accept = request.headers.get("accept") ?? "";

  // Agents that ask for Markdown get the Markdown body directly.
  if (accept.includes("text/markdown")) {
    return new NextResponse(MARKDOWN, {
      headers: {
        "Content-Type": "text/markdown; charset=utf-8",
        "Cache-Control": "public, max-age=3600",
        Vary: "Accept",
      },
    });
  }

  // Everyone else falls through to the normal HTML page.
  return NextResponse.next();
}

// Run this middleware only on the home page.
export const config = { matcher: "/" };
```

A few guidelines:
- Serve Content-Type: text/markdown (or text/x-markdown) explicitly.
- Always include Vary: Accept so caches and CDNs separate the HTML and Markdown responses.
- The Markdown body should mirror your page content: a clear H1, a short summary, and structured sections with H2/H3 headings.
- On other stacks, use equivalent request-header branching (Apache mod_rewrite, an Nginx map, or middleware in Express, Rails, Django, etc.).
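The includes("text/markdown") check in the middleware is deliberately simple. It also matches a header such as text/markdown;q=0, which means the client does not accept Markdown. A stricter check can weigh quality values instead. The version below is still simplified: prefersMarkdown is a hypothetical helper, and it ignores wildcard ranges like text/* and */*.

```typescript
// Returns true when an Accept header ranks text/markdown above zero and at least
// as high as text/html. Simplified: ignores wildcards such as text/* and */*.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  const weights = new Map<string, number>();
  for (const entry of accept.split(",")) {
    const parts = entry.split(";").map((s) => s.trim().toLowerCase());
    const mediaType = parts[0] ?? "";
    let q = 1; // default quality when no q= parameter is present
    for (const param of parts.slice(1)) {
      if (param.startsWith("q=")) {
        const parsed = Number.parseFloat(param.slice(2));
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    weights.set(mediaType, q);
  }
  const markdown = weights.get("text/markdown") ?? 0;
  const html = weights.get("text/html") ?? 0;
  return markdown > 0 && markdown >= html;
}

console.log(prefersMarkdown("text/markdown, text/html;q=0.9"));
console.log(prefersMarkdown("text/html,application/xhtml+xml,*/*;q=0.8"));
```

In the middleware, you could call prefersMarkdown(request.headers.get("accept")) in place of the includes check. Treating a tie as a preference for Markdown is a design choice: an agent that names text/markdown explicitly probably wants it.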
Canonical Tag
The canonical tag prevents duplicate content issues by specifying the preferred URL for a page.
How to fix
```html
<link rel="canonical" href="https://yoursite.com/page">
```

Use the full, absolute URL, including the protocol (https://).
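Canonical URLs stay consistent more easily when one function builds them. This sketch assumes your canonical form is always https with no query string or fragment. Keep any query parameters that genuinely select different content. canonicalLink is a hypothetical helper built on the standard URL class.

```typescript
// Builds a canonical <link> tag: absolute https URL, no query string or fragment.
// Assumption: none of your pages needs query parameters in its canonical form.
function canonicalLink(origin: string, path: string): string {
  const url = new URL(path, origin); // resolves relative paths against the site origin
  url.protocol = "https:";
  url.search = "";
  url.hash = "";
  return `<link rel="canonical" href="${url.href}">`;
}

console.log(canonicalLink("http://yoursite.com", "/page?utm_source=newsletter#top"));
// <link rel="canonical" href="https://yoursite.com/page">
```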
JSON-LD Schema
JSON-LD provides structured, machine-readable information about your content that AI systems can easily parse.
How to fix
Add a JSON-LD script to your page's <head>:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://yoursite.com",
  "description": "What your company does"
}
</script>
```

Common schema types: Organization, Product, Article, FAQPage, LocalBusiness.
Use Schema.org to find the right type for your content.
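When JSON-LD is generated from CMS data, serialize it with JSON.stringify rather than string templates. One caution: a value containing </script> would end the script element early. This sketch therefore escapes < as \u003c, which JSON parsers read back as the same character. jsonLdScript is a hypothetical helper.

```typescript
// Serializes a schema.org object into a JSON-LD <script> tag.
// Escaping "<" as \u003c keeps a value like "</script>" from closing the tag early;
// the JSON still parses to the original string.
function jsonLdScript(data: Record<string, unknown>): string {
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const tag = jsonLdScript({
  "@context": "https://schema.org",
  "@type": "Organization",
  name: "Your Company",
  url: "https://yoursite.com",
  description: "What your company does",
});
console.log(tag);
```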
Schema Types
Combining several GEO-friendly (generative engine optimization) schema types helps AI systems categorize your content and can increase the chance of being cited in relevant queries.
How to fix
Add at least two relevant JSON-LD schema types. Common GEO-friendly types:
```html
<!-- Organization + WebPage (good baseline for any site) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "name": "Your Company",
      "url": "https://yoursite.com"
    },
    {
      "@type": "WebPage",
      "name": "Page Title",
      "description": "Page description"
    }
  ]
}
</script>
```

Recommended types: Article, FAQPage, Product, HowTo, Organization, LocalBusiness, BlogPosting, BreadcrumbList, Review, Event.
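To check a page against the at-least-two-types guideline, you can collect the @type values from its parsed JSON-LD. Below is a minimal sketch using a hypothetical collectSchemaTypes helper. It counts nested types too (for example the Question and Answer entries inside a FAQPage), so decide whether those should count toward your target.

```typescript
// Collects every distinct @type in a parsed JSON-LD document, walking @graph arrays
// and nested objects. Nested types (e.g. Question inside FAQPage) are counted too.
function collectSchemaTypes(node: unknown, found: Set<string> = new Set<string>()): Set<string> {
  if (Array.isArray(node)) {
    for (const item of node) collectSchemaTypes(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "@type") {
        const types: unknown[] = Array.isArray(value) ? value : [value]; // @type may be an array
        for (const t of types) if (typeof t === "string") found.add(t);
      } else {
        collectSchemaTypes(value, found);
      }
    }
  }
  return found;
}

const doc = JSON.parse(`{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "name": "Your Company", "url": "https://yoursite.com" },
    { "@type": "WebPage", "name": "Page Title", "description": "Page description" }
  ]
}`);
const types = collectSchemaTypes(doc);
console.log([...types], types.size >= 2 ? "meets the two-type guideline" : "add another type");
```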
Open Graph Tags
Open Graph tags control how your content appears when shared on social media, and AI systems can also use them for context.
How to fix
Add these meta tags to your page's <head>:
```html
<meta property="og:title" content="Your Page Title">
<meta property="og:description" content="Description of your page">
<meta property="og:image" content="https://yoursite.com/image.jpg">
<meta property="og:url" content="https://yoursite.com/page">
<meta property="og:type" content="website">
```

Page Title
The <title> tag defines the title of your page. It appears in browser tabs and search results, and AI systems use it as a primary signal to identify your content.
How to fix
Add a descriptive title to your page's <head>:
```html
<title>Your Page Title - Your Brand Name</title>
```

Tips: Keep it between 30 and 60 characters, include your main keyword, make it unique per page, and put the most important words first.
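Two of these tips, length and uniqueness, are easy to audit across a site. The sketch below assumes you already have a map from URL path to title text; auditTitles is a hypothetical helper. Keyword choice and word order still need human review.

```typescript
// Audits titles for two of the tips above: 30-60 characters and unique per page.
// Input is a hypothetical map of URL path -> <title> text.
function auditTitles(titles: Record<string, string>): string[] {
  const problems: string[] = [];
  const seen = new Map<string, string>(); // lowercased title -> first path using it
  for (const [path, raw] of Object.entries(titles)) {
    const title = raw.trim();
    const length = [...title].length; // counts Unicode code points, not UTF-16 units
    if (length < 30 || length > 60) problems.push(`${path}: ${length} characters, outside 30-60`);
    const first = seen.get(title.toLowerCase());
    if (first) problems.push(`${path}: duplicate of ${first}`);
    else seen.set(title.toLowerCase(), path);
  }
  return problems;
}

const report = auditTitles({
  "/": "Your Page Title - Your Brand Name",
  "/pricing": "Your Page Title - Your Brand Name",
  "/blog": "Blog",
});
console.log(report);
```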
Meta Description
The meta description provides a concise summary of your page content for search engines and AI systems.
How to fix
```html
<meta name="description" content="A clear, compelling description of your page. Aim for 120-160 characters.">
```

Tips: Keep it between 120 and 160 characters, include main keywords naturally, and make it compelling.
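When descriptions come from longer page copy, trim them at a word boundary instead of mid-word. The sketch below uses a hypothetical fitDescription helper. It enforces only the 160-character upper bound, so short descriptions still need rewriting by hand.

```typescript
// Collapses whitespace and trims text to at most `max` characters,
// cutting at the last word boundary and appending "..." when shortened.
function fitDescription(text: string, max = 160): string {
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean.length <= max) return clean;
  const cut = clean.slice(0, max - 3); // leave room for "..."
  const lastSpace = cut.lastIndexOf(" ");
  const trimmed = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return trimmed.replace(/[\s,;:.]+$/, "") + "..."; // avoid ",..." or "...."
}

console.log(fitDescription("word ".repeat(50)));
```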
Semantic HTML
Semantic HTML helps AI understand your page structure and content hierarchy.
How to fix
Use semantic elements instead of generic <div> tags:
```html
<header>
  <nav><!-- Navigation --></nav>
</header>

<main>
  <article>
    <h1>Page Title</h1>
    <section>
      <h2>Section Heading</h2>
      <p>Content...</p>
    </section>
  </article>
</main>

<footer><!-- Footer --></footer>
```

Key elements: <header>, <nav>, <main>, <article>, <section>, <aside>, <footer>. Use only one <h1> per page.
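A quick way to spot pages built entirely from <div> elements is to check for the main landmarks. This regex-based sketch uses a hypothetical missingLandmarks helper. It is fine for audits but can be fooled by comments or script strings, so use a real HTML parser for anything stricter.

```typescript
// Landmark elements from the list above that most pages should contain.
const LANDMARKS = ["header", "nav", "main", "footer"];

// Returns the landmarks missing from a page. Regex-based: matches "<tag>" or
// "<tag " but not longer names such as "<mainx>".
function missingLandmarks(html: string): string[] {
  return LANDMARKS.filter((tag) => !new RegExp(`<${tag}[\\s>]`, "i").test(html));
}

console.log(missingLandmarks("<div class=\"top\"></div><main><h1>Title</h1></main>"));
```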
Heading Hierarchy
A clear heading structure (H1, H2, H3) helps AI models parse your content and understand the relationships between sections.
How to fix
Use a logical heading hierarchy on every page:
```html
<h1>Main Page Title</h1>

<h2>First Major Section</h2>
<h3>Subsection</h3>
<p>Content...</p>

<h2>Second Major Section</h2>
<h3>Another Subsection</h3>
<p>Content...</p>
```

Tips: Use exactly one <h1> per page. Have at least two <h2> tags to show content breadth. Never skip heading levels (e.g., jumping from <h1> straight to <h3>).
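These three rules can be checked automatically. The sketch below pulls heading levels out with a regular expression, which only approximates real HTML parsing, and reports any violations. checkHeadings is a hypothetical helper.

```typescript
// Checks the three heading rules above: exactly one <h1>, at least two <h2>,
// and no skipped levels. Headings are found with a regex, not a real parser.
function checkHeadings(html: string): string[] {
  const levels = [...html.matchAll(/<h([1-6])[\s>]/gi)].map((m) => Number(m[1]));
  const problems: string[] = [];
  const h1Count = levels.filter((level) => level === 1).length;
  if (h1Count !== 1) problems.push(`expected one <h1>, found ${h1Count}`);
  if (levels.filter((level) => level === 2).length < 2) problems.push("fewer than two <h2> tags");
  let previous = 0;
  for (const level of levels) {
    // Going deeper by more than one level (e.g. h1 -> h3) is a skip; going back up is fine.
    if (previous > 0 && level > previous + 1) {
      problems.push(`skipped level: <h${previous}> followed by <h${level}>`);
    }
    previous = level;
  }
  return problems;
}

console.log(checkHeadings("<h1>A</h1><h3>B</h3>"));
```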
Content Depth
AI models favor pages with substantial, well-structured content. Thin pages with little text are less likely to be cited in AI responses.
How to fix
Ensure your key pages have at least 300 words of meaningful content. Use lists and tables to structure information:
```html
<ul>
  <li>Feature one: description of what it does</li>
  <li>Feature two: description of what it does</li>
</ul>

<table>
  <thead>
    <tr><th>Plan</th><th>Price</th><th>Features</th></tr>
  </thead>
  <tbody>
    <tr><td>Basic</td><td>$9/mo</td><td>Core features</td></tr>
    <tr><td>Pro</td><td>$29/mo</td><td>All features</td></tr>
  </tbody>
</table>
```

Tips: Lists and tables make data easy for AI to extract. Comparison tables are especially valuable for AI shopping and recommendation queries.
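To measure pages against the 300-word guideline, count visible words rather than raw HTML. The sketch below is a rough approach using a hypothetical countWords helper. It removes script and style blocks, tags, and entities, then counts whitespace-separated tokens that contain a letter or digit.

```typescript
// Rough visible-word count: drops <script>/<style> blocks, tags, and entities,
// then counts whitespace-separated tokens that contain a letter or digit.
function countWords(html: string): number {
  const text = html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&#?\w+;/g, " ");
  return text.split(/\s+/).filter((token) => /[\p{L}\p{N}]/u.test(token)).length;
}

const sample =
  "<ul> <li>Feature one: description of what it does</li> <li>Feature two: description of what it does</li></ul>";
console.log(countWords(sample)); // 14
```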
Image Alt Text
Alt text helps AI understand the content of images on your page.
How to fix
```html
<img src="product.jpg" alt="Blue wireless headphones with noise cancellation">
```

Tips: Be descriptive but concise. Don't start with "Image of". Use an empty alt="" for decorative images.
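Missing or redundant alt text can be flagged automatically. Below is a regex-based sketch with a hypothetical auditAltText helper. It treats alt="" as intentional (a decorative image) and only recognizes quoted alt values.

```typescript
// Flags <img> tags that break the tips above: no alt attribute at all, or alt text
// starting with "Image of" and similar. Regex-based sketch, not a full HTML parser.
function auditAltText(html: string): string[] {
  const problems: string[] = [];
  for (const match of html.matchAll(/<img\b[^>]*>/gi)) {
    const tag = match[0];
    const alt = /\salt\s*=\s*(["'])(.*?)\1/i.exec(tag); // quoted alt values only
    if (!alt) {
      problems.push(`missing alt: ${tag}`);
    } else if (/^(image|picture|photo) of\b/i.test((alt[2] ?? "").trim())) {
      problems.push(`redundant prefix: ${tag}`);
    }
    // alt="" passes: it marks a decorative image
  }
  return problems;
}

console.log(
  auditAltText(
    '<img src="product.jpg" alt="Blue wireless headphones with noise cancellation">' +
      '<img src="hero.jpg">' +
      '<img src="dog.jpg" alt="Image of a dog">' +
      '<img src="divider.png" alt="">',
  ),
);
```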
FAQ Content
FAQ content helps AI answer questions about your products or services directly in responses.
How to fix
1. Add FAQ sections with clear questions and answers.
2. Use FAQPage schema for structured data:
```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is your return policy?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "We offer a 30-day money-back guarantee."
    }
  }]
}
</script>
```

3. Use semantic HTML for FAQ sections:
```html
<section id="faq">
  <h2>Frequently Asked Questions</h2>
  <details>
    <summary>What is your return policy?</summary>
    <p>We offer a 30-day money-back guarantee.</p>
  </details>
</section>
```

Need Help?
If you need expert assistance implementing these optimizations, contact flowful.ai. We specialize in helping businesses optimize for the AI era.