Writing for AI Agents

How to structure and write content so AI agents can extract, summarize, and cite it accurately.

2026-02-01

The shift from writing for humans to writing for agents

Content optimized for human readers is not automatically optimized for AI agents. Humans interpret context, skim for relevance, and fill in implicit information. Agents extract and synthesize literally — they need explicit structure, clear facts, and self-contained sections.

This does not mean writing in a robotic style. Well-structured content is also more readable for humans. The goal is to eliminate ambiguity, not personality.

Core principles

1. Lead with the answer (inverted pyramid)

Place the key information at the top of each section. Agents scanning a page for a specific answer will find it immediately:

Without optimization:

Over the years, robots.txt has evolved from a simple exclusion mechanism to a nuanced protocol. Many developers wonder how to configure it for AI crawlers. This has become more important as AI crawlers have multiplied...

Optimized:

To block an AI crawler in robots.txt, add its User-agent followed by Disallow: /.
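The optimized statement can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` to confirm that such a rule blocks the named crawler and leaves other crawlers unaffected. The crawler name `GPTBot`, the name `SomeOtherBot`, and the `example.com` URL are illustrative assumptions, not values from this guide.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rule only: "GPTBot" stands in for whichever crawler you want to block.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The named crawler is blocked everywhere; an unlisted crawler is not affected.
blocked = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/docs/")
others = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/docs/")
print(blocked, others)
```

Note that robots.txt is advisory: this only shows how a compliant parser interprets the rule, not whether a given crawler honors it.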

2. Use descriptive headings as standalone statements

Headings are indexed separately and used as section summaries. Make them informative enough to stand alone:

Weak: ## More information

Strong: ## How to declare Content Signals in Next.js

3. Write in complete, self-contained sections

An agent may extract a single section without the surrounding context. Each section should make sense independently:

  • Define acronyms on first use within the section
  • Avoid phrases that point back to previous sections ("As mentioned above...")
  • Include the entity name rather than "it" or "this"
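One way to enforce the checklist above is a small lint pass over each section. The following is a minimal sketch, assuming a short, hypothetical phrase list; it catches only the listed phrasings and does not try to detect ambiguous pronouns.

```python
import re

# Hypothetical phrase list; extend it to match your own style guide.
BACK_REFERENCES = re.compile(
    r"\b(as (mentioned|explained|noted|described) (above|earlier|previously)"
    r"|see (above|below))\b",
    re.IGNORECASE,
)

def find_back_references(section_text: str) -> list[str]:
    """Return each phrase in section_text that points outside the section."""
    return [match.group(0) for match in BACK_REFERENCES.finditer(section_text)]

sample = "As mentioned above, it supports caching. See below for the limits."
print(find_back_references(sample))  # ['As mentioned above', 'See below']
```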

4. Use lists and tables for structured data

AI agents parse lists and tables more reliably than prose for comparative or enumerable information:

Prose (harder to parse):

JSON-LD should be placed in the head section of your HTML page, it should use the correct schema type, and it must be valid JSON.

Table (easier to extract):

| Rule | Detail |
| --- | --- |
| Placement | Inside <head> of the HTML page |
| Schema type | Must match the content type (Article, Product, etc.) |
| Format | Must be valid JSON |
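If the rules already live in structured form, the table can be generated rather than hand-written. This is a rough sketch, assuming pipe-delimited Markdown is the target format; `to_markdown_table` is a hypothetical helper, not part of any library.

```python
def to_markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render headers and rows as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row needs one cell per header")
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)

# The same three JSON-LD rules used in the example above.
rules = [
    ["Placement", "Inside <head> of the HTML page"],
    ["Schema type", "Must match the content type (Article, Product, etc.)"],
    ["Format", "Must be valid JSON"],
]
table = to_markdown_table(["Rule", "Detail"], rules)
print(table)
```

The sketch does not escape pipe characters inside cells, so cell text containing `|` would need handling before use.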

5. Use explicit fact statements

Agents cite specific facts. Make your facts easy to extract:

  • Use numbers and units explicitly: "Page load time under 2.5 seconds"
  • State the source and date: "According to Cloudflare (February 2026)..."
  • Avoid vague qualifiers: "quite fast" → "under 500ms"
  • Use bold to highlight key facts: **Markdown reduces token usage by 60–80%**

6. Define terms the first time you use them

Even if you have a glossary, define terms inline in long-form content:

RAG (Retrieval-Augmented Generation) is a technique where an LLM retrieves relevant documents from a database before generating an answer, rather than relying solely on its training data.

7. Provide context for code examples

Code snippets are valuable, but agents need prose context to understand when and why to use them:

Without context:

{ "@type": "Article", "headline": "..." }

With context:

Use the Article Schema.org type for blog posts and news articles. This tells AI crawlers the content is editorial, not a product or service page.

{ "@context": "https://schema.org", "@type": "Article", "headline": "..." }
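Generating the snippet programmatically helps keep it valid JSON. The following is a minimal sketch, assuming the markup is embedded in a `<script type="application/ld+json">` tag in the page head; `article_json_ld` is a hypothetical helper name.

```python
import json

def article_json_ld(headline: str) -> str:
    """Build a <script> tag carrying Article JSON-LD for the page <head>."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
    }
    # json.dumps produces syntactically valid JSON for this dictionary.
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'

tag = article_json_ld("Writing for AI Agents")
print(tag)
```

This sketch does not escape a literal `</script>` sequence inside a value, so untrusted headlines would need sanitizing first.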

Content length and depth

Longer content can perform better for complex topics, but only if the length is justified by substance. A 3,000-word article that could be 800 words wastes tokens in every agent's context window.

Rules:

  • Guides and tutorials: comprehensive is better; cover all cases
  • Reference pages: be complete and exhaustive (all options, all parameters)
  • Blog posts: 800–1,500 words is typically sufficient
  • Glossary terms: 200–400 words with clear definition and examples
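The numeric ranges above can be turned into a simple pre-publish check. This is a sketch under stated assumptions: the content-type keys are hypothetical labels, and words are counted by splitting on whitespace, which is only an approximation.

```python
# Ranges come from the rules above. Guides and reference pages have no
# numeric range there, so they are reported as "unchecked".
WORD_RANGES = {
    "blog_post": (800, 1500),
    "glossary_term": (200, 400),
}

def check_length(content_type: str, text: str) -> str:
    """Classify text as 'short', 'ok', 'long', or 'unchecked' for its type."""
    if content_type not in WORD_RANGES:
        return "unchecked"
    low, high = WORD_RANGES[content_type]
    words = len(text.split())  # whitespace-delimited word count
    if words < low:
        return "short"
    if words > high:
        return "long"
    return "ok"

print(check_length("glossary_term", "word " * 250))  # ok
```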

Formatting for readability and extraction

| Element | Recommendation |
| --- | --- |
| Paragraphs | 3–5 sentences max |
| Sentences | One idea per sentence |
| Emphasis | Bold for key terms, code for technical values |
| Lists | Use for 3+ items that would otherwise be a comma list |
| Tables | Use for comparisons and multi-dimensional data |
| Images | Always add descriptive alt text |

FAQs: answer the questions agents receive

Adding an FAQ section at the end of key pages addresses the exact queries users submit to AI assistants:

## Frequently Asked Questions

**What is the difference between llms.txt and robots.txt?**
`robots.txt` controls crawler access; `llms.txt` provides a structured summary 
of your site for AI systems that have already been granted access.

**Does Markdown for Agents require server changes?**
No. When enabled on Cloudflare, conversion happens at the CDN edge.

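FAQ items often match voice and AI search queries verbatim. When marked up with structured data, they can also be eligible for FAQ rich results in traditional search, although eligibility varies by search engine and site.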
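Rich results in traditional search generally depend on structured data accompanying the visible FAQ. The following is a minimal sketch, assuming the Schema.org `FAQPage` type is the markup you want to emit; `faq_json_ld` is a hypothetical helper, and the question and answer are reused from the example above.

```python
import json

def faq_json_ld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as Schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

faq = faq_json_ld([
    (
        "Does Markdown for Agents require server changes?",
        "No. When enabled on Cloudflare, conversion happens at the CDN edge.",
    ),
])
print(faq)
```

The markup should mirror the questions and answers that are visible on the page, not replace them.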

Avoid these patterns

  • Keyword stuffing: agent ranking is not keyword-based
  • Buried definitions: define terms at first mention
  • Walls of text: break prose with headings every 200–300 words
  • Implicit cross-references: "as explained earlier" or "see below" — link explicitly
  • Date-free statements: "recently launched" → "launched in February 2026"
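The "walls of text" pattern is the easiest of these to detect automatically. The sketch below assumes Markdown source with `#`-style headings and uses 300 words as the threshold, the upper end of the guideline above; `long_sections` is a hypothetical helper.

```python
import re

MAX_WORDS = 300  # upper end of the 200 to 300 word guideline above

def long_sections(markdown: str, max_words: int = MAX_WORDS) -> list[tuple[str, int]]:
    """Return (heading, word_count) for each section longer than max_words."""
    flagged = []
    heading, words = "(before first heading)", 0
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            # A new heading closes the previous section.
            if words > max_words:
                flagged.append((heading, words))
            heading, words = line.lstrip("#").strip(), 0
        else:
            words += len(line.split())
    if words > max_words:
        flagged.append((heading, words))
    return flagged

doc = "## Short section\n" + "word " * 50 + "\n## Long section\n" + "word " * 400
print(long_sections(doc))  # [('Long section', 400)]
```

The sketch treats any line starting with `#` as a heading, so comment lines inside fenced code blocks would be miscounted; a real check would need to skip fences.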