Rok Garbas

AI Agents Are Your New Users - A 2026 Checklist

Your website has users you’ve never seen in analytics.

They don’t click. They don’t scroll. They read your entire site in seconds and decide if it’s useful. They’re AI agents - and in 2026, they’re everywhere.

Below is a practical checklist for serving them well. But first - why should you care?

This Is Already Happening

This isn’t a prediction. The data is clear.

Automated bot traffic surpassed human traffic for the first time in 2024, hitting 51% of all web traffic. AI crawlers specifically now generate 50 billion requests per day on Cloudflare’s network alone. GPTBot requests surged 305% year-over-year. ChatGPT-User - the bot that fetches pages when a human asks ChatGPT about something - saw requests jump 2,825%.

And in January 2026, Chrome shipped autonomous agent browsing powered by Gemini to its 3 billion users. An AI can now browse the web on your behalf - clicking, scrolling, filling out forms.

These aren’t scrapers. They’re users. They browse on behalf of humans, summarize your content, answer questions about your product, decide whether to recommend you.

Meanwhile, traditional search is shifting under your feet. Google AI Overviews now reduce clicks by 58%. Search traffic to publishers dropped by a third globally in 2025. Zero-click searches increased from 56% to 69%.

The way people discover your website is changing. AI agents are part of that discovery layer now.

The Good News

Here’s what nobody’s talking about: AI traffic is better traffic.

AI referral visits convert at 14.2% compared to Google’s 2.8% - five times higher. Claude referrals convert at 16.8%. ChatGPT at 14.2%. Perplexity at 12.4%. When AI agents send someone to your site, that person actually wants what you have.

AI referral traffic hit 1.13 billion visits in June 2025, a 357% increase from the year before. Still a fraction of Google’s volume - but growing fast, and converting much better.

The ecosystem powering these agents is exploding. The Model Context Protocol (MCP) - the standard for how AI agents use tools - hit 97 million monthly SDK downloads one year after launch. There are 5,800+ MCP servers and 300+ clients. OpenAI, Google, and Microsoft all adopted it. In December 2025, MCP was donated to the Linux Foundation.

So how do you serve these new users? Most of it is stuff you should already be doing.

The Checklist: Quick Wins

These take less than an hour each. Start here.

Semantic HTML

AI agents parse your DOM. A <nav>, <article>, <aside> structure tells them what’s content and what’s chrome. A soup of nested <div>s forces them to guess.

You’re probably already doing this. Just audit your templates: are you using <article> for posts? <nav> for navigation? <header> and <footer> properly?

<!-- Agents can work with this -->
<article>
  <h1>Your Title</h1>
  <p>Your content...</p>
</article>

<!-- Agents struggle with this -->
<div class="post-wrapper">
  <div class="title-container">
    <span class="heading">Your Title</span>
  </div>
  <div class="body-text">Your content...</div>
</div>

This is also an accessibility win. Screen readers use the same structure AI agents do. Two user segments served with one practice.

Meta Descriptions and Open Graph

When an AI agent summarizes your page, it looks for the description meta tag first. No description means the agent picks a random paragraph. Bad Open Graph data means bad summaries when your link is shared via AI tools.

Every page needs a unique description and og:description. They don’t have to be the same text - description is for search, og:description is for social sharing and AI summaries.

<meta name="description"
  content="A practical checklist for making sites agent-friendly.">
<meta property="og:description"
  content="Your website has new users and they're not human.">

On my blog, every post has both in the frontmatter. Hugo renders them into the <head>. Whatever framework you use, make sure these aren’t blank.
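As a rough sketch of what that frontmatter can look like on a Hugo post - the og_description param name and how it reaches the template are assumptions; use whatever fields your theme actually reads:

---
title: "Post Title"
description: "A practical checklist for making sites agent-friendly."
params:
  og_description: "Your website has new users and they're not human."
---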

RSS Feeds

Agents subscribe to feeds to stay current. A well-structured RSS feed is the easiest way to tell agents “here’s everything I’ve published, structured and ready to consume.”

Most static site generators include RSS. The key: make sure your feed includes full content, not just excerpts. An excerpt forces the agent to make another request to get the actual content. Full content in the feed means one request, done.
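What that looks like in the feed itself - a sketch of one item, where content:encoded (from the RSS content module, declared as xmlns:content on the rss element) carries the full HTML and description stays a short summary:

<!-- requires xmlns:content="http://purl.org/rss/1.0/modules/content/" on the <rss> element -->
<item>
  <title>Post Title</title>
  <link>https://garbas.si/posts/post-slug/</link>
  <description>A short summary for feed readers.</description>
  <content:encoded><![CDATA[<p>The full post HTML goes here...</p>]]></content:encoded>
</item>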

Sitemap

Your sitemap is the map agents use to discover all your pages. Without it, they crawl links and hope they find everything.

Make sure yours is at /sitemap.xml and referenced in your robots.txt:

Sitemap: https://yoursite.com/sitemap.xml

Hugo, Next.js, and most frameworks generate one automatically. Check that it exists and includes all the pages you want discovered.
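If your framework doesn't generate one, the format is small enough to write by hand - a minimal sitemap with a single URL entry looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/posts/post-slug/</loc>
    <lastmod>2026-02-10</lastmod>
  </url>
</urlset>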

llms.txt

The llms.txt specification is an emerging standard for telling AI models about your site. It’s a plain text file at /llms.txt that describes your site in a format optimized for language models.

I should be honest here: adoption is early. About 844,000 sites have one, concentrated mostly in developer tools and AI companies. Google rejected it. No major LLM provider has committed to using it. Server logs show AI crawlers rarely request it.

But it costs you five minutes. I generate mine automatically from my Hugo config - it lists my site structure and recent posts in plain text. If crawlers start using it, I’m ready. If they don’t, I lost five minutes.

# Rok Garbas

> I build products and help customers understand
> why they matter.

## Posts

- [Post Title](https://garbas.si/posts/post-slug/)
- [Another Post](https://garbas.si/posts/another/)

The Checklist: Medium Effort

These take a few hours to a day. Do them as you ship updates.

Structured Data / Schema.org

This is how agents understand what your content is, not just what it says. An Article schema tells them the author, publish date, and topic. A Product schema tells them the price, availability, and reviews. Without it, agents have to infer everything from the text.

JSON-LD in your <head> is the standard approach:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AI Agents Are Your New Users",
  "author": {
    "@type": "Person",
    "name": "Rok Garbas"
  },
  "datePublished": "2026-02-10",
  "description": "A practical checklist..."
}
</script>

Start with Article for blog posts and Organization for your homepage. Add more types as they become relevant. Google’s Rich Results Test validates your markup and is useful even if your goal isn’t Google - the structure helps all agents.
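For the homepage, a minimal Organization block is just as small - the name, URL, and logo here are placeholders:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://yoursite.com/",
  "logo": "https://yoursite.com/logo.png"
}
</script>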

robots.txt for AI Crawlers

You have new visitors. You need a policy for them.

Currently only 5% of domains block GPTBot and 4% block ClaudeBot. Most sites haven’t thought about this at all. You should.

The key distinction: training crawlers vs user-action crawlers. GPTBot crawls to train OpenAI’s models. ChatGPT-User fetches pages because a human asked about something. You might want to block the first but allow the second.

# Block training crawlers
User-agent: GPTBot
Disallow: /

# Allow user-action fetching
User-agent: ChatGPT-User
Allow: /

# Allow Anthropic's user-action bot
User-agent: Claude-User
Allow: /

# Allow Anthropic's crawler
User-agent: ClaudeBot
Allow: /

# Allow search-focused bots
User-agent: PerplexityBot
Allow: /

This is a business decision, not a technical one. If you want your content cited in AI answers, allow the bots. If you want to protect your content from training datasets, block the training crawlers selectively. There’s no universally right answer - but having no policy means you’ve made the decision by default.

Clean Content Structure

Agents read your headings as a table of contents. Clear hierarchy - h1 to h2 to h3 - means they can navigate directly to the relevant section. Skip levels, or use headings purely for styling, and the agent can't tell what's a section and what's a subsection.

One h1 per page. Sequential nesting. Headings describe content, not visual style. This is the same advice accessibility experts have been giving for twenty years. The difference is that now millions of AI agents enforce the same expectation.
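A quick sketch of the outline an agent extracts - one h1, sections as h2, subsections as h3, no skipped levels (indentation here only illustrates the nesting):

<h1>AI Agents Are Your New Users</h1>
  <h2>The Checklist: Quick Wins</h2>
    <h3>Semantic HTML</h3>
    <h3>Meta Descriptions and Open Graph</h3>
  <h2>The Checklist: Medium Effort</h2>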

Content in Multiple Formats

HTML is for browsers. Agents prefer plain text or markdown. Offering both means agents get clean content without parsing your CSS classes, ad containers, cookie banners, and navigation elements.

On my blog, Hugo outputs each page in both HTML and Markdown:

[outputs]
page = ['HTML', 'Markdown']
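That outputs line assumes a Markdown output format exists. Depending on your Hugo version and theme, you may also need to declare the format yourself and add a matching layout (for example layouts/_default/single.md) - a sketch of the extra config:

[mediaTypes."text/markdown"]
  suffixes = ["md"]

[outputFormats.Markdown]
  mediaType = "text/markdown"
  baseName = "index"
  isPlainText = true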

For other frameworks: consider an API endpoint that returns your content as JSON or plain text. Even a simple /posts/my-post/index.md alongside the HTML version gives agents a clean source.

The Checklist: Deeper Investment

These are bigger commitments. They make sense when you’re building a product or service that agents should interact with, not just read.

API Endpoints Alongside Pages

A human reads your pricing page. An agent wants to query your pricing programmatically. If your site is a product or service, having a public API - even read-only - means agents can integrate your data into workflows instead of scraping your marketing copy.

Start with a simple JSON endpoint for your most-requested data. Product information, documentation index, pricing tiers. REST is fine. Don’t overthink it.
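A minimal sketch in TypeScript using Node's built-in http module - the /api/pricing path and the tiers are made up; expose whatever data agents are most likely to need:

import { createServer } from "node:http";

// Hypothetical pricing data - replace with your real source of truth.
const pricing = [
  { tier: "Free", pricePerMonth: 0 },
  { tier: "Pro", pricePerMonth: 29 },
];

createServer((req, res) => {
  if (req.url === "/api/pricing") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(pricing));
  } else {
    res.writeHead(404).end();
  }
}).listen(3000);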

This mostly applies to products and services. If you’re running a content site, the markdown output format from the previous section is enough.

MCP Server for Your Service

This is the leap from “AI can read about my product” to “AI can use my product.”

An MCP server wraps your service into tools that agents call directly - with typed parameters, descriptions, and structured responses. When a user asks Claude “check my deployment status on ServiceX,” the agent calls your MCP tool instead of navigating your dashboard and scraping the screen.

The TypeScript and Python SDKs are mature. Each API endpoint becomes a tool with a name, description, and JSON schema for inputs and outputs.
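A minimal sketch with the TypeScript SDK - the tool name, the api.servicex.example endpoint, and the response handling are invented for illustration; check the SDK docs for the current API surface:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "servicex", version: "1.0.0" });

// Hypothetical tool wrapping an existing "deployment status" API endpoint.
server.tool(
  "get_deployment_status",
  "Returns the current status of a deployment",
  { deploymentId: z.string().describe("ID of the deployment to check") },
  async ({ deploymentId }) => {
    const res = await fetch(`https://api.servicex.example/deployments/${deploymentId}`);
    return { content: [{ type: "text", text: JSON.stringify(await res.json()) }] };
  }
);

// Expose the server over stdio so MCP clients can connect to it.
await server.connect(new StdioServerTransport());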

This makes sense for developer tools, SaaS products, and anything with an API. It’s overkill for a blog.

Agent-Friendly Authentication

Some content is behind auth. Agents need to access it too - on behalf of the user.

OAuth 2.0 with proper scopes lets an agent act as the user without sharing passwords. API keys work for service-to-service access. The pattern: public content should be fully accessible without auth. Premium or user-specific content needs auth that agents can handle programmatically.
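As a sketch of that pattern in TypeScript - verifyOAuthToken, isValidServiceKey, and the content:read scope are all assumed names, standing in for your OAuth provider and key store:

// Assumed helpers - wire these to your OAuth provider and API key store.
declare function verifyOAuthToken(token: string): Promise<{ scopes: string[] } | null>;
declare function isValidServiceKey(key: string): Promise<boolean>;

// Accept either a bearer token with the right scope (an agent acting for a user)
// or an API key (service-to-service access). Public content skips this entirely.
async function canReadPremium(headers: Record<string, string | undefined>): Promise<boolean> {
  const auth = headers["authorization"] ?? "";
  if (auth.startsWith("Bearer ")) {
    const token = await verifyOAuthToken(auth.slice("Bearer ".length));
    return token?.scopes.includes("content:read") ?? false;
  }
  const apiKey = headers["x-api-key"];
  return apiKey !== undefined && (await isValidServiceKey(apiKey));
}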

The anti-pattern: CAPTCHAs and browser fingerprinting that block all non-human access. These don’t distinguish between agents acting for your users and actual abuse. If your auth blocks the agent, it blocks the user who sent the agent.

Rate Limiting That Distinguishes Agents from Abuse

A blanket rate limit treats ClaudeBot the same as a DDoS attack. You need categories.

Set generous limits for verified AI crawlers - they identify themselves with known user-agent strings. Set strict limits for unidentified bots. Consider tools like Cloudflare’s AI Audit that give you visibility into which bots are accessing your site and how often.
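A sketch of that categorization in TypeScript - the bot list and the per-minute numbers are illustrative, not recommendations:

const VERIFIED_AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"];

// Pick a requests-per-minute budget based on who is asking.
function requestsPerMinute(userAgent: string): number {
  if (VERIFIED_AI_CRAWLERS.some((bot) => userAgent.includes(bot))) {
    return 600; // generous: crawlers that identify themselves
  }
  if (/bot|crawler|spider/i.test(userAgent)) {
    return 30; // strict: unidentified automation
  }
  return 120; // ordinary browsers
}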

The balance: too strict and agents can’t index you. Too loose and you’re subsidizing someone else’s training run with your bandwidth.

Start This Week

You don’t need to do all of this today.

Start with the quick wins: semantic HTML, meta descriptions, RSS, sitemap. That’s an afternoon. Then work through the medium effort items as you ship updates. The deeper investments make sense when you’re building a product that agents should interact with, not just read about.

In 2010, you needed a mobile-friendly website. In 2026, you need an agent-friendly one. The good news? Most of what agents want is what good engineering looks like anyway. Clean markup, structured data, multiple output formats, clear documentation. You were supposed to do this stuff already.

Now you have a new reason. And the data to back it up.


Building a CLI tool? I wrote a similar checklist for making CLI tools agent-friendly. Same principle, different surface.

I’ve been writing code for 20 years and adapting to AI tooling for the past year. Let’s connect if you’re navigating this shift too.