AI Agents Are Your New Users - A 2026 CLI Checklist

It’s 2026 and nobody’s surprised anymore - most of us spend our days working alongside AI agents in the terminal. Claude Code, Cursor, Copilot - they’re shelling out to CLI tools, reading the output, and deciding what to do next. We watch it happen in real time.
And I keep seeing the same thing: the agent tries a command, gets output it can’t parse, tries a different flag, gets an error it doesn’t understand, pulls up --help, tries again, loops three more times, and eventually stumbles into the right incantation. Sometimes it gives up and tries to scrape a GitHub Actions workflow for clues about how the tool works.
The tools aren’t broken. But they weren’t built for this “user.”
Your CLI Is Already Being Called by Agents
The numbers confirm what I’m seeing in my terminal every day. The Model Context Protocol has 5,800+ servers - most of them wrapping CLI tools. MCP SDKs hit 97 million monthly downloads one year after launch. AI coding assistants - Claude Code, Cursor, GitHub Copilot, Windsurf - are the fastest-growing category of developer tools, and they all execute CLI commands as their primary way of interacting with the system.
Every CLI tool is now potentially an AI tool. The question isn’t whether agents will call yours - it’s whether your tool makes that easy or painful.
At some point, watching all this, the question turned inward: are the CLI tools I write going to be just as broken for agents? I’ve spent the last decade studying how to build good CLIs for humans. I follow clig.dev. I read the Better CLI guides. I know the craft.
But when I looked for equivalent guidelines for agents - not user experience, but agent experience - there was nothing. No checklist. No best practices. No “here’s what your tool needs so an AI agent can use it without looping five times.”
So I started collecting my thoughts. Here’s what I have so far.
Speak the Agent’s Language: Markdown
Before the checklist, let’s settle the most basic question: when your tool talks to an agent, what format should the output be?
The instinctive answer is JSON. It’s structured, it’s parseable, every API speaks it. But here’s what people miss: AI agents don’t json.loads() your output. The LLM reads it as text in its context window. It reasons about it the way you’d read a document - scanning, extracting meaning, deciding what to do next.
And what format are LLMs most fluent in? Markdown. They’re trained on enormous amounts of it. They produce it as output. Tables, headers, lists, code blocks - this is their native tongue. When your tool outputs a markdown table, the agent reads it as naturally as you read a spreadsheet.
```python
import sys

def list_services(services):
    if sys.stdout.isatty():
        # Human: pretty table with colors and alignment
        print_fancy_table(services)
    else:
        # Agent: markdown table - structured, readable, token-efficient
        print("| Name | Status | Last Run |")
        print("|------|--------|----------|")
        for s in services:
            print(f"| {s['name']} | {s['status']} | {s['last_run']} |")
```
Why not the other formats?
JSON is verbose. Every field name is repeated for every record. Brackets, quotes, commas, escaping - all of it eats tokens and adds noise the agent has to read through. A JSON array of 20 objects is significantly larger than the same data in a markdown table. JSON also requires the agent to know field names upfront - if it guesses wrong, it gets nothing. And deeply nested JSON is hard even for humans to reason about; agents struggle with it too.
Plain text is what most tools output by default, and it’s a mess. ANSI color codes, box-drawing characters, dynamic column widths that shift depending on data length. No two tools format their tables the same way. An agent parsing ls -la output is doing string surgery that breaks the moment the format changes.
YAML is whitespace-sensitive and type-ambiguous. Is yes a boolean or a string? Is 3.0 a float or a version number? YAML’s implicit typing has caused production incidents for humans; agents have the same problems. It’s also rarely used as CLI output.
Markdown hits the sweet spot: structured enough for agents to extract data, readable enough for humans to scan, and token-efficient enough to not waste context window. For streaming output, you can emit markdown line by line - each table row is self-contained.
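The verbosity gap is easy to sanity-check. A rough sketch, using hypothetical service records, that renders the same data both ways and compares sizes:

```python
import json

# Hypothetical service records, just for a size comparison
records = [
    {"name": f"svc-{i}", "status": "running", "last_run": "2026-01-01"}
    for i in range(20)
]

as_json = json.dumps(records, indent=2)

rows = [f"| {r['name']} | {r['status']} | {r['last_run']} |" for r in records]
as_markdown = "\n".join(
    ["| Name | Status | Last Run |", "|------|--------|----------|"] + rows
)

# The markdown table carries the same data in far fewer characters,
# because field names appear once in the header instead of once per record
print(len(as_json), len(as_markdown))
```

The exact numbers depend on your data, but the shape of the result doesn't: every JSON record repeats every field name, while the markdown header pays that cost once.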
Keep a --json flag for programmatic piping where another program (not an agent) needs to parse the output. But when an AI agent is your user, markdown is the better default.
Want to save the rainforest? Fewer tokens means less compute, less energy, more trees. Markdown gets you there.
The Checklist: Quick Wins
These are the foundation. Each one makes your tool dramatically more useful to agents - and to humans scripting with it.
Auto-Detect and Switch Formats
Don’t make the agent ask for a different format. Detect that you’re not talking to a human and just do the right thing.
If stdout isn’t a TTY, you’re being called by an agent or piped into something. Switch to markdown (or JSON if --json was passed). Skip the colors. Drop the spinners. Many tools already do part of this - ls shows one file per line when piped, grep drops colors, git adjusts its pager. Take it one step further: switch to a fully agent-friendly format.
```python
import json
import sys

def output(results, json_flag=False):
    if json_flag:
        # Explicit --json flag: for programmatic piping to other tools
        print(json.dumps(results))
    elif not sys.stdout.isatty():
        # Agent or pipe: markdown by default
        print_markdown_table(results)
    else:
        # Human at a terminal: pretty tables, colors, progress bars
        print_fancy_table(results)
```
Detecting Agents: What’s Available Today
Beyond TTY checks, agents are starting to announce themselves. Each major CLI agent sets its own environment variable in child processes:
- Claude Code: `CLAUDECODE=1`
- Gemini CLI: `GEMINI_CLI=1`
- OpenAI Codex: `CODEX_SANDBOX_NETWORK_DISABLED=1`
- CI systems: `CI=true` (a well-established convention)
There’s no universal AI_AGENT=1 standard yet - and honestly, you don’t need one. The TTY check catches all of them plus every pipe and script. Use it as your primary signal. The agent-specific variables are useful if you want to tailor behavior per agent, but for most tools, isatty() is enough.
```python
import os
import sys

def detect_output_format():
    if not sys.stdout.isatty():
        # Non-interactive: agent, pipe, or script → markdown
        return "markdown"
    elif os.environ.get("CLAUDECODE") or os.environ.get("GEMINI_CLI"):
        # Interactive but inside an agent (rare edge case) → markdown
        return "markdown"
    else:
        # Human at a terminal → pretty colors and formatting
        return "pretty"
```
This will evolve. A year from now, there might be a standard. For today, TTY detection plus a --json flag covers you.
Meaningful Exit Codes
This is already in the clig.dev and Better CLI guidelines - and for good reason. But with agents, the stakes are higher. A human sees “Error” on screen and knows something failed regardless of the exit code. An agent decides what to do next solely based on your exit code. If you always exit 0 even on failure, the agent thinks everything is fine and proceeds with garbage data.
```python
import sys

def deploy(env: str):
    if env not in available_environments:
        # Good: print error to stderr, exit non-zero
        print(f"Error: environment '{env}' not found", file=sys.stderr)
        sys.exit(1)

    # Bad: printing the error but exiting 0
    # The agent thinks this succeeded and proceeds with garbage data
    # print(f"Error: environment '{env}' not found")
    # sys.exit(0)  # DON'T DO THIS
```
Use standard conventions: 0 for success, 1 for general errors, 2 for usage errors (wrong flags, missing arguments). Be consistent. Document them if you use custom codes.
The anti-pattern: tools that print “ERROR: something failed” to stdout and exit 0. Agents miss the error entirely.
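From the caller's side, the exit code is the only signal that drives the next step. A minimal sketch of that agent-harness decision, using a throwaway child process in place of a real CLI:

```python
import subprocess
import sys

# Simulate a failing CLI call with a throwaway child process that exits 2
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(2)"],
    capture_output=True, text=True,
)

# A harness branches on the exit code, not on scraping stdout for "ERROR"
if result.returncode == 0:
    print("success - safe to use result.stdout as data")
else:
    print(f"failed with exit code {result.returncode} - retry or surface stderr")
```

If your tool exits 0 on failure, the first branch runs and the failure propagates silently into whatever the agent does next.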
Stderr vs Stdout Separation
Another one straight from the existing guidelines that becomes critical with agents. Humans can visually filter noise from data. Agents can’t. They read stdout for data and ignore stderr. If you mix progress bars, warnings, and spinners into stdout, the agent can’t tell what’s data and what’s noise.
The rule is simple: data goes to stdout. Everything else - progress indicators, warnings, debug info, spinners, color output - goes to stderr.
```python
import json
import sys

def export_data(users):
    # Progress and status go to stderr - agents ignore it
    print("Exporting...", file=sys.stderr)

    # Data goes to stdout - this is what the agent reads
    print(json.dumps({"users": len(users), "exported": True}))

    # BAD: mixing progress into stdout breaks parsing
    # print("Exporting... [████████░░] 80%")  # agent tries to parse this as data
    # print(json.dumps({"users": len(users), "exported": True}))
```
Clear --help Output
AI agents read --help to learn how to use your tool. Claude Code literally runs yourtool --help before trying to use it for the first time. Your help text is your tool’s documentation for agents.
Every command and flag needs a one-line description. Group related flags. Show examples. Keep it scannable.
```
Usage: mytool <command> [flags]

Commands:
  list      List all services
  deploy    Deploy a service to an environment
  logs      Stream logs from a running service

Flags:
  --json       Output as JSON
  --verbose    Show detailed output
  --env ENV    Target environment (default: production)
  --help       Show this help

Examples:
  mytool list --json
  mytool deploy api-server --env staging
  mytool logs api-server --follow
```
Make sure --help works on subcommands too: mytool deploy --help should show deploy-specific flags and examples. Agents drill down this way - top-level help to discover commands, subcommand help to learn how to use them. This is progressive disclosure that actually works.
If an agent can’t understand your --help, it’ll guess. Guesses lead to errors. Errors lead to retry loops. Good help text prevents all of that.
The Checklist: Medium Effort
These take more thought but make your tool significantly better for both agents and power users.
Parseable Error Messages
When your tool fails, the agent needs to understand what went wrong and what to fix. “Error: file not found: /path/to/config.yaml” is actionable. “Something went wrong :(” is not.
Errors should be markdown too - with a clear heading and structured details:
```python
import sys

def deploy_error(env: str, available: list[str]):
    if sys.stdout.isatty():
        # Human mode: readable, with hints
        print(f'Error: Environment "{env}" not found.')
        print(f"Available environments: {', '.join(available)}")
        print(f"Hint: create it with 'mytool env create {env}'")
    else:
        # Agent mode: markdown with structure and next steps
        print(f'# Error: Environment "{env}" not found\n')
        print(f"**Available environments:** {', '.join(available)}\n")
        print("## What to do next\n")
        print(f"- Create the environment: `mytool env create {env}`")
        print("- List all environments: `mytool env list`")
        print("- Check your config file: `~/.mytool/config.yaml`")
        print("- See docs: `mytool env --help`")
    sys.exit(1)
```
The markdown heading tells the agent immediately this is an error. The “What to do next” section is the key - give the agent a menu of options to recover. Don’t just say what went wrong; lay out every path forward. The agent picks the most relevant one, runs it, and keeps going. No human intervention needed.
Consistent Flag Conventions
Agents learn patterns across tools. If every tool uses --output json for structured output, the agent tries that first on your tool. If your tool uses --format=json while another uses -j and another uses --json, the agent has to read help text every time.
Here’s the nuance: follow what’s common in the wild, not just what a style guide says. AI models are trained on real-world code and documentation - millions of CLI invocations, README examples, Stack Overflow answers, CI scripts. They’ve internalized what most tools actually do, not what a conventions document recommends they should do. If 90% of tools in your ecosystem use --verbose and the style guide says --debug, the agent will try --verbose first.
Do the research. Check what the dominant tools in your space use. Match those patterns. Written conventions like clig.dev are a good starting point, but well-established practice beats prescribed conventions when they diverge.
The Goldilocks Problem
Here’s something I keep running into: tools that give agents either too much or too little.
Take GitHub’s gh CLI. It’s a well-built tool - works great for common cases. But watch an agent try to use it for something slightly off the beaten path. The default output is human-formatted tables that agents can’t parse. Switch to --json and you get structured data, but now you need to know the exact field names upfront. The agent guesses wrong, gets an error, tries --help, finds a wall of text, picks different fields, and loops until it stumbles into the right combination.
The missing piece is progressive disclosure. A tool that lets the agent start simple and expand its understanding incrementally - the way you’d explore a new API. First, give me the basics. Then let me ask for more detail on the parts I care about. Don’t dump everything or hide everything.
No tool I’ve seen gets this fully right yet. But the ones that come closest share a pattern: structured output with discoverable fields, consistent error messages that name what’s available, and help text organized by task rather than alphabetically by flag.
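One concrete version of that pattern: when a field selection fails, the error itself names what is available, so the agent's next attempt is informed rather than random. A sketch with hypothetical field names:

```python
AVAILABLE_FIELDS = {"name", "status", "url", "created_at"}  # hypothetical

def select_fields(requested: list[str]) -> list[str]:
    # Unknown fields produce an error that lists the valid options,
    # turning a dead end into a discovery step
    unknown = [f for f in requested if f not in AVAILABLE_FIELDS]
    if unknown:
        raise SystemExit(
            f"Error: unknown field(s): {', '.join(unknown)}. "
            f"Available fields: {', '.join(sorted(AVAILABLE_FIELDS))}"
        )
    return requested
```

One wrong guess now costs one round trip instead of a retry loop: the agent reads the error, picks a valid field, and moves on.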
Scriptable Confirmation Prompts
“Are you sure? (y/n)” blocks agents completely. They can’t type “y”. Your tool hangs forever waiting for input that will never come.
If your tool has destructive operations that need confirmation, provide a --yes or --force flag to skip the prompt:
```python
import sys

def delete_service(name: str, yes: bool = False):
    if sys.stdin.isatty() and not yes:
        # Human mode: ask for confirmation
        confirm = input(f"This will permanently delete {name}. Are you sure? (y/n) ")
        if confirm.lower() != "y":
            sys.exit(0)
    elif not sys.stdin.isatty() and not yes:
        # Agent/pipe mode without --yes: refuse, don't hang
        print("Error: destructive operation requires --yes flag "
              "in non-interactive mode", file=sys.stderr)
        sys.exit(1)

    # Proceed with deletion
    do_delete(name)
    print(f"Deleted {name}.")
```
The best pattern: interactive prompts for humans by default, --yes flag for automation, and a clear error in non-TTY mode if --yes isn’t provided. Don’t silently hang. Don’t silently proceed.
The Checklist: Deeper Investment
These take more thought, but they’re not as heavy as you might think.
MCP Server Wrapper
There’s an active debate about whether MCP servers are even necessary for CLI tools. If your tool already outputs clean markdown, has proper exit codes, and handles non-TTY detection - do you really need another layer? Agents are already calling your CLI directly and doing fine.
Fair point. But wrapping your CLI in an MCP server isn’t the heavy investment it sounds like. The TypeScript and Python MCP SDKs make it straightforward - each command becomes a tool with a name, description, and typed parameters. The MCP server calls your CLI under the hood. You’re not rewriting anything.
```python
# Without MCP: agent constructs a command string and parses stdout
# subprocess.run(["mytool", "deploy", "--service", "api", "--env", "staging"])

# With MCP: agent calls a typed function directly
result = deploy(service="api", env="staging", version="v2.1.0")
# result = {"status": "success", "url": "https://..."}
```
The real benefit is discoverability. An MCP server tells agents exactly what your tool can do, what parameters each command takes, and what to expect back - without the agent having to run --help and parse the output first.
Depending on your ecosystem, you might also consider providing a skill - a set of instructions that teaches the agent how to use your tool effectively. In Claude Code, for example, a skill is essentially a prompt that explains your tool’s workflows, common patterns, and gotchas. It’s like onboarding documentation, but for the agent. Where an MCP server gives the agent a structured API to call, a skill gives it the judgment to know when and how to call it well.
This space is evolving fast. Keep an eye on it. But if you’re building a tool that agents will use regularly, the cost of adding an MCP wrapper or a skill is low and the upside is real.
Full Schema as Part of the Skill
Beyond --help, agents benefit from knowing your tool’s complete API surface upfront. A --schema flag that dumps all commands, flags, types, and examples is one way to do this - but there’s a simpler approach: include the schema directly in the skill.
If you’re already providing a skill that teaches the agent how to use your tool, put the full schema right there. Every command, every flag, every default, every example. The agent loads it once and doesn’t need to run --help on each subcommand or guess at flag names.
```markdown
# mytool v2.1.0

## Commands

### `mytool deploy`

Deploy a service to an environment.

| Flag | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `--service` | string | yes | | Service name |
| `--env` | string | no | production | Target environment |
| `--version` | string | yes | | Version to deploy |

**Examples:**
- `mytool deploy --service api --env staging --version v2.1.0`
- `mytool deploy --service worker --version v2.0.0`
```
Tools like cobra (Go), clap (Rust), and click (Python) can auto-generate this from your flag definitions. Dump it as markdown, drop it in a skill, and the agent has your complete API surface before it runs a single command.
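With argparse, for instance, a rough version of that auto-generation might look like the sketch below. The flags mirror the hypothetical `mytool deploy` example above, and `_actions` is a private argparse attribute, so treat this as an illustration rather than production code:

```python
import argparse

parser = argparse.ArgumentParser(prog="mytool deploy")
parser.add_argument("--service", required=True, help="Service name")
parser.add_argument("--env", default="production", help="Target environment")
parser.add_argument("--version", required=True, help="Version to deploy")

def schema_markdown(p: argparse.ArgumentParser) -> str:
    # Walk the parser's registered options and emit a markdown flag table,
    # so the schema can never drift out of sync with the flag definitions
    lines = [
        "| Flag | Required | Default | Description |",
        "|------|----------|---------|-------------|",
    ]
    for action in p._actions:  # private API - fine for a sketch
        if action.option_strings and "--help" not in action.option_strings:
            lines.append(
                f"| `{action.option_strings[-1]}` "
                f"| {'yes' if action.required else 'no'} "
                f"| {action.default or ''} "
                f"| {action.help or ''} |"
            )
    return "\n".join(lines)

print(schema_markdown(parser))
```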
Predictable Output Schemas
If your --json output changes shape between versions, agents break. An agent that learned to read {"status": "ok", "items": [...]} in v1 fails silently when v2 returns {"result": {"status": "success", "data": [...]}}.
Treat your JSON output like an API contract:
- Don’t remove or rename fields without a major version bump
- Add new fields freely - agents ignore what they don’t recognize
- Document the schema somewhere agents can find it
- Consider an `--output-version` flag for backwards compatibility during transitions
This is API design applied to CLI output. The same discipline, a different surface.
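The "add fields freely" rule works because a well-behaved consumer reads only the fields it knows. A tiny sketch:

```python
import json

# v1 output, and a v2 that adds a field without touching v1's shape
v1 = json.dumps({"status": "ok", "items": [1, 2, 3]})
v2 = json.dumps({"status": "ok", "items": [1, 2, 3], "duration_ms": 420})

def read_items(payload: str) -> list:
    # A consumer written against v1: it only touches the fields it knows
    return json.loads(payload)["items"]

# The v1-era consumer keeps working unchanged on v2 output
assert read_items(v1) == read_items(v2) == [1, 2, 3]
```

Renaming `items` to `data`, by contrast, would break every such consumer at once - which is exactly why it demands a major version bump.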
Plugin and Extension Architecture
If your tool is extensible, agents can discover and add capabilities to it. Think git - its plugin model means tools like git-lfs slot in seamlessly. An agent can discover and use extensions without special handling.
Convention-based plugin discovery works well: yourtool-* binaries on PATH that follow the same output conventions as the core tool. The agent runs yourtool help and discovers all available subcommands, including plugins.
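Discovery by naming convention is simple to implement: scan PATH for matching executables. A sketch, with `mytool-` as the hypothetical prefix:

```python
import os

def discover_plugins(prefix: str = "mytool-") -> list[str]:
    # Any executable named "<prefix><name>" on PATH is exposed as subcommand <name>
    plugins = set()
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # skip missing or unreadable PATH entries
        for entry in entries:
            path = os.path.join(directory, entry)
            if entry.startswith(prefix) and os.access(path, os.X_OK):
                plugins.add(entry[len(prefix):])
    return sorted(plugins)
```

This is essentially how `git` finds `git-lfs`: dropping an executable on PATH is the entire installation story, for humans and agents alike.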
Only invest here if your tool genuinely benefits from extensibility. Don’t add a plugin system for the sake of it. Most tools don’t need this.
Start With Markdown and Exit Codes
You don’t need to do all of this today.
Start with markdown output in non-TTY mode and proper exit codes. That alone transforms your tool from agent-hostile to agent-friendly. Work through the medium effort items as you ship new versions. The deeper investments make sense when your tool is becoming infrastructure that agents depend on.
Here’s the thing that keeps hitting me: watching agents fumble through the terminal reminds me of myself twenty years ago. The same mistakes. Trying random flags. Misreading error output. Not understanding what a tool is telling you. Looping until something works by accident.
I was a teenager becoming dangerous in the terminal. These agents are doing the same thing - except there are millions of them, they work around the clock, and they learn patterns across every tool they touch.
We’ve spent decades building guidelines for CLI user experience. clig.dev and Better CLI are excellent - they codify what makes a command-line tool good for humans. But the users have changed. We need the same rigor applied to agent experience.
In the 2000s, your CLI needed to play nice with shell scripts. In the 2010s, it needed to work in CI/CD pipelines. In 2026, it needs to play nice with AI agents. Same principle every time: be a good citizen in the pipeline.
The pipeline just got smarter. The guidelines need to catch up.
Building a website? I wrote a similar checklist for making websites agent-friendly. Same principle, different surface.
I’ve been writing code for 20 years and building CLI tools for most of them. Let’s connect if you’re thinking about agent-friendly tooling too.