Quick Start

Get AXIS running in your project. Install the CLI, initialize a config, then iterate: run, review, and baseline.

Prerequisites

You need Node.js and npm installed; the CLI is distributed as an npm package.

1. Install the CLI

Install AXIS globally so the axis binary is available on your PATH:

npm install -g @netlify/axis

Or, to skip the install and run directly with npx, prefix any command with npx @netlify/axis (for example, npx @netlify/axis init).

2. Initialize Your Project

From your project root, run:

axis init

In an interactive terminal, this prompts you for the scenarios directory and which agents to include (comma-separated, e.g. claude-code,codex,gemini), then creates two files in your project root.

To skip the prompts, pass flags directly:

axis init --agent claude-code,codex --scenarios ./scenarios

See the Configuration Reference for additional options like scoring weights, MCP servers, and custom agents, and Writing Scenarios for guidance on writing effective prompts and rubrics.
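As a concrete but purely illustrative sketch, a config covering those options might look roughly like the following. The file name, keys, and structure here are assumptions for illustration, not the documented schema; consult the Configuration Reference for the real shape.

```yaml
# axis.config.yaml -- hypothetical name and structure, for illustration only
agents:
  - claude-code
  - codex
scenarios: ./scenarios
scoring:
  weights:           # assumed knobs; see the Configuration Reference
    goal: 0.4
    environment: 0.2
    service: 0.2
    agent: 0.2
mcpServers: []       # assumed key for registering MCP servers
```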

3. Run It

axis run

AXIS spawns the agent in an isolated workspace, captures the full interaction transcript, scores the result against your rubric, and displays a summary in your terminal.

What to Expect

The terminal displays a live progress view while the agent runs. You will see each scenario/agent combination with its current status (running, scoring, done, or failed) and a live token counter showing how many tokens the agent has consumed.

Once scoring completes, AXIS prints a summary table of the results.

4. View the Report

Every run saves a report to .axis/reports/. You can revisit it at any time.

# View the latest report summary
axis reports latest

# Open the HTML report in your browser
axis reports latest --html

# Get JSON output for scripting
axis reports latest --json

The HTML report includes the full scoring breakdown, interaction transcript, and judge evaluations. See Reports & Baselines for details on report contents and storage.
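The --json output lends itself to scripting. The snippet below is a minimal sketch that assumes jq is installed and that the report exposes a top-level score field; the field name is an assumption, not the documented schema. A sample payload stands in for the real command:

```shell
# Sample payload; in practice: json=$(axis reports latest --json)
json='{"score": 82}'

# Extract the composite score (field name is an assumption)
score=$(printf '%s' "$json" | jq -r '.score')
echo "composite score: $score"
```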

5. Interpret Your Results

The AXIS Result is a composite of four dimensions, each measuring a different aspect of the agent's interaction with your system:

Goal Achievement: Did the agent complete the task? Scored against your rubric checks.
Environment: How well did shell commands, file operations, and dev tools work?
Service: How effectively were APIs, MCP tools, and external services used?
Agent: How well did the agent reason and plan? Were its actions necessary and well-scoped?

A score of 50 represents median performance. Scores above 75 are good; above 90 is excellent. See Scoring Framework for the full explanation of how each dimension is calculated and why.

6. Set a Baseline

Once you have a run you are satisfied with, save it as a baseline. Future runs can diff against it to detect regressions.

# Save the latest report as a baseline
axis baseline set

# Compare future runs automatically
axis run --compare-baseline

The comparison exits with code 1 if any regressions are detected, making it suitable for CI gating. See Reports & Baselines for baseline workflows.
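In a plain shell CI step, that exit code can be checked explicitly. The gate helper below is a hypothetical wrapper; the only AXIS behavior it assumes is the documented non-zero exit on regression.

```shell
# Run a command and surface regressions via its exit code.
# Assumes only the documented behavior: `axis run --compare-baseline`
# exits 1 when a regression is detected.
gate() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "regression detected (exit $status)" >&2
  fi
  return "$status"
}

# In CI:
#   gate axis run --compare-baseline
```

Most CI systems already fail the job on any non-zero exit, so calling axis run --compare-baseline directly is usually enough; the wrapper only makes the failure message explicit.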

Gitignore

Add .axis/reports/ and .axis/skills-cache/ to your .gitignore. Baselines (.axis/baselines/) are designed to be committed so the whole team shares the same regression thresholds.
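As a .gitignore fragment, those entries look like this; .axis/baselines/ is simply left out of the ignore list so it stays committed.

```
.axis/reports/
.axis/skills-cache/
```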

Next Steps