Quick Start

Get AXIS running in your project. Install the CLI, initialize a config, then iterate: run, review, and baseline.

Prerequisites

You need Node.js and npm installed; the CLI is distributed as an npm package.

1. Install the CLI

Install AXIS globally so the axis binary is available on your PATH:

npm install -g @netlify/axis

Or, to skip the install and run directly with npx, prefix any command with npx @netlify/axis (for example, npx @netlify/axis init).

2. Initialize Your Project

From your project root, run:

axis init

In an interactive terminal, this prompts you for the scenarios directory and which agents to include (comma-separated, e.g. claude-code,codex,gemini), then creates two files in your project root.

To skip the prompts, pass flags directly:

axis init --agent claude-code,codex --scenarios ./scenarios

See the Configuration Reference for additional options like scoring weights, MCP servers, and custom agents, and Writing Scenarios for guidance on writing effective prompts and rubrics.
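As a concrete but purely illustrative sketch, a config covering those options might look roughly like the following. The file name, keys, and structure here are assumptions for illustration, not the documented schema; consult the Configuration Reference for the real shape.

```yaml
# axis.config.yaml -- hypothetical name and structure, for illustration only
agents:
  - claude-code
  - codex
scenarios: ./scenarios
scoring:
  weights:           # assumed knobs; see the Configuration Reference
    goal: 0.4
    environment: 0.2
    service: 0.2
    agent: 0.2
mcpServers: []       # assumed key for registering MCP servers
```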

3. Run It

axis run

AXIS spawns the agent in an isolated workspace, captures the full interaction transcript, scores the result against your rubric, and displays a summary in your terminal.

What to Expect

The terminal displays a live progress view while the agent runs. You will see each scenario/agent combination with its current status (running, scoring, done, or failed) and a live token counter showing how many tokens the agent has consumed.

Once scoring completes, AXIS prints a summary table of the results.

4. View the Report

Every run saves a report to .axis/reports/. You can revisit it at any time.

# View the latest report summary
axis reports latest

# Open the HTML report in your browser
axis reports latest --html

# Get JSON output for scripting
axis reports latest --json

The HTML report includes the full scoring breakdown, interaction transcript, and judge evaluations. See Reports & Baselines for details on report contents and storage.
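The --json output lends itself to scripting. The snippet below is a minimal sketch that assumes jq is installed and that the report exposes a top-level score field; the field name is an assumption, not the documented schema. A sample payload stands in for the real command:

```shell
# Sample payload; in practice: json=$(axis reports latest --json)
json='{"score": 82}'

# Extract the composite score (field name is an assumption)
score=$(printf '%s' "$json" | jq -r '.score')
echo "composite score: $score"
```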

5. Interpret Your Results

The AXIS Result is a composite of four dimensions, each measuring a different aspect of the agent's interaction with your system:

Goal Achievement: Did the agent complete the task? Scored against your rubric checks.
Environment: How well did shell commands, file operations, and dev tools work?
Service: How effectively were APIs, MCP tools, and external services used?
Agent: How well did the agent reason and plan? Were its actions necessary and well-scoped?

A score of 50 represents median performance. Scores above 75 are good; above 90 is excellent. See Scoring Framework for the full explanation of how each dimension is calculated and why.

6. Set a Baseline

Once you have a run you are satisfied with, save it as a baseline. Future runs can diff against it to detect regressions.

# Save the latest report as a baseline
axis baseline set

# Compare future runs automatically
axis run --compare-baseline

The comparison exits with code 1 if any regressions are detected, making it suitable for CI gating. See Reports & Baselines for baseline workflows.
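In a plain shell CI step, that exit code can be checked explicitly. The gate helper below is a hypothetical wrapper; the only AXIS behavior it assumes is the documented non-zero exit on regression.

```shell
# Run a command and surface regressions via its exit code.
# Assumes only the documented behavior: `axis run --compare-baseline`
# exits 1 when a regression is detected.
gate() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "regression detected (exit $status)" >&2
  fi
  return "$status"
}

# In CI:
#   gate axis run --compare-baseline
```

Most CI systems already fail the job on any non-zero exit, so calling axis run --compare-baseline directly is usually enough; the wrapper only makes the failure message explicit.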

Gitignore

Add .axis/reports/ and .axis/skills-cache/ to your .gitignore. Baselines (.axis/baselines/) are designed to be committed so the whole team shares the same regression thresholds.
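As a .gitignore fragment, those entries look like this; .axis/baselines/ is simply left out of the ignore list so it stays committed.

```
.axis/reports/
.axis/skills-cache/
```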

Next Steps