QitOS ships two top-level CLIs:
  • qita for trace inspection
  • qit for demos, benchmarks, and developer workflows

qit demo

Use qit demo when you want the fastest path to a real model-backed QitOS run.

qit demo minimal

qit demo minimal
This command:
  • reads your OpenAI-compatible model config from env vars or flags
  • seeds a tiny buggy workspace
  • runs the minimal coding agent on that workspace
  • writes a qita-ready trace under ./runs
Optional flags:
  • --workspace ./playground/minimal_coding_agent
  • --logdir ./runs
  • --model-name Qwen/Qwen3-8B
  • --base-url https://api.siliconflow.cn/v1/
  • --api-key sk-...
  • --task "Fix the bug in buggy_module.py and make the verification command pass."
  • --max-steps 8
  • --render
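Putting the flags together, a fully specified invocation might look like the sketch below. Every flag is optional; the values shown are the examples listed above, and passing the key via `--api-key "$OPENAI_API_KEY"` assumes you exported that variable first.

```shell
# Fully specified demo run; all flag values are the documented examples.
qit demo minimal \
  --model-name Qwen/Qwen3-8B \
  --base-url https://api.siliconflow.cn/v1/ \
  --api-key "$OPENAI_API_KEY" \
  --workspace ./playground/minimal_coding_agent \
  --logdir ./runs \
  --max-steps 8 \
  --render
```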

qita

Use qita when you want to inspect traced runs.

qita board

qita board --logdir ./runs
Opens the multi-run board. It supports:
  • run list and filtering
  • compare pickers
  • run detail links
  • replay links
  • raw and HTML export

qita replay

qita replay --run ./runs/<run_id>
Opens one run in temporal playback mode.

qita export

qita export --run ./runs/<run_id> --html ./report.html
Exports a standalone HTML artifact.

qit bench

qit bench is the canonical benchmark CLI in v0.3.

qit bench run

qit bench run \
  --benchmark tau-bench \
  --split test \
  --subset retail \
  --limit 10 \
  --output ./results/tau.jsonl
This command:
  • loads benchmark tasks
  • constructs RunSpec and ExperimentSpec
  • produces normalized BenchmarkRunResult rows
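To make "normalized rows" concrete, here is a synthetic sketch of what the output file could contain. The field names (`benchmark`, `task_id`, `success`, `steps`) are assumptions for illustration, not the actual BenchmarkRunResult schema.

```shell
# Write two synthetic result rows (field names are assumptions, not the real schema).
cat > ./tau_sample.jsonl <<'EOF'
{"benchmark": "tau-bench", "task_id": "retail_001", "success": true, "steps": 6}
{"benchmark": "tau-bench", "task_id": "retail_002", "success": false, "steps": 8}
EOF
# JSONL means one row per task, so line count equals task count.
wc -l < ./tau_sample.jsonl
```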
Common benchmark names now include:
  • desktop-starter for the canonical starter benchmark family
  • osworld for the benchmark-specific OSWorld adapter path
  • gaia, tau-bench, and cybench for the migrated benchmark families now living under qitos.benchmark.*
  • desktop as a compatibility alias for desktop-starter
The CLI now assumes a three-layer structure:
  • qitos.benchmark.* for benchmark adapters and evaluators
  • qitos.recipes.* for canonical baseline methods
  • examples/* as thin entrypoints only

qit bench eval

qit bench eval --input ./results/tau.jsonl --json
Aggregates normalized benchmark results.
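As a toy illustration of that aggregation, the snippet below computes a pass rate over a synthetic results file with awk. The `"success"` field name is an assumption; `qit bench eval` performs the real aggregation.

```shell
# Synthetic normalized results (the "success" field name is an assumption).
cat > ./results_sample.jsonl <<'EOF'
{"task_id": "t1", "success": true}
{"task_id": "t2", "success": false}
{"task_id": "t3", "success": true}
{"task_id": "t4", "success": true}
EOF
# Count passing rows and divide by total rows (NR).
awk '/"success": true/ {pass++} END {printf "pass rate: %.2f\n", pass/NR}' ./results_sample.jsonl
```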

qit bench replay

qit bench replay --run ./runs/<run_id>
Bridges to the qita replay surface for one benchmark run.

qit bench export

qit bench export --run ./runs/<run_id> --html ./report.html
Exports one benchmark run as standalone HTML.

qit skill

qit skill <subcommand>
Manages third-party skills used by QitOS-based workflows.

Quickstart

For a first run:
  1. export OPENAI_API_KEY=...
  2. qit demo minimal
  3. qita board
For benchmark work:
  1. qit bench run
  2. qit bench eval
  3. qita board
  4. qit bench replay
  5. qit bench export
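The benchmark workflow above can be chained as one script. This is a sketch using the example flag values from this page; `<run_id>` stays a placeholder until you pick a run from the board.

```shell
# Benchmark loop, using the example flag values documented on this page.
qit bench run --benchmark tau-bench --split test --subset retail \
  --limit 10 --output ./results/tau.jsonl
qit bench eval --input ./results/tau.jsonl --json
qita board --logdir ./runs
# Then, for a specific run of interest:
# qit bench replay --run ./runs/<run_id>
# qit bench export --run ./runs/<run_id> --html ./report.html
```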
For deeper background, continue to Official runs and Tracing.