QitOS ships two top-level CLIs:
  • qita for trace inspection
  • qit for demos, benchmarks, and developer workflows

qit demo

Use qit demo when you want the fastest path to a real model-backed QitOS run.

qit demo minimal

qit demo minimal
This command:
  • reads your OpenAI-compatible model config from env vars or flags
  • seeds a tiny buggy workspace
  • runs the minimal coding agent on that workspace
  • writes a qita-ready trace under ./runs
Optional flags:
  • --workspace ./playground/minimal_coding_agent
  • --logdir ./runs
  • --model-name Qwen/Qwen3-8B
  • --base-url https://api.siliconflow.cn/v1/
  • --api-key sk-...
  • --task "Fix the bug in buggy_module.py and make the verification command pass."
  • --max-steps 8
  • --render
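Putting the flags together, a fully specified invocation might look like the sketch below. Every flag is optional; the values shown are the examples listed above, and passing the key via `--api-key "$OPENAI_API_KEY"` assumes you exported that variable first.

```shell
# Fully specified demo run; all flag values are the documented examples.
qit demo minimal \
  --model-name Qwen/Qwen3-8B \
  --base-url https://api.siliconflow.cn/v1/ \
  --api-key "$OPENAI_API_KEY" \
  --workspace ./playground/minimal_coding_agent \
  --logdir ./runs \
  --max-steps 8 \
  --render
```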

qita

Use qita when you want to inspect traced runs.

qita board

qita board --logdir ./runs
Opens the multi-run board. It supports:
  • run list and filtering
  • compare pickers
  • run detail links
  • replay links
  • raw and HTML export

qita replay

qita replay --run ./runs/<run_id>
Opens one run in temporal playback mode.

qita export

qita export --run ./runs/<run_id> --html ./report.html
Exports a standalone HTML artifact.

qit bench

qit bench is the canonical benchmark CLI in v0.3.

qit bench run

qit bench run \
  --benchmark tau-bench \
  --split test \
  --subset retail \
  --limit 10 \
  --output ./results/tau.jsonl
This command:
  • loads benchmark tasks
  • constructs RunSpec and ExperimentSpec
  • produces normalized BenchmarkRunResult rows
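To make "normalized rows" concrete, here is a synthetic sketch of what the output file could contain. The field names (`benchmark`, `task_id`, `success`, `steps`) are assumptions for illustration, not the actual BenchmarkRunResult schema.

```shell
# Write two synthetic result rows (field names are assumptions, not the real schema).
cat > ./tau_sample.jsonl <<'EOF'
{"benchmark": "tau-bench", "task_id": "retail_001", "success": true, "steps": 6}
{"benchmark": "tau-bench", "task_id": "retail_002", "success": false, "steps": 8}
EOF
# JSONL means one row per task, so line count equals task count.
wc -l < ./tau_sample.jsonl
```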
Common benchmark names now include:
  • desktop-starter for the canonical starter benchmark family
  • osworld for the benchmark-specific OSWorld adapter path
  • gaia, tau-bench, and cybench for the migrated benchmark families now living under qitos.benchmark.*
  • desktop as a compatibility alias for desktop-starter
The CLI now assumes a three-layer structure:
  • qitos.benchmark.* for benchmark adapters and evaluators
  • qitos.recipes.* for canonical baseline methods
  • examples/* as thin entrypoints only

qit bench eval

qit bench eval --input ./results/tau.jsonl --json
Aggregates normalized benchmark results.
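As a toy illustration of that aggregation, the snippet below computes a pass rate over a synthetic results file with awk. The `"success"` field name is an assumption; `qit bench eval` performs the real aggregation.

```shell
# Synthetic normalized results (the "success" field name is an assumption).
cat > ./results_sample.jsonl <<'EOF'
{"task_id": "t1", "success": true}
{"task_id": "t2", "success": false}
{"task_id": "t3", "success": true}
{"task_id": "t4", "success": true}
EOF
# Count passing rows and divide by total rows (NR).
awk '/"success": true/ {pass++} END {printf "pass rate: %.2f\n", pass/NR}' ./results_sample.jsonl
```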

qit bench replay

qit bench replay --run ./runs/<run_id>
Bridges to the qita replay surface for one benchmark run.

qit bench export

qit bench export --run ./runs/<run_id> --html ./report.html
Exports one benchmark run as standalone HTML.

qit skill

qit skill <subcommand>
Manages third-party skills used by QitOS-based workflows.

Quickstart

For a first run:
  1. export OPENAI_API_KEY=...
  2. qit demo minimal
  3. qita board
For benchmark work:
  1. qit bench run
  2. qit bench eval
  3. qita board
  4. qit bench replay
  5. qit bench export
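The benchmark workflow above can be chained as one script. This is a sketch using the example flag values from this page; `<run_id>` stays a placeholder until you pick a run from the board.

```shell
# Benchmark loop, using the example flag values documented on this page.
qit bench run --benchmark tau-bench --split test --subset retail \
  --limit 10 --output ./results/tau.jsonl
qit bench eval --input ./results/tau.jsonl --json
qita board --logdir ./runs
# Then, for a specific run of interest:
# qit bench replay --run ./runs/<run_id>
# qit bench export --run ./runs/<run_id> --html ./report.html
```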
For deeper background, continue to Official runs and Tracing.