Desktop Starter Benchmark

desktop-starter is the first official multimodal starter benchmark family in QitOS. A benchmark here means a standardized evaluation suite for measuring agent performance. The family is scoped as an OSWorld-compatible starter and covers:

desktop / computer-use task structure
screenshot + a11y + OCR + UI candidates
provider-neutral GUI actions
unified BenchmarkRunResult rows
qita replay / export / visual inspection

It does not claim full official OSWorld parity yet.

Run the starter benchmark

qit bench run \
  --benchmark desktop-starter \
  --split starter \
  --strategy desktop_baseline \
  --model-name qwen-plus \
  --model-family qwen \
  --base-url https://dashscope.aliyuncs.com/compatible-mode/v1 \
  --output ./artifacts/desktop-starter.jsonl

For a deterministic local smoke run:

qit bench run \
  --benchmark desktop-starter \
  --split starter \
  --strategy desktop_smoke \
  --output ./artifacts/desktop-starter-smoke.jsonl
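
To check a run quickly, the output file can be loaded in Python. This is a minimal sketch that assumes the file passed to --output holds one JSON object per line, as the .jsonl extension suggests. The load_rows helper is a hypothetical name for illustration, not part of QitOS.

```python
import json
from pathlib import Path


def load_rows(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines.

    Assumption: each non-empty line is one JSON object (one result row).
    """
    rows = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    return rows


if __name__ == "__main__":
    # Path taken from the smoke-run command above.
    output = Path("./artifacts/desktop-starter-smoke.jsonl")
    if output.exists():
        rows = load_rows(output)
        print(f"{len(rows)} result rows in {output}")
    else:
        print(f"{output} not found; run the smoke benchmark first")
```

The sketch only counts rows and does not rely on any particular field name, so it should keep working if the row schema changes.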

What the starter benchmark measures

Each result row includes the standard benchmark fields plus desktop-specific metadata:

success / stop reason
step count
action count
critic count (a critic is a module that evaluates each step and can trigger retries or stops)
token usage
latency
failure tags
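
The fields listed above can be rolled up into a few headline numbers. The sketch below is illustrative only: the keys "success", "step_count", and "latency_s" are hypothetical placeholders and may not match the actual BenchmarkRunResult schema, so adjust them to the keys present in your output rows.

```python
from statistics import mean


def summarize(rows):
    """Aggregate result-row dicts into success rate and simple averages.

    The keys "success", "step_count", and "latency_s" are assumed names,
    not the confirmed QitOS row schema.
    """
    if not rows:
        return {"runs": 0, "success_rate": 0.0,
                "mean_steps": 0.0, "mean_latency_s": 0.0}
    return {
        "runs": len(rows),
        # Fraction of rows whose (assumed) success flag is truthy.
        "success_rate": sum(1 for r in rows if r.get("success")) / len(rows),
        "mean_steps": mean(r.get("step_count", 0) for r in rows),
        "mean_latency_s": mean(r.get("latency_s", 0.0) for r in rows),
    }


# Hand-written sample rows for illustration, not real benchmark output.
sample = [
    {"success": True, "step_count": 4, "latency_s": 2.0},
    {"success": False, "step_count": 8, "latency_s": 4.0},
]
print(summarize(sample))
```

Missing keys fall back to zero rather than raising, which keeps the summary usable on partial rows but can understate the averages.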

Current failure taxonomy:

perception_failure
grounding_failure
planning_failure
action_selection_failure
execution_environment_failure
stop_completion_failure
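
Counting how often each tag appears is one way to see where a run breaks down. The following sketch assumes each row carries its tags as a list of strings under a "failure_tags" key. That key name is a guess, not a documented field. The tag names themselves are copied from the taxonomy above.

```python
from collections import Counter

# Tag names taken from the failure taxonomy listed above.
KNOWN_TAGS = {
    "perception_failure",
    "grounding_failure",
    "planning_failure",
    "action_selection_failure",
    "execution_environment_failure",
    "stop_completion_failure",
}


def tally_failures(rows):
    """Count failure tags across rows and collect tags outside the taxonomy.

    Assumption: tags live in a list under the hypothetical key "failure_tags".
    """
    counts = Counter()
    unknown = set()
    for row in rows:
        for tag in row.get("failure_tags") or []:
            counts[tag] += 1
            if tag not in KNOWN_TAGS:
                unknown.add(tag)
    return counts, unknown


# Hand-written sample rows for illustration, not real benchmark output.
sample_rows = [
    {"failure_tags": ["grounding_failure"]},
    {"failure_tags": ["grounding_failure", "planning_failure"]},
    {"failure_tags": []},
]
counts, unknown = tally_failures(sample_rows)
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

Reporting unknown tags separately makes it easier to notice when the taxonomy grows beyond the six tags listed here.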

What makes this an official v0.5 path

The desktop starter benchmark is the first multimodal path where all of the following pieces are in place:

benchmark tasks
baseline agent
unified runner output
trace artifacts (persistent output files from a run)
qita visual inspection
docs/tutorial story

Having all of these in place is the release bar for v0.5. The full OSWorld benchmark adapter lives separately under osworld.
