Desktop Starter Benchmark
desktop-starter is the first official multimodal starter benchmark family in QitOS.
It is intentionally scoped as an OSWorld-compatible starter:
- desktop / computer-use task structure
- screenshot + a11y + OCR + UI candidates
- provider-neutral GUI actions
- unified BenchmarkRunResult rows
- qita replay / export / visual inspection
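
To make the scope above concrete, here is a minimal sketch of what a multimodal observation and a provider-neutral GUI action could look like. Every name in it (DesktopObservation, GuiAction, click_candidate, and the "label" / "bbox" keys) is hypothetical and chosen for illustration; none of it is the actual QitOS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesktopObservation:
    """Hypothetical bundle of the four perception channels listed above."""
    screenshot_png: bytes      # raw screenshot bytes
    a11y_tree: dict            # accessibility tree, as nested dicts
    ocr_text: list[str]        # OCR'd text fragments
    ui_candidates: list[dict]  # assumed shape: {"label": str, "bbox": (x0, y0, x1, y1)}


@dataclass(frozen=True)
class GuiAction:
    """Hypothetical provider-neutral action: no vendor-specific fields."""
    kind: str                  # e.g. "click", "type", "scroll", "key"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


def click_candidate(obs: DesktopObservation, label: str) -> Optional[GuiAction]:
    """Return a click at the centre of the first UI candidate with this label."""
    for cand in obs.ui_candidates:
        if cand["label"] == label:
            x0, y0, x1, y1 = cand["bbox"]
            return GuiAction(kind="click", x=(x0 + x1) // 2, y=(y0 + y1) // 2)
    return None  # nothing matched; the caller decides how to recover
```

Keeping the action type free of provider-specific fields means the same action could, in principle, be translated to any model vendor's computer-use format by a thin adapter.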
Run the starter benchmark
What the starter benchmark measures
Each result row includes the standard benchmark fields plus desktop-specific metadata:
- success / stop reason
- step count
- action count
- critic count
- token usage
- latency
- failure tags
  - perception_failure
  - grounding_failure
  - planning_failure
  - action_selection_failure
  - execution_environment_failure
  - stop_completion_failure
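
As an illustration of how these fields might be consumed, the sketch below models a result row and summarizes a batch of them. The DesktopResultRow class, its field names and units, and the summarize helper are assumptions made for this example; they are not the real BenchmarkRunResult definition. Only the six failure tag strings are taken from the list above.

```python
from collections import Counter
from dataclasses import dataclass, field

# Failure tags as listed in this document.
FAILURE_TAGS = (
    "perception_failure",
    "grounding_failure",
    "planning_failure",
    "action_selection_failure",
    "execution_environment_failure",
    "stop_completion_failure",
)


@dataclass
class DesktopResultRow:
    """Hypothetical row shape mirroring the fields described above."""
    task_id: str
    success: bool
    stop_reason: str
    step_count: int
    action_count: int
    critic_count: int
    token_usage: int
    latency_s: float  # assumed unit: seconds
    failure_tags: list[str] = field(default_factory=list)


def summarize(rows: list[DesktopResultRow]) -> dict:
    """Compute the success rate and count how often each failure tag appears."""
    total = len(rows)
    successes = sum(1 for r in rows if r.success)
    tag_counts = Counter(tag for r in rows for tag in r.failure_tags)
    unknown = sorted(set(tag_counts) - set(FAILURE_TAGS))
    if unknown:
        raise ValueError(f"unknown failure tags: {unknown}")
    return {
        "success_rate": successes / total if total else 0.0,
        "failure_tags": dict(tag_counts),
    }
```

A summary like this makes it easy to see whether a baseline agent fails mostly at perception, grounding, or planning before opening individual traces for inspection.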
What makes this an official v0.5 path
The desktop starter benchmark is the first multimodal path where all of these now line up:
- benchmark tasks
- baseline agent
- unified runner output
- trace artifacts
- qita visual inspection
- docs/tutorial story
osworld.