Desktop Starter Benchmark
desktop-starter is the first official multimodal starter benchmark family in QitOS. A benchmark here means a standardized evaluation suite for measuring agent performance.
It is scoped as an OSWorld-compatible starter:
- desktop / computer-use task structure
- screenshot + a11y + OCR + UI candidates
- provider-neutral GUI actions
- unified BenchmarkRunResult rows
- qita replay / export / visual inspection
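As a rough illustration of the observation channels and provider-neutral actions listed above, the sketch below models them as plain Python dataclasses. Every name in it (DesktopObservation, GuiAction, to_payload, and all field names) is hypothetical and chosen for this example only; none of it is taken from the QitOS API.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


# Hypothetical container for one desktop observation (not a QitOS class).
@dataclass
class DesktopObservation:
    screenshot_path: str                               # captured screenshot
    a11y_tree: dict = field(default_factory=dict)      # accessibility tree snapshot
    ocr_spans: list = field(default_factory=list)      # recognized text regions
    ui_candidates: list = field(default_factory=list)  # candidate UI elements


# Hypothetical provider-neutral action: no field is tied to a model vendor.
@dataclass(frozen=True)
class GuiAction:
    kind: str                   # e.g. "click", "type", "scroll"
    x: Optional[int] = None     # screen coordinates, when relevant
    y: Optional[int] = None
    text: Optional[str] = None  # text to type, when relevant


def to_payload(action: GuiAction) -> dict:
    """Serialize an action to a plain dict, dropping unset fields."""
    return {k: v for k, v in asdict(action).items() if v is not None}


click = GuiAction(kind="click", x=120, y=48)
typed = GuiAction(kind="type", text="hello")
print(to_payload(click))
print(to_payload(typed))
```

Keeping the action as plain data is one way to stay provider-neutral: each backend adapter can translate the same dict into its own call format.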
Run the starter benchmark
What the starter benchmark measures
Each result row includes the standard benchmark fields plus desktop-specific metadata:
- success / stop reason
- step count
- action count
- critic count (the critic is a module that evaluates each step and can trigger retries or stops)
- token usage
- latency
- failure tags
  - perception_failure
  - grounding_failure
  - planning_failure
  - action_selection_failure
  - execution_environment_failure
  - stop_completion_failure
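To make the result fields above concrete, here is a minimal sketch of how such rows could be aggregated into a success rate and a failure-tag breakdown. The ResultRow class, its field names, and the summarize helper are assumptions made for illustration; they are not the actual BenchmarkRunResult schema. Only the six failure tag names come from this page.

```python
from collections import Counter
from dataclasses import dataclass, field

# The six failure tags listed on this page.
FAILURE_TAGS = (
    "perception_failure",
    "grounding_failure",
    "planning_failure",
    "action_selection_failure",
    "execution_environment_failure",
    "stop_completion_failure",
)


# Hypothetical row shape; field names are illustrative, not the QitOS schema.
@dataclass
class ResultRow:
    task_id: str
    success: bool
    stop_reason: str
    step_count: int
    action_count: int
    critic_count: int
    token_usage: int
    latency_s: float
    failure_tags: list = field(default_factory=list)


def summarize(rows):
    """Aggregate rows into success rate, mean steps, and failure-tag counts."""
    tags = [tag for row in rows for tag in row.failure_tags]
    unknown = set(tags) - set(FAILURE_TAGS)
    if unknown:
        raise ValueError(f"unknown failure tags: {sorted(unknown)}")
    n = len(rows)
    return {
        "success_rate": sum(r.success for r in rows) / n if n else 0.0,
        "mean_steps": sum(r.step_count for r in rows) / n if n else 0.0,
        "failure_tags": dict(Counter(tags)),
    }


rows = [
    ResultRow("t1", True, "done", 4, 4, 1, 1200, 8.5),
    ResultRow("t2", False, "max_steps", 10, 9, 3, 4100, 30.2,
              failure_tags=["grounding_failure"]),
]
summary = summarize(rows)
print(summary)
```

Rejecting unknown tags up front keeps the breakdown comparable across runs, since every failure lands in one of the six documented buckets.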
What makes this an official v0.5 path
The desktop starter benchmark is the first multimodal path where all of these now line up:
- benchmark tasks
- baseline agent
- unified runner output
- trace artifacts (persistent output files from a run)
- qita visual inspection
- docs/tutorial story
osworld.