
Desktop Benchmark Starter

QitOS v0.5 now has one canonical multimodal path:
  • DesktopEnv
  • the desktop-starter benchmark
  • qitos.recipes.desktop.osworld_starter
  • examples/real/openai_cua_agent.py
  • qita visual replay
This is the path to use if you want a credible starting point for computer-use research.

Why this is the release path

The desktop starter benchmark is the first path where QitOS can say:
  • one benchmark family is official
  • one baseline agent is canonical
  • one qita debugging workflow is documented
  • one artifact schema is reused end to end
That matters more than shipping several half-complete multimodal demos.

What the starter includes

  • OSWorld-inspired desktop task shape
  • screenshot-backed observations
  • optional a11y / OCR / DOM / UI candidates
  • provider-neutral GUI actions
  • planner + grounding + action selector + critic baseline loop
  • qita screenshot timeline, playback preview, and basic overlays
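
The baseline loop named above (planner, grounding, action selector, critic) can be pictured with a small sketch. This is an illustration only, not the QitOS API: `Observation`, `GuiAction`, `plan`, `ground`, `select_action`, `critique`, and `run_episode` are hypothetical names, and the rules inside them are toy stand-ins for model calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical screenshot-backed observation with optional UI hints."""
    screenshot_png: bytes
    # Optional a11y / OCR / DOM candidates, e.g. {"label": "Save", "x": 10, "y": 20}
    ui_candidates: List[Dict] = field(default_factory=list)


@dataclass
class GuiAction:
    """Hypothetical provider-neutral GUI action."""
    kind: str  # "click" or "done" in this sketch
    x: Optional[int] = None
    y: Optional[int] = None


def plan(task: str, history: List[GuiAction]) -> str:
    """Planner: choose the next sub-goal. Toy rule: act once, then finish."""
    return "finish" if history else f"click target for: {task}"


def ground(subgoal: str, obs: Observation) -> List[Dict]:
    """Grounding: keep the UI candidates whose label appears in the sub-goal."""
    return [c for c in obs.ui_candidates if c["label"].lower() in subgoal.lower()]


def select_action(subgoal: str, candidates: List[Dict]) -> GuiAction:
    """Action selector: click the first grounded candidate, otherwise stop."""
    if subgoal == "finish" or not candidates:
        return GuiAction(kind="done")
    top = candidates[0]
    return GuiAction(kind="click", x=top["x"], y=top["y"])


def critique(action: GuiAction) -> bool:
    """Critic: reject a click that has no coordinates."""
    return action.kind != "click" or (action.x is not None and action.y is not None)


def run_episode(task: str, obs: Observation, max_steps: int = 5) -> List[GuiAction]:
    """Run the loop on a fixed observation; a real environment would
    execute each action and return a fresh screenshot every step."""
    history: List[GuiAction] = []
    for _ in range(max_steps):
        subgoal = plan(task, history)
        action = select_action(subgoal, ground(subgoal, obs))
        if not critique(action):
            continue  # rejected by the critic; try again within the step budget
        history.append(action)
        if action.kind == "done":
            break
    return history


obs = Observation(screenshot_png=b"", ui_candidates=[{"label": "Save", "x": 10, "y": 20}])
print([a.kind for a in run_episode("press Save", obs)])  # ['click', 'done']
```

Keeping the action type free of provider-specific fields is what would let the same loop drive different backends; the real starter's types and signatures may differ from this sketch.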

What it does not promise yet

  • full official OSWorld runtime parity
  • rich accessibility-tree execution across every provider
  • enterprise-grade approval governance
  • the full visual replay depth that is planned for v0.6
The right mental model is:
  • v0.5 = one complete starter path
  • not: v0.5 = every possible multimodal path finished

When you need the real benchmark adapter rather than the starter path, move to osworld.