Run Your First Desktop Benchmark
This tutorial shows the full v0.5 desktop path:
- run the official desktop-starter benchmark
- inspect the normalized result rows
- open the run in qita
1. Run the starter benchmark
2. Evaluate the result rows
- success rate
- stop reasons
- failure tag distribution
- average step count
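The four metrics above can be computed directly from the result rows. The following is a minimal sketch, not the tool's own evaluator: the row fields (`success`, `stop_reason`, `failure_tags`, `steps`) and the sample values are assumptions chosen for illustration, so adapt the keys to whatever the normalized rows actually contain.

```python
from collections import Counter
from statistics import mean

# Hypothetical result rows. Field names and values are illustrative
# assumptions, not the benchmark's documented schema.
rows = [
    {"task_id": "t1", "success": True, "stop_reason": "done",
     "failure_tags": [], "steps": 6},
    {"task_id": "t2", "success": False, "stop_reason": "max_steps",
     "failure_tags": ["grounding_miss"], "steps": 15},
    {"task_id": "t3", "success": False, "stop_reason": "critic_abort",
     "failure_tags": ["grounding_miss", "wrong_window"], "steps": 9},
    {"task_id": "t4", "success": True, "stop_reason": "done",
     "failure_tags": [], "steps": 10},
]


def summarize(rows):
    """Return success rate, stop-reason counts, failure-tag counts, and mean steps."""
    if not rows:
        # Avoid dividing by zero on an empty run.
        return {"success_rate": 0.0, "stop_reasons": {},
                "failure_tags": {}, "avg_steps": 0.0}
    return {
        # Fraction of rows marked successful.
        "success_rate": sum(1 for r in rows if r["success"]) / len(rows),
        # How many runs ended for each stop reason.
        "stop_reasons": dict(Counter(r["stop_reason"] for r in rows)),
        # A row may carry several tags, so flatten before counting.
        "failure_tags": dict(Counter(t for r in rows for t in r["failure_tags"])),
        # Mean number of steps per task.
        "avg_steps": mean(r["steps"] for r in rows),
    }


summary = summarize(rows)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Counting failure tags per tag rather than per row means a task with two tags contributes to both buckets, which is usually what you want when looking for the most common failure mode.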
3. Inspect the run in qita
- the screenshot timeline
- the current step screenshot + overlay
- the chosen desktop action
- whether grounding metadata existed
- whether the critic forced retries
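If the per-step data is also available outside the viewer, the last two checks in the list above can be scripted. This is only a sketch under stated assumptions: the step record fields (`index`, `action`, `grounding`, `critic_retries`) are hypothetical names, not a documented qita or benchmark format.

```python
# Hypothetical step records. Every field name here is an assumption
# made for illustration.
steps = [
    {"index": 0, "action": "click", "grounding": {"target": "OK button"},
     "critic_retries": 0},
    {"index": 1, "action": "type", "grounding": None, "critic_retries": 2},
    {"index": 2, "action": "scroll", "grounding": None, "critic_retries": 0},
]


def flag_steps(steps):
    """List steps that lack grounding metadata or were retried by the critic."""
    flagged = []
    for step in steps:
        reasons = []
        if not step.get("grounding"):
            reasons.append("no grounding metadata")
        if step.get("critic_retries", 0) > 0:
            reasons.append("critic forced retries")
        if reasons:
            flagged.append((step["index"], step["action"], reasons))
    return flagged


flagged = flag_steps(steps)
for index, action, reasons in flagged:
    print(f"step {index} ({action}): {', '.join(reasons)}")
```

The flagged step indices are a reasonable place to start when scrubbing through the screenshot timeline.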
--benchmark osworld.