Inspect a GUI Failure in qita

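Desktop failures are rarely just parser failures (the parser converts raw model output into a typed Decision). More often the cause lies elsewhere in the step: in what the model perceived, how its action was grounded, or how the critic judged it. To find it, answer four questions: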

what the model saw
what it tried to click or type
whether grounding existed
whether the critic (a module that evaluates each step and can trigger retries or stops) rejected a weak action

Open the run

qita replay --run ./runs/<run_id>

Or open the board and click into the run:

qita board --logdir ./runs

What to inspect first

1. Visual timeline

Check the screenshot timeline first. It tells you:

which steps had screenshots
which action each step took
whether grounding metadata existed
how many critic retries happened
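The timeline checks above can also be run offline when you want to scan many steps at once. This is a minimal sketch that assumes a hypothetical per-step record (`Step`, with `screenshot`, `action`, `grounding`, and `critic_retries` fields). qita's actual log format is not described here, so map these fields onto whatever your run directory contains. The sketch flags steps that acted without a screenshot or grounding, or that needed critic retries. Those are good candidates to open first in replay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Hypothetical per-step record; qita's real log schema may differ."""
    index: int
    screenshot: Optional[str]   # path to the step's screenshot, if any
    action: str                 # e.g. "click" or "type"
    grounding: Optional[dict]   # grounding metadata, if the step had any
    critic_retries: int = 0     # how many times the critic forced a retry


def timeline_rows(steps):
    """Build one summary row per step, mirroring the timeline checks."""
    return [
        {
            "step": s.index,
            "has_screenshot": s.screenshot is not None,
            "action": s.action,
            "grounded": bool(s.grounding),
            "critic_retries": s.critic_retries,
        }
        for s in steps
    ]


def suspicious_steps(steps):
    """Return indices of steps with no screenshot, no grounding, or critic retries."""
    return [
        s.index
        for s in steps
        if s.screenshot is None or not s.grounding or s.critic_retries > 0
    ]


if __name__ == "__main__":
    steps = [
        Step(0, "shots/0.png", "click", {"box": [10, 10, 50, 30]}),
        Step(1, None, "type", None),
        Step(2, "shots/2.png", "click", {"box": [0, 0, 5, 5]}, critic_retries=2),
    ]
    for row in timeline_rows(steps):
        print(row)
    print("inspect first:", suspicious_steps(steps))
```

In this example, step 0 is clean, step 1 acted blind, and step 2 needed two critic retries. Steps 1 and 2 are flagged.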

2. Replay preview

In replay mode, inspect the screenshot preview and overlay:

the screenshot itself
the action point or overlay box
the step phase currently being replayed
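When you read the overlay, check whether the action point actually landed inside the box it was aimed at. The following is a minimal sketch. It assumes the box is given as `(left, top, right, bottom)` in screenshot pixel coordinates, which is an illustrative assumption and not a documented qita format. `point_in_box` and `miss_distance` are hypothetical helpers.

```python
from typing import Tuple

Point = Tuple[float, float]                # (x, y) in screenshot pixels
Box = Tuple[float, float, float, float]    # (left, top, right, bottom), assumed convention


def point_in_box(point: Point, box: Box) -> bool:
    """True if the action point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def miss_distance(point: Point, box: Box) -> float:
    """Euclidean distance from the point to the nearest part of the box; 0.0 if inside."""
    x, y = point
    left, top, right, bottom = box
    dx = max(left - x, 0.0, x - right)
    dy = max(top - y, 0.0, y - bottom)
    return (dx * dx + dy * dy) ** 0.5
```

A point well outside its target box points toward grounding trouble. A point squarely inside the wrong element suggests the problem started earlier, in perception or planning.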

3. Failure tags

Then check the benchmark output and manifest summaries, and decide which failure bucket fits best:

perception_failure
grounding_failure
planning_failure
action_selection_failure
execution_environment_failure
stop_completion_failure

This keeps GUI debugging tied to a benchmark-grade taxonomy instead of vague impressions.
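When several people tag failures by hand, writing the questions down as a fixed decision order helps keep their tags consistent. The following is a hypothetical first-pass triage. The `StepEvidence` fields, the check order, and the `unclassified` fallback are illustrative assumptions and do not represent qita's own classifier. Only the bucket names come from the list above.

```python
from dataclasses import dataclass


@dataclass
class StepEvidence:
    """Hypothetical answers gathered while replaying one failed step."""
    env_error: bool             # did the OS, app, or harness fail (crash, timeout, lost focus)?
    perceived_correctly: bool   # did the model's reading of the screen match the screenshot?
    plan_reasonable: bool       # was the plan a sensible route to the goal?
    action_matches_plan: bool   # was the chosen action consistent with that plan?
    grounding_present: bool     # did the step carry grounding metadata?
    point_on_target: bool       # did the action land on the element the model intended?
    stopped_correctly: bool     # did the run stop, or declare success, at the right time?


def triage(e: StepEvidence) -> str:
    """Return the first bucket whose check fails; the order is an assumed heuristic."""
    if e.env_error:
        return "execution_environment_failure"
    if not e.perceived_correctly:
        return "perception_failure"
    if not e.plan_reasonable:
        return "planning_failure"
    if not e.action_matches_plan:
        return "action_selection_failure"
    if not e.grounding_present or not e.point_on_target:
        return "grounding_failure"
    if not e.stopped_correctly:
        return "stop_completion_failure"
    return "unclassified"
```

Environment errors are checked first because a crashed app or lost window focus can make every later check look wrong. The remaining checks roughly follow the order in which a step unfolds, from perception to stopping.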
