Inspect a GUI Failure in qita
Desktop failures are usually not “just parser failures”. To diagnose one, you need to answer:
- what the model saw
- what it tried to click or type
- whether grounding existed
- whether the critic rejected a weak action
Open the run
What to inspect first
1. Visual timeline
Check the screenshot timeline first. It tells you:
- which steps had screenshots
- which action each step took
- whether grounding metadata existed
- how many critic retries happened
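On a long run, it can help to reduce the timeline to the steps worth opening in replay. The sketch below assumes a hypothetical per-step record: the Step class, its fields, and the set of pointer-action names are illustrative and are not qita's actual trace schema. It flags steps that have no screenshot, steps that performed a pointer action without grounding metadata, and steps that needed critic retries.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical step record; qita's real trace format may differ.
@dataclass
class Step:
    index: int
    action: str                       # e.g. "click", "type", "stop"
    screenshot: Optional[str] = None  # path to the captured screenshot, if any
    grounding: Optional[dict] = None  # grounding metadata, if the model produced any
    critic_retries: int = 0           # how many times the critic rejected the proposed action

# Assumed names for actions that target a screen location.
POINTER_ACTIONS = {"click", "double_click", "drag"}

def summarize_timeline(steps):
    """One row per step, answering the timeline questions listed above."""
    return [
        {
            "step": s.index,
            "has_screenshot": s.screenshot is not None,
            "action": s.action,
            "has_grounding": bool(s.grounding),
            "critic_retries": s.critic_retries,
        }
        for s in steps
    ]

def suspicious_steps(steps):
    """Indices of steps to replay first: missing screenshot, ungrounded
    pointer action, or at least one critic retry."""
    flagged = []
    for s in steps:
        ungrounded_pointer = s.action in POINTER_ACTIONS and not s.grounding
        if s.screenshot is None or ungrounded_pointer or s.critic_retries > 0:
            flagged.append(s.index)
    return flagged

steps = [
    Step(0, "click", screenshot="s0.png", grounding={"bbox": [10, 10, 50, 30]}),
    Step(1, "type", screenshot="s1.png"),
    Step(2, "click", screenshot="s2.png", critic_retries=2),
    Step(3, "stop"),
]
print(suspicious_steps(steps))  # [2, 3]
```

Step 2 is flagged for clicking without grounding (and for its retries); step 3 is flagged for having no screenshot.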
2. Replay preview
In replay mode, inspect the screenshot preview and its overlay:
- the screenshot itself
- the action point or overlay box
- the step phase currently being replayed
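When the action point sits visibly off its target in the overlay, one possible cause is a mismatch between screen coordinates and screenshot pixel coordinates (for example, a HiDPI screen captured at a lower resolution). The sketch below rescales a point and tests it against a box. It assumes boxes are (left, top, right, bottom) in screenshot pixels and that action points are recorded in screen pixels; qita's actual coordinate conventions are not described here, so confirm them before relying on this check.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed (left, top, right, bottom)

def point_in_box(point: Point, box: Box, margin: float = 0.0) -> bool:
    """True if the point lies inside the box, optionally padded by margin pixels."""
    x, y = point
    left, top, right, bottom = box
    return (left - margin) <= x <= (right + margin) and (top - margin) <= y <= (bottom + margin)

def to_screenshot_coords(point: Point, screen_size: Tuple[int, int],
                         shot_size: Tuple[int, int]) -> Point:
    """Rescale a screen-space point into screenshot pixel space."""
    sx = shot_size[0] / screen_size[0]
    sy = shot_size[1] / screen_size[1]
    return (point[0] * sx, point[1] * sy)

def check_overlay(point: Point, box: Box, screen_size, shot_size) -> str:
    """Report whether the rescaled action point lands inside the overlay box."""
    p = to_screenshot_coords(point, screen_size, shot_size)
    return "inside" if point_in_box(p, box) else "outside"

box = (180, 90, 260, 120)
print(check_overlay((400, 200), box, screen_size=(2880, 1800), shot_size=(1440, 900)))  # inside
print(point_in_box((400, 200), box))  # False: unscaled screen coordinates miss the box
```

If the rescaled point lands inside the box but the raw point does not, the overlay is likely drawn in a different coordinate space from the one the action was executed in.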
3. Failure tags
Then look at the benchmark output and manifest summaries, and ask which bucket fits best:
- perception_failure
- grounding_failure
- planning_failure
- action_selection_failure
- execution_environment_failure
- stop_completion_failure
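Choosing a bucket is a judgment call, but applying the same checks in a fixed order keeps tags consistent across runs. The sketch below is one possible ordering, not qita's classifier: it rules out the environment first, then attributes the failure to the earliest stage of the agent's own chain (perceive, plan, pick an action, ground it, stop) that went wrong. Only the six tag names come from this page; the Evidence fields and the ordering are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Tag names as listed in the qita docs.
FAILURE_TAGS = (
    "perception_failure",
    "grounding_failure",
    "planning_failure",
    "action_selection_failure",
    "execution_environment_failure",
    "stop_completion_failure",
)

@dataclass
class Evidence:
    # Hypothetical reviewer answers about the first failing step; not a qita schema.
    environment_ok: bool = True       # did the action actually take effect?
    screen_understood: bool = True    # did the model read the screenshot correctly?
    plan_sound: bool = True           # was the chosen subgoal a sensible next step?
    action_fits_plan: bool = True     # was the action type right for that subgoal?
    grounding_on_target: bool = True  # did the action land on the intended element?
    stop_correct: bool = True         # did the agent stop when, and only when, it was done?

def triage(ev: Evidence) -> Optional[str]:
    """Return a failure tag for the earliest failing check, or None if all pass."""
    # Environment first: if the action never took effect, the outcome says
    # little about the model's choices.
    if not ev.environment_ok:
        return "execution_environment_failure"
    # Then the agent's own chain, in causal order.
    if not ev.screen_understood:
        return "perception_failure"
    if not ev.plan_sound:
        return "planning_failure"
    if not ev.action_fits_plan:
        return "action_selection_failure"
    if not ev.grounding_on_target:
        return "grounding_failure"
    if not ev.stop_correct:
        return "stop_completion_failure"
    return None

print(triage(Evidence(grounding_on_target=False)))  # grounding_failure
print(triage(Evidence(screen_understood=False, grounding_on_target=False)))  # perception_failure
```

Ordering by causal stage means a misread screen is tagged perception_failure even when the click also missed, because the miss is downstream of the misreading.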
