- what model and parser were used?
- what tool surface was exposed?
- what benchmark split was this?
- can I replay the trajectory?
- can I diff this run against the previous one?
These are the questions that come up on any other review path, too.
The point is not bureaucracy. The point is that prompt work, parser work, tool work, and benchmark work become much easier to trust once every run is exportable, replayable, and comparable.
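One way to make runs exportable and comparable is to record each one in a small manifest. A minimal sketch of that idea follows; the names here (`RunManifest`, `diff_runs`, the example field values) are illustrative assumptions, not an API from the source.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunManifest:
    """Everything needed to answer the review questions for one run."""
    model: str             # which model produced the trajectory
    parser: str            # which output parser interpreted it
    tools: tuple           # the tool surface exposed to the agent
    benchmark_split: str   # which benchmark split was evaluated
    trajectory_path: str   # where the replayable trajectory is stored

    def export(self) -> str:
        # Serialize to JSON so the run can be archived and replayed later.
        return json.dumps(asdict(self), sort_keys=True)


def diff_runs(a: RunManifest, b: RunManifest) -> dict:
    """Return only the fields that changed between two runs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


# Hypothetical usage: two runs that differ only in the parser.
run1 = RunManifest("model-a", "json", ("search",), "dev", "runs/1.json")
run2 = RunManifest("model-a", "xml", ("search",), "dev", "runs/2.json")
changed = diff_runs(run1, run2)
```

With a manifest like this, "can I diff this run against the previous one?" becomes a one-line function call rather than an archaeology exercise.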
If your team wants to move fast on agent research, reproducible runs are not a side feature. They are the memory of the project.