Minimum contract
A run counts as an official QitOS run when its trace manifest includes:
- a RunSpec
- an ExperimentSpec for benchmark work
- a standard manifest.json, events.jsonl, and steps.jsonl
- replay and export compatibility with qita
- a normalized benchmark result row when the run comes from qit bench run or a benchmark example wrapper
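As a rough illustration of the file-level part of this contract, the sketch below checks a trace directory for the three standard files and parses a JSON Lines trace. Only the file names come from the contract. The helper names missing_trace_files and read_jsonl are hypothetical and are not part of QitOS or qita.

```python
import json
from pathlib import Path

# File names are taken from the contract; the helpers below are illustrative only.
REQUIRED_FILES = ("manifest.json", "events.jsonl", "steps.jsonl")


def missing_trace_files(trace_dir):
    """Return the required trace files that are absent from trace_dir."""
    root = Path(trace_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def read_jsonl(path):
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records
```

An empty result from missing_trace_files only says the files exist. It does not validate their contents against the RunSpec or ExperimentSpec.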
Why this matters
Without that contract, two runs may both “finish”, but you still cannot answer the important questions:
- were they using the same parser contract?
- were they using the same tool surface?
- was the benchmark split the same?
- can I replay the failure later?
- can I diff the run config instead of guessing?
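To make the last question concrete, here is a minimal sketch of diffing two run configurations once they have been loaded as nested dictionaries. The function diff_config and the keys and values in the example are hypothetical. They do not reflect the actual QitOS manifest layout.

```python
def diff_config(left, right, prefix=""):
    """Return {dotted_key: (left_value, right_value)} for every differing leaf.

    A key that is absent on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(left) | set(right)):
        path = f"{prefix}.{key}" if prefix else str(key)
        a, b = left.get(key), right.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse so nested settings are reported with a dotted path.
            changes.update(diff_config(a, b, path))
        elif a != b:
            changes[path] = (a, b)
    return changes


# Hypothetical configs: same benchmark split, different parser.
run_a = {"parser_name": "json_v1", "benchmark": {"split": "dev"}}
run_b = {"parser_name": "json_v2", "benchmark": {"split": "dev"}}
print(diff_config(run_a, run_b))  # {'parser_name': ('json_v1', 'json_v2')}
```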
Best-effort replay
QitOS currently provides research-grade best-effort replay, not strict byte-for-byte determinism. That means QitOS records enough information to make replay and comparison useful:
- seed
- git_sha
- package_version
- prompt_protocol
- parser_name
- tool_manifest
- environment summary
- step/event traces
This is enough:
- for debugging and inspection
- for prompt/parser/tool regressions
- for benchmark comparison
- for sharing runs with collaborators
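One way to picture best-effort comparison is a per-field report over the recorded metadata. The sketch below assumes each run's metadata is available as a flat dictionary keyed by the field names listed in this section. That layout and the function compare_replay_fields are assumptions, not the QitOS API.

```python
# Field names come from this section; the flat-dictionary layout is assumed.
REPLAY_FIELDS = (
    "seed",
    "git_sha",
    "package_version",
    "prompt_protocol",
    "parser_name",
    "tool_manifest",
)


def compare_replay_fields(original, replayed):
    """Label each recorded field as 'same', 'different', or 'missing'."""
    report = {}
    for field in REPLAY_FIELDS:
        if field not in original or field not in replayed:
            report[field] = "missing"
        elif original[field] == replayed[field]:
            report[field] = "same"
        else:
            report[field] = "different"
    return report
```

A report with every field marked same suggests the two runs are comparable. It does not guarantee identical output, since replay is not byte-for-byte deterministic.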
Where you see this in practice
Open qita board and qita replay on a trace directory.
Canonical path
For benchmark work, the canonical path is the official runner. The examples in examples/benchmarks/ remain available, but they are now thin wrappers around the same official runner contract.
