Tracing is the persistence layer behind QitOS observability. Every traced run writes a self-contained directory:
<trace_logdir>/<run_id>/
  manifest.json
  events.jsonl
  steps.jsonl

What each file means

File            Purpose
manifest.json   Run summary, reproducibility metadata, benchmark metadata, and official-run fields
events.jsonl    Event stream across runtime phases
steps.jsonl     One structured record per completed step
The raw files are the source of truth. qita is the human inspection surface built on top of them.
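Because the raw files are plain JSON and JSONL, they can be inspected without qita at all. As a minimal sketch (the helper name and return shape are illustrative, not part of the QitOS API), a run directory can be loaded with nothing but the standard library:

```python
import json
from pathlib import Path

def load_run(run_dir):
    """Load one traced run directory (hypothetical helper).

    manifest.json is a single JSON object; events.jsonl and
    steps.jsonl hold one JSON object per line.
    """
    run_dir = Path(run_dir)
    manifest = json.loads((run_dir / "manifest.json").read_text())
    steps = [json.loads(line)
             for line in (run_dir / "steps.jsonl").read_text().splitlines()
             if line.strip()]
    events = [json.loads(line)
              for line in (run_dir / "events.jsonl").read_text().splitlines()
              if line.strip()]
    return manifest, steps, events
```

Reading line by line (rather than parsing the whole file as one document) is what makes the JSONL format robust to runs that were interrupted mid-write: every completed line is still a valid record.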

Why tracing is a first-class feature

QitOS is built for agent research, not only one-off demos. That means the framework must preserve:
  • how a run stopped
  • what prompt/parser contract it used
  • what tool surface it saw
  • how context changed over time
  • which config fields matter for replay and comparison
This is why tracing is enabled by default in AgentModule.run(...).

Trace metadata in v0.3

The v0.3 closure adds stronger reproducibility metadata to the manifest, including:
  • git_sha
  • package_version
  • benchmark_name
  • benchmark_split
  • model_family
  • prompt_protocol
  • parser_name
  • tool_manifest
  • run_spec
  • experiment_spec
  • official_run
  • replay_mode
  • token / latency / cost summaries
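One reason these fields matter is that two runs are only comparable if their reproducibility metadata agrees. A minimal sketch of that check, assuming manifests are already parsed into dicts (the field subset below is taken from the list above; the function name is illustrative):

```python
# Reproducibility fields from the v0.3 manifest that should match
# before two runs are treated as comparable.
REPRO_FIELDS = [
    "git_sha", "package_version", "benchmark_name", "benchmark_split",
    "model_family", "prompt_protocol", "parser_name",
]

def manifest_diff(a, b, fields=REPRO_FIELDS):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
```

An empty diff means the two runs used the same code, benchmark, and prompt/parser contract, so any remaining divergence comes from the model or environment rather than the configuration.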
These fields are what make run comparison (qita compare) and benchmark result normalization meaningful.

Best-effort replay

Tracing in QitOS supports best-effort research replay. That means QitOS records enough information to inspect and compare runs well, but it does not promise strict deterministic re-execution for remote models or external environments. Use traces for:
  • debugging long trajectories
  • comparing prompt/parser/tool changes
  • exporting artifacts for review
  • replaying benchmark failures
Do not assume a remote provider will always generate identical tokens forever.
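In that spirit, a replay of two runs of the same benchmark task is best read as "where did they first diverge?" rather than "are they identical?". A sketch of that comparison over parsed steps.jsonl records (the key name "output" is an assumption; substitute whatever field your step records carry):

```python
def first_divergence(steps_a, steps_b, key="output"):
    """Index of the first step where two runs differ on `key`.

    Returns None if the runs agree for their full common length and
    have the same number of steps; returns the shorter run's length
    if one run simply stopped earlier.
    """
    for i, (sa, sb) in enumerate(zip(steps_a, steps_b)):
        if sa.get(key) != sb.get(key):
            return i
    if len(steps_a) != len(steps_b):
        return min(len(steps_a), len(steps_b))
    return None
```

This is deliberately best-effort: it localizes the divergence point for inspection, which is usually enough to attribute it to a prompt/parser/tool change versus provider nondeterminism.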

qita on top of traces

Once traces exist, use:
qita board --logdir ./runs
qita replay --run ./runs/<run_id>
qita export --run ./runs/<run_id> --html ./report.html
qita also supports run comparison so you can ask why two runs diverged instead of reading raw JSON by hand.