- why certain design decisions were made
- what worked in real runs
- what failed in benchmarks and long-running agents
- how prompt, parser, tools, memory, and observability interact in practice
What we want to publish here
We expect the blog to grow around a few recurring themes:
- Design notes: why the framework is structured the way it is
- Field reports: what we learned from long-running coding and audit agents
- Benchmark reports: GAIA, Tau-Bench, and CyBench observations
- Prompting and parsing: model harness choices and protocol tradeoffs
- Observability: how qita turns traces into research artifacts
Start here
Why QitOS keeps a single kernel
A short design note on the AgentModule + Engine split and why we avoid hidden second runtimes.
Why reproducible runs matter
A short field note on official runs, normalized benchmark outputs, and best-effort replay.
Tutorial track
If you want the hands-on learning path first, start with the four-lesson tutorial course.
