Benchmarks and Recipes
QitOS now keeps three different layers separate on purpose.
1. Framework layer
This is the reusable kernel:
- AgentModule + Engine
- DesktopEnv
- ActionSpace
- EnvironmentAdapter
- family presets
- qita replay and visual inspection
2. Benchmark layer
This is where dataset-specific integration belongs:
- qitos.benchmark.desktop for the starter benchmark family
- qitos.benchmark.osworld for the real OSWorld adapter path
- benchmark-specific runtimes
- benchmark-specific evaluators/scorers
- benchmark-native task metadata and artifact handling
If code depends on test_all.json, evaluator bridges, setup/postconfig, qcow2 boot inputs, or benchmark-native scoring, it belongs here.
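As a rough illustration of that boundary, the sketch below keeps benchmark-native metadata and scoring inside a benchmark-layer object and hands the framework only a generic task. Every name here (GenericTask, BenchmarkAdapter, exact_match, the "expected" field) is hypothetical and chosen for illustration; none of it is the actual QitOS API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class GenericTask:
    """Framework-facing task with no benchmark-specific fields (hypothetical type)."""
    task_id: str
    instruction: str


@dataclass
class BenchmarkAdapter:
    """Hypothetical benchmark-layer adapter that owns native metadata and scoring."""
    name: str
    scorer: Callable[[Dict[str, Any], str], float]
    _native: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def load(self, records: List[Dict[str, Any]]) -> List[GenericTask]:
        """Keep the native records here; expose only generic tasks to the framework."""
        tasks = []
        for rec in records:
            self._native[rec["id"]] = rec
            tasks.append(GenericTask(task_id=rec["id"], instruction=rec["instruction"]))
        return tasks

    def score(self, task_id: str, answer: str) -> float:
        """Score an answer using the benchmark-native record for that task."""
        return self.scorer(self._native[task_id], answer)


def exact_match(native: Dict[str, Any], answer: str) -> float:
    """Toy benchmark-native scorer: 1.0 on an exact match after trimming whitespace."""
    return 1.0 if answer.strip() == native["expected"] else 0.0


adapter = BenchmarkAdapter(name="toy", scorer=exact_match)
tasks = adapter.load([{"id": "t1", "instruction": "Say hi", "expected": "hi"}])
```

The framework-side code only ever sees GenericTask, so swapping in a different benchmark means writing a new adapter rather than touching the kernel.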
3. Recipe layer
Recipes are reproducible baseline methods:
- canonical single-agent baselines
- benchmark baseline methods
- multimodal starter methods
qitos/recipes/desktop/osworld_starter.py
examples/real/openai_cua_agent.py
qitos.recipes.benchmarks.gaia
qitos.recipes.benchmarks.tau_bench
qitos.recipes.benchmarks.cybench
Why this split matters
This split solves three real problems:
- benchmark runners no longer depend on example files
- one baseline can be reused by examples, docs, and benchmark runners
- future qitos-recipes extraction becomes a packaging move instead of a redesign
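The reuse argument can be sketched in a few lines: a recipe is defined once, and both an example script and a benchmark runner look it up by name instead of importing each other. The registry, the decorator, and the echo_baseline recipe below are all hypothetical stand-ins, not the real QitOS recipe interface.

```python
from typing import Callable, Dict, List

# Hypothetical recipe registry: recipe name -> callable baseline method.
RECIPES: Dict[str, Callable[[str], str]] = {}


def register_recipe(name: str):
    """Register a baseline under a stable name so any caller can look it up."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        RECIPES[name] = fn
        return fn
    return decorator


@register_recipe("echo_baseline")
def echo_baseline(instruction: str) -> str:
    """Toy baseline: returns the instruction uppercased."""
    return instruction.upper()


def run_example(instruction: str) -> str:
    """An example script consumes the recipe; it does not define the baseline."""
    return RECIPES["echo_baseline"](instruction)


def run_benchmark(instructions: List[str], recipe_name: str = "echo_baseline") -> List[str]:
    """A benchmark runner resolves the same recipe by name, not via an example file."""
    recipe = RECIPES[recipe_name]
    return [recipe(text) for text in instructions]
```

Because both callers depend only on the registry, moving the recipes into a separate package would change where they are imported from, not how they are called.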
