Benchmarks and Recipes

QitOS keeps three different layers separate by design.

1. Framework layer

This is the reusable kernel:
  • AgentModule + Engine
  • DesktopEnv
  • ActionSpace
  • EnvironmentAdapter
  • family presets
  • qita replay and visual inspection
Framework code should stay benchmark-agnostic.
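
One way to picture "benchmark-agnostic" is that the framework depends only on an abstract interface, and anything benchmark-specific plugs in from outside. The sketch below uses hypothetical names (`SketchAdapter`, `run_loop`, `EchoAdapter`); it is not the QitOS API, only an illustration of the dependency direction.

```python
from __future__ import annotations

from typing import Callable, Protocol


class SketchAdapter(Protocol):
    """Hypothetical interface; stands in for whatever the framework layer exposes."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, bool]: ...


def run_loop(
    adapter: SketchAdapter,
    policy: Callable[[str], str],
    max_steps: int = 10,
) -> list[str]:
    """Framework-side loop: it knows nothing about any particular benchmark."""
    observation = adapter.reset()
    trace = [observation]
    for _ in range(max_steps):
        observation, done = adapter.step(policy(observation))
        trace.append(observation)
        if done:
            break
    return trace


class EchoAdapter:
    """Toy adapter of the kind a benchmark layer might supply; ends after two steps."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "start"

    def step(self, action: str) -> tuple[str, bool]:
        self.steps += 1
        return f"saw:{action}", self.steps >= 2


# The loop is reused unchanged; only the adapter is benchmark-specific.
trace = run_loop(EchoAdapter(), policy=lambda obs: obs.upper())
```

Swapping in a different adapter changes what the loop talks to without touching `run_loop` itself, which is the property the framework layer is meant to preserve.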

2. Benchmark layer

Dataset-specific integration belongs here:
  • qitos.benchmark.desktop for the starter benchmark family
  • qitos.benchmark.osworld for the real OSWorld adapter path
  • benchmark-specific runtimes
  • benchmark-specific evaluators/scorers
  • handling of benchmark-native task metadata and artifacts (persistent output files or data records from a run)
If something involves test_all.json, evaluator bridges, setup/postconfig, qcow2 boot inputs, or benchmark-native scoring, it belongs here.
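
As a rough illustration of what stays inside the benchmark layer, the sketch below parses a task index and computes a success rate. The JSON shape (a mapping from domain name to a list of task ids) is an assumption made for this example, and `TaskRef`, `load_task_index`, and `score` are hypothetical names rather than parts of `qitos.benchmark`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical reference to one benchmark task."""

    domain: str
    task_id: str


def load_task_index(raw_json: str) -> list[TaskRef]:
    """Flatten an assumed {domain: [task_id, ...]} index into task references."""
    index = json.loads(raw_json)
    return [
        TaskRef(domain, task_id)
        for domain, task_ids in sorted(index.items())
        for task_id in task_ids
    ]


def score(results: dict[str, bool]) -> float:
    """Benchmark-native scoring, here simply the fraction of successful tasks."""
    return sum(results.values()) / len(results) if results else 0.0


raw = '{"chrome": ["a1", "a2"], "os": ["b1"]}'
tasks = load_task_index(raw)
success_rate = score({"a1": True, "a2": False, "b1": True})
```

Neither function is something the framework loop needs to import; a runner in the benchmark layer would call them and hand plain tasks to the framework.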

3. Recipe layer

Recipes are reproducible baseline methods:
  • canonical single-agent baselines
  • benchmark baseline methods
  • multimodal starter methods
The desktop baseline now lives in:
  • qitos/recipes/desktop/osworld_starter.py
The public example is now only a thin entrypoint around that recipe:
  • examples/real/openai_cua_agent.py
The same structure also applies to:
  • qitos.recipes.benchmarks.gaia
  • qitos.recipes.benchmarks.tau_bench
  • qitos.recipes.benchmarks.cybench
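
The "thin entrypoint around the recipe" pattern can be sketched as follows. Both functions appear in one block only to keep the example self-contained; `run_baseline` and its return value are hypothetical placeholders, not the contents of the actual recipe module.

```python
from __future__ import annotations

import argparse


# --- recipe side (hypothetical stand-in for a module in the recipes package) ---
def run_baseline(task: str, max_steps: int = 5) -> dict:
    """Placeholder baseline; a real recipe would build and run an agent here."""
    return {"task": task, "max_steps": max_steps, "status": "not-run"}


# --- example side: parse arguments, delegate to the recipe, nothing else ---
def main(argv: list[str] | None = None) -> dict:
    parser = argparse.ArgumentParser(description="Thin entrypoint sketch")
    parser.add_argument("--task", required=True)
    parser.add_argument("--max-steps", type=int, default=5)
    args = parser.parse_args(argv)
    return run_baseline(args.task, max_steps=args.max_steps)


result = main(["--task", "open-settings", "--max-steps", "3"])
```

Because all method logic sits in `run_baseline`, a benchmark runner or a docs snippet can call it directly without importing the example script.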

Why this split matters

This split solves three real problems:
  • benchmark runners no longer depend on example files
  • one baseline can be reused by examples, docs, and benchmark runners
  • future qitos-recipes extraction becomes a packaging move instead of a redesign
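
The first point, that runners do not depend on example files, is the kind of rule a small static check could enforce. The following is a minimal sketch using the standard `ast` module; the `examples` prefix and the helper names are assumptions for illustration, and QitOS is not known to ship such a check.

```python
from __future__ import annotations

import ast


def imported_modules(source: str) -> set[str]:
    """Return dotted module names imported by the source (relative imports skipped)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


def violates_layering(source: str, forbidden_prefix: str = "examples") -> bool:
    """True if the source imports the forbidden package or any of its submodules."""
    return any(
        name == forbidden_prefix or name.startswith(forbidden_prefix + ".")
        for name in imported_modules(source)
    )


# Source strings are only parsed here, never executed.
runner_ok = "from qitos.recipes.desktop import osworld_starter\n"
runner_bad = "from examples.real import openai_cua_agent\n"
ok_flag = violates_layering(runner_ok)
bad_flag = violates_layering(runner_bad)
```

Run over the benchmark runner sources in CI, a check like this would flag any import that points back into the examples directory.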
This separation keeps QitOS viable as a research-first framework. If you are adding a new benchmark family, continue with Third-party benchmark integration.