qitos.kit provides concrete, reusable building blocks for common agent authoring patterns. All components attach to the AgentModule + Engine pipeline and do not introduce a second orchestrator.
How to read this page
Use this page as a capability map for QitOS authoring.

| If you are deciding… | Start here |
|---|---|
| How the model should emit actions | Parsers |
| How to keep useful context in long runs | Memory and history/compaction |
| Which tools to expose | Tool sets |
| Which ready-made registries to start from | Preset builders |
| How to add planning or search structure | Planning |
| How to start screenshot-first multimodal work | ScreenshotEnv and the visual guide |
| How to build desktop / computer-use agents | DesktopEnv, ComputerUseToolSet, and the desktop guide |
Capability map
Parsers
- ReActTextParser for Thought: / Action: protocols
- JsonDecisionParser for JSON decision objects
- XmlDecisionParser for XML protocols
- MiniMaxToolCallParser when the model returns function-call-like structures
- TerminusJsonParser and TerminusXmlParser for explicit termination-aware formats
Tool presets
- coding_tools(...) for the canonical coding workspace
- computer_use_tools() for provider-neutral desktop / GUI action workflows
- advanced_coding_tools(...) for a Claude-style coding preset
- web_tools() for web research and extraction
- task_tools(...) for persistent task-board workflows
- security_audit_tools(...) for defensive repository review
- thinking_tools() for explicit thought-recording flows
- notebook_tools(...), report_tools(...), and epub_tools(...) for narrower scenarios
Environments
- ScreenshotEnv for screenshot-first multimodal and GUI-adjacent workflows
- DesktopEnv for OSWorld-inspired desktop and computer-use loops
- TextWebEnv for text-browser-style web observation
- TmuxEnv for interactive terminal workflows
ScreenshotEnv is the first built-in multimodal environment. Use it when you want to test screenshot-based reasoning, visual-web prompting, or the new qita visual-asset path without committing to a full benchmark adapter.
DesktopEnv builds on the same multimodal core but adds an OSWorld-style desktop lane: screenshot + accessibility + terminal observation, GUI controller ops, and container-first provider boundaries.
Long-running context blocks
- WindowHistory for simple recency windows
- TokenBudgetSummaryHistory for token-budget summarization
- CompactHistory for multi-stage compaction with warning and summary events
- WindowMemory, SummaryMemory, VectorMemory, and MarkdownFileMemory for cross-step recall
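To make the idea of a recency window concrete, here is a minimal, self-contained sketch. It is not the qitos WindowHistory implementation; the class name RecencyWindow and its methods are hypothetical and only illustrate the general technique of keeping the most recent N messages.

```python
from collections import deque
from typing import Deque, Dict, List


class RecencyWindow:
    """Hypothetical illustration of a recency window (not a qitos class)."""

    def __init__(self, max_messages: int) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        # A bounded deque silently drops the oldest entry once it is full.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def append(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def render(self) -> List[Dict[str, str]]:
        # Oldest surviving message first, newest last.
        return list(self._messages)


window = RecencyWindow(max_messages=2)
window.append("user", "step 1")
window.append("assistant", "step 2")
window.append("user", "step 3")
recent = window.render()
```

A fixed window is cheap and predictable, but it forgets everything outside the window, which is the gap that summarizing or compacting histories aim to close.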
Planning and search
- NumberedPlanBuilder for explicit plans
- PlanCursor for plan execution bookkeeping
- DynamicTreeSearch for branch selection and search-driven agents
- Parsers
- History & Compaction
- Memory
- Tool Sets
- Individual Tools
- Planning
- Critics
- Models
Parsers convert raw LLM output strings into typed Decision objects. Choose the parser that matches the output format your prompt requests.
Prompt format and parser must match exactly. If your prompt asks for Thought: / Action: blocks, use ReActTextParser. If it asks for JSON, use JsonDecisionParser. If it asks for XML, use XmlDecisionParser.
ReActTextParser
Parses ReAct-style text output with labeled blocks such as Thought: and Action:.

| Parameter | Default recognized keys | Description |
|---|---|---|
| thought_keys | thought, thinking, think, rationale | Keys for the reasoning block |
| reflection_keys | reflection, reflect, selfreflection | Keys for self-reflection blocks |
| action_keys | action, tool, call | Keys for the action block |
| final_keys | finalanswer, final, answer | Keys for the final answer block |

Pass the parser to AgentModule.run() or the Engine constructor.
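The following sketch illustrates how labeled-block parsing with configurable key sets can work in general. It is not the qitos ReActTextParser: SketchDecision and parse_react are hypothetical names, the key tuples simply mirror the defaults in the table above, and reflection blocks are omitted for brevity.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class SketchDecision:
    """Hypothetical stand-in for a typed decision (not the qitos Decision)."""
    mode: str  # "act" or "final"
    thought: Optional[str] = None
    action: Optional[str] = None
    final_answer: Optional[str] = None


THOUGHT_KEYS = ("thought", "thinking", "think", "rationale")
ACTION_KEYS = ("action", "tool", "call")
FINAL_KEYS = ("finalanswer", "final", "answer")
ALL_KEYS = THOUGHT_KEYS + ACTION_KEYS + FINAL_KEYS


def _normalize(label: str) -> str:
    # "Final Answer" and "final_answer" both become "finalanswer".
    return re.sub(r"[^a-z]", "", label.lower())


def parse_react(text: str) -> SketchDecision:
    blocks: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Za-z _]+?)\s*:\s*(.*)$", line)
        key = _normalize(match.group(1)) if match else None
        if key in ALL_KEYS:
            # A recognized label starts a new block.
            current = key
            blocks[current] = match.group(2)
        elif current is not None:
            # Any other line continues the block that is currently open.
            blocks[current] += "\n" + line

    def first(keys: Sequence[str]) -> Optional[str]:
        for k in keys:
            if k in blocks:
                return blocks[k].strip()
        return None

    thought = first(THOUGHT_KEYS)
    final = first(FINAL_KEYS)
    if final is not None:
        return SketchDecision("final", thought=thought, final_answer=final)
    action = first(ACTION_KEYS)
    if action is not None:
        return SketchDecision("act", thought=thought, action=action)
    raise ValueError("no action or final answer block found")


decision = parse_react("Thought: I need to list files\nAction: list_files(path='.')")
```

The sketch also shows why prompt and parser must agree: output that uses labels outside the configured key sets produces no recognizable block and raises an error.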
JsonDecisionParser
Parses JSON-formatted model output. Supports a mode field ("act", "final", "wait") and configurable key names.
The constructor takes the same parameter names as ReActTextParser. All keys default to the same values.
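As a rough illustration of mode-driven JSON parsing, the sketch below reads a JSON object, looks up action and final-answer values under several accepted key names, and validates the mode. It is not the qitos JsonDecisionParser: parse_json_decision is a hypothetical function, the returned dictionary shape is invented for the example, and inferring the mode when the field is absent is an assumption of this sketch.

```python
import json
from typing import Any, Dict, Optional, Sequence

VALID_MODES = ("act", "final", "wait")
ACTION_KEYS = ("action", "tool", "call")
FINAL_KEYS = ("finalanswer", "final", "answer")


def _first_present(obj: Dict[str, Any], keys: Sequence[str]) -> Optional[Any]:
    for key in keys:
        if key in obj:
            return obj[key]
    return None


def parse_json_decision(raw: str) -> Dict[str, Any]:
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    # Normalize key names so "final_answer" matches "finalanswer".
    normalized = {k.lower().replace("_", ""): v for k, v in obj.items()}
    action = _first_present(normalized, ACTION_KEYS)
    final = _first_present(normalized, FINAL_KEYS)

    mode = normalized.get("mode")
    if mode is None:
        # Assumption of this sketch: infer the mode from the keys present.
        mode = "final" if final is not None else "act" if action is not None else "wait"
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    if mode == "act":
        if action is None:
            raise ValueError("mode 'act' requires an action")
        return {"mode": "act", "action": action}
    if mode == "final":
        if final is None:
            raise ValueError("mode 'final' requires a final answer")
        return {"mode": "final", "final_answer": final}
    return {"mode": "wait"}


result = parse_json_decision(
    '{"mode": "act", "action": {"name": "read_file", "args": {"path": "a.txt"}}}'
)
```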
XmlDecisionParser
Parses XML-formatted model output. Supports both an XML mode attribute and configurable tag names.
xml_*_tags take priority over *_keys for XML parsing. Defaults: xml_think_tags=("think", "thought", "thinking", "rationale"), xml_action_tags=("action", "tool", "call"), xml_final_tags=("final_answer", "final", "answer").
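The sketch below shows one way tag-based extraction with a priority rule could look. It is not the qitos XmlDecisionParser: resolve_tags, first_tag_text, and parse_xml_decision are hypothetical helpers, and reading "take priority" as "use the XML tag list whenever it is non-empty, otherwise fall back to the generic keys" is an assumption of this example. The tag tuples mirror the defaults listed above. A regular expression is used instead of a strict XML parser because model output is often not a well-formed document.

```python
import re
from typing import Dict, Optional, Sequence

XML_THINK_TAGS = ("think", "thought", "thinking", "rationale")
XML_ACTION_TAGS = ("action", "tool", "call")
XML_FINAL_TAGS = ("final_answer", "final", "answer")


def resolve_tags(xml_tags: Sequence[str], keys: Sequence[str]) -> Sequence[str]:
    # Assumed priority rule: explicit XML tags win over generic keys.
    return xml_tags if xml_tags else keys


def first_tag_text(text: str, tags: Sequence[str]) -> Optional[str]:
    for tag in tags:
        name = re.escape(tag)
        # Match <tag> or <tag attr="..."> but not a longer name such as <tagging>.
        pattern = rf"<{name}(?:\s[^>]*)?>(.*?)</{name}>"
        match = re.search(pattern, text, re.DOTALL)
        if match:
            return match.group(1).strip()
    return None


def parse_xml_decision(text: str) -> Dict[str, Optional[str]]:
    thought = first_tag_text(text, XML_THINK_TAGS)
    final = first_tag_text(text, XML_FINAL_TAGS)
    if final is not None:
        return {"mode": "final", "thought": thought, "final_answer": final}
    action = first_tag_text(text, XML_ACTION_TAGS)
    if action is not None:
        return {"mode": "act", "thought": thought, "action": action}
    raise ValueError("no action or final answer tag found")


parsed = parse_xml_decision(
    "<think>check the file</think><action>read_file(path='a.txt')</action>"
)
```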
TerminusJsonParser / TerminusXmlParser
Terminus parsers are specialized variants that handle agent formats with explicit <terminus> or JSON-level termination signals. Use them with their matching system prompts (TERMINUS_JSON_SYSTEM_PROMPT, TERMINUS_XML_SYSTEM_PROMPT).
TerminusJsonParser and TerminusXmlParser share the constructor signatures of JsonDecisionParser and XmlDecisionParser, respectively.