QitOS separates two distinct concerns:Documentation Index
Fetch the complete documentation index at: https://qitor.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
- History — the conversation context sent to the LLM each step. It is ephemeral, message-scoped, and controlled by
HistoryPolicy. - Memory — a semantic record store that accumulates observations across steps and can be queried by strategy (window, summary, vector, or file).
The mental model
When researchers say “context”, they often mean several different things at once. QitOS keeps them separate.| Concern | QitOS primitive | Typical use |
|---|---|---|
| What the model sees next step | History + HistoryPolicy | Prompt continuity |
| What the agent wants to remember across steps | Memory | Recall of findings, artifacts, and prior observations |
| What happens when context grows too large | TokenBudgetSummaryHistory / CompactHistory | Long-running control |
History
History stores HistoryMessage objects (role, content, step_id). The Engine appends a user message before each LLM call and an assistant message after. At call time, it uses HistoryPolicy to slice the window it passes to the model.
HistoryPolicy
HistoryPolicy lives in qitos.core.history and controls which messages the Engine selects:
agent.run():
HistoryPolicy.build_query() produces the retrieval query sent to the History adapter each step. The default Engine history implementation (_EngineWindowHistory) satisfies this query with a sliding window.
Built-in history strategies
WindowHistory
WindowHistory is the simplest option. It keeps a fixed recency window and evicts older messages.
Choose it when:
- your runs are short
- you want easy-to-predict context behavior
- you do not want automatic summarization
TokenBudgetSummaryHistory
TokenBudgetSummaryHistory watches a token budget and summarizes older messages when the current window would exceed that budget.
Choose it when:
- the agent regularly runs long enough to hit context pressure
- you want a simple summary fallback
- you want token-budget behavior without the full multi-stage compaction (context reduction by summarizing or trimming older messages) system
CompactHistory
CompactHistory is the most advanced built-in context compactor (a strategy that reduces context size by summarizing or trimming older messages).
It combines:
- warning thresholds before overflow
- grouping into compactable rounds
- micro-compaction for large old messages
- summary compaction for older interaction history
- runtime compaction metadata that can surface in traces
- you are building long-running agents
- you care about inspectable context management
- you want context compaction to be part of the research artifact rather than hidden middleware
Context compaction in practice
CompactHistory is worth understanding because it reflects a core QitOS design idea: long-running behavior should be explicit and debuggable.
Its main pieces are:
CompactConfig— thresholds and compaction policyMessageGrouper— how messages are grouped into rounds- micro-compaction — trims very large old messages while keeping a preview
- summary compaction — replaces older rounds with a compact continuation summary
Memory
Memory stores MemoryRecord objects (role, content, step_id, metadata). The Engine calls memory.append() after each observation and can retrieve records to inject into the prepare() context.
All four concrete adapters implement the same Memory ABC:
Attaching memory to your agent
Pass a memory instance to theAgentModule constructor:
__init__:
WindowMemory
Keeps the most recentwindow_size records. Older records are dropped by evict().
evict() trims _records to the last window_size entries and returns how many were removed. The Engine does not call evict() automatically — call it yourself in reduce() if you need to actively prune:
SummaryMemory
Retains the lastkeep_last records in memory. When evict() is called, older records are condensed into a one-line summary string stored in _summaries. Useful when you want the agent to have a compact reference to earlier work without consuming full context.
summarize() condenses the last max_items records into a pipe-separated string:
memory._summaries — these persist after eviction and can be injected into prepare():
VectorMemory
Retrieves records by semantic similarity using dot-product scoring. Each record is embedded onappend() and ranked on retrieve().
text is provided, retrieve() falls back to returning the most recent top_k records.
The default embedder (
_default_embedder) uses a character-bucket heuristic. It is fast and has no external dependencies, but its ranking quality is low. For meaningful semantic search, pass a real embedding function.MarkdownFileMemory
Persists every record to a local Markdown file as an append-only log. The file survives restarts, making it useful for long-running or multi-session agents.Choosing a memory adapter
| Adapter | In-memory | Persistent | Retrieval |
|---|---|---|---|
WindowMemory | Yes | No | Recency + role filter |
SummaryMemory | Yes | No | Recency; summarizes older records |
VectorMemory | Yes | No | Semantic similarity |
MarkdownFileMemory | Yes (bounded) | Yes | Recency + role filter |
WindowMemory for most agents. Switch to MarkdownFileMemory if you need persistence across restarts, or VectorMemory when you want the agent to recall semantically relevant past observations rather than just recent ones.
Recommended combinations
| Goal | History | Memory |
|---|---|---|
| Small pattern demos | WindowHistory | none or WindowMemory |
| Medium coding agents | HistoryPolicy + default window or TokenBudgetSummaryHistory | WindowMemory or SummaryMemory |
| Long-running coding agents | CompactHistory | MarkdownFileMemory or SummaryMemory |
| Search/research agents | TokenBudgetSummaryHistory or CompactHistory | VectorMemory |
Observe context behavior with qita
History and compaction are not meant to be invisible. In long-running agents, inspect your runs withqita and look for:
- context-history warning events
- summary or compaction events
- whether later steps lose important constraints
- whether your chosen memory adapter is preserving the right artifacts (persistent output records from the run)
