- should you still build the tool surface by hand?
- how much workflow discipline belongs in the system prompt?
- when is `HistoryPolicy` enough?
- when do you need `CompactHistory` or explicit memory?
The runnable example for this lesson is `examples/real/claude_code_agent.py`.
## What changes from lesson 2
| Branch | Lesson 2 | Lesson 3 |
|---|---|---|
| Tools | Manual registry around a trimmed `CodingToolSet` | `coding_tools(...)` preset registry |
| Prompt | Planner + executor prompts | One workflow-heavy system prompt |
| State | Plan and cursor | Todos, mode, target file, verification command, optional doc URL |
| History | Default behavior | Explicit `HistoryPolicy(max_messages=16, max_tokens=2800)` |
| Memory | None | Still none by default, but now memory becomes a real design option |
| Compaction | Not introduced | Introduced as an upgrade path for longer runs |
## The system prompt now defines workflow discipline

Unlike lesson 1, this prompt is not just a parser contract. It also encodes operating style:

- the runtime stays the same
- the prompt can still become much more operational
## The default lesson parser remains ReAct on purpose

Even though the prompt is richer, the parser is still plain text ReAct. What improves instead:

- better state
- better tools
- better workflow prompting
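To make the "parser stays simple" point concrete, here is a toy sketch of a text-ReAct parse step. This is not the framework's actual parser; `parse_react` and the decision-dict shape are illustrative assumptions.

```python
# Toy sketch (NOT the library's real parser): turn a ReAct-style
# completion into a structured decision dict.
import json
import re

def parse_react(text: str) -> dict:
    """Extract Thought / Action / Action Input from a ReAct completion."""
    thought = re.search(r"Thought:\s*(.*)", text)
    action = re.search(r"Action:\s*(\S+)", text)
    args = re.search(r"Action Input:\s*(\{.*\})", text, re.DOTALL)
    if action is None:
        # No tool call: treat the whole completion as a final answer.
        return {"kind": "final", "content": text.strip()}
    return {
        "kind": "tool_call",
        "thought": thought.group(1).strip() if thought else "",
        "tool": action.group(1),
        "args": json.loads(args.group(1)) if args else {},
    }

completion = (
    "Thought: I should read the failing test first.\n"
    "Action: read_file\n"
    'Action Input: {"path": "tests/test_api.py"}'
)
decision = parse_react(completion)
```

The point of staying in this shape is that a richer prompt changes what the model writes, not how the harness reads it.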
## The example is now preset-first

Under the hood, the example still builds an `OpenAICompatibleModel` transport for the current endpoint.
But v0.4 adds one new layer before that transport is created:
- resolve a `FamilyPreset`
- build a `HarnessPolicy`
- choose protocol, parser, tool delivery mode, and context defaults

This lets the example cover multiple model families, including gpt-oss and Gemma 4, without changing the agent implementation itself.
For most of those families, the harness is still text/JSON-first:

- the model returns text
- the tool schema is either injected into the prompt or passed via tool parameters
- the parser turns text into a `Decision`

This text-first design is:

- easy to compare across providers
- easy to inspect in traces
- easy to adapt to local endpoints
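The preset-resolution step above can be sketched with plain dataclasses. This is an illustrative mock: `FamilyPreset`/`HarnessPolicy` exist in the framework, but the field names, registry, and `resolve_policy` helper here are assumptions.

```python
# Illustrative mock of the preset layer; field names and the registry
# are assumptions, not the framework's real API.
from dataclasses import dataclass

@dataclass(frozen=True)
class HarnessPolicy:
    protocol: str          # e.g. "react"
    parser: str            # e.g. "text_react"
    tool_delivery: str     # "prompt_injected" or "native_tool_params"
    max_context_tokens: int

# One registry entry per model family; the agent code never branches on family.
FAMILY_PRESETS = {
    "gpt-oss": HarnessPolicy("react", "text_react", "prompt_injected", 8192),
    "gemma": HarnessPolicy("react", "text_react", "prompt_injected", 8192),
    "native-tools": HarnessPolicy("react", "tool_call", "native_tool_params", 32768),
}

def resolve_policy(family: str) -> HarnessPolicy:
    """Look up the harness defaults for a model family."""
    return FAMILY_PRESETS[family]

policy = resolve_policy("gpt-oss")
```

The design choice to notice: the agent consumes a `HarnessPolicy`, never a family name, so swapping endpoints is a registry edit rather than a code change.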
## Start from a preset tool registry

The lesson uses the `coding_tools(...)` preset. This is the point in the course where presets become the right abstraction.

`coding_tools(...)` gives you a coherent workspace bundle instead of forcing you to hand-register every file, shell, task, and notebook tool.

The lesson here is:

- build tools by hand while learning the kernel
- switch to presets when the agent surface becomes operationally large
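A toy illustration of what the preset buys you; `coding_tools`, its parameters, and the tool names here are stand-ins, not the framework's real signatures.

```python
# Stand-in tools (bodies elided); in the real bundle these wrap the workspace.
def read_file(path: str) -> str: ...
def edit_file(path: str, patch: str) -> str: ...
def run_shell(cmd: str) -> str: ...
def update_todos(items: list) -> str: ...

def coding_tools(include_notebook: bool = False) -> dict:
    """Return one coherent workspace bundle instead of hand-registering tools."""
    registry = {
        "read_file": read_file,
        "edit_file": edit_file,
        "run_shell": run_shell,
        "update_todos": update_todos,
    }
    if include_notebook:
        # Optional capabilities are opt-in flags, not extra registration calls.
        registry["run_notebook_cell"] = lambda code: ...
    return registry

tools = coding_tools()
```

Compared with hand-registration, the preset keeps the tool surface consistent across lessons and makes optional capabilities an argument rather than extra wiring.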
## Understand what the preset is buying you

`coding_tools(...)` is the standard full coding bundle. In practice, that gives the agent access to:

- file inspection and editing
- shell execution
- task/todo helpers
- optional notebook support
- optional web and documentation tools
## Design state for long-running work

The state now carries workflow signals. Why this state shape works:

- `todos` exposes a work queue that survives multiple steps
- `mode` lets the agent remember whether it is planning or executing
- `doc_url` adds optional external grounding without forcing browsing
- `scratchpad` keeps the recent compressed trajectory
## Use `reduce()` to absorb structured tool output
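The pattern this section describes can be sketched as follows. The signature and tool names are illustrative assumptions, not the framework's real `reduce()` contract.

```python
# Toy sketch of a reduce() step: fold structured tool output into state,
# deciding what the agent should remember. Names are illustrative.
def reduce(state: dict, tool_name: str, tool_output: dict) -> dict:
    new_state = dict(state)
    if tool_name == "update_todos":
        new_state["todos"] = tool_output["items"]   # absorb the work queue
    elif tool_name == "set_mode":
        new_state["mode"] = tool_output["mode"]     # planning vs executing
    else:
        # Everything else only touches the compressed scratchpad.
        new_state["scratchpad"] = f"last tool: {tool_name}"
    return new_state

state = {"todos": [], "mode": "planning", "scratchpad": ""}
state = reduce(state, "update_todos", {"items": ["fix parser", "run tests"]})
state = reduce(state, "set_mode", {"mode": "executing"})
```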
`reduce()` listens for tool-driven workflow state. `reduce()` still decides what the agent should remember.

## Introduce explicit history control
The run passes an explicit `HistoryPolicy`. This is the first course lesson where message-window management matters.

`HistoryPolicy` answers:

- how many recent messages are retained
- how many tokens can be spent on history
- when older interaction context stops being sent verbatim
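The effect of those two budgets can be sketched like this. This is not the framework's implementation, and the word-count "tokenizer" is a deliberate crude proxy.

```python
# Illustrative sketch of HistoryPolicy-style trimming: keep only the most
# recent messages that fit both the message and token budgets.
from dataclasses import dataclass

@dataclass
class HistoryPolicy:
    max_messages: int = 16
    max_tokens: int = 2800

def trim_history(messages: list[dict], policy: HistoryPolicy) -> list[dict]:
    recent = messages[-policy.max_messages:]
    kept, budget = [], policy.max_tokens
    for msg in reversed(recent):               # walk newest-first
        cost = len(msg["content"].split())     # crude token proxy
        if cost > budget:
            break                              # older context stops being sent
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))                # restore chronological order

history = [{"role": "user", "content": f"step {i} output"} for i in range(40)]
window = trim_history(history, HistoryPolicy(max_messages=16, max_tokens=2800))
```

Note that trimming simply drops older messages; nothing is preserved, which is exactly the gap `CompactHistory` fills later in this lesson.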
## Learn the boundary between history, compaction, and memory

In this lesson, the example still does not attach a custom `history=` or `memory=`. Read that carefully. That is a meaningful choice:

- `HistoryPolicy` controls the message budget
- state stores immediate workflow artifacts like todos and mode
- no separate memory store is needed yet

Upgrade to `CompactHistory` when the run becomes long enough that simple trimming loses too much context:

- `HistoryPolicy` trims the message request
- `CompactHistory` summarizes and preserves old interaction history
- `Memory` stores reusable records outside the immediate message stream
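The trimming-versus-compaction distinction can be sketched in a few lines. This mock uses string truncation as the "summary"; a real `CompactHistory` would summarize with a model call, and all names here are assumptions.

```python
# Illustrative only: compaction replaces old messages with a summary stub,
# instead of dropping them the way plain trimming does.
def compact_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    """Summarize everything older than the last `keep_recent` messages."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "system",
        "content": f"[compacted {len(old)} earlier messages: "
                   + "; ".join(m["content"][:30] for m in old[-3:]) + "]",
    }
    return [summary] + recent

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
compacted = compact_history(msgs)
```

The layering to notice: trimming bounds what is *sent*, compaction preserves a digest of what was *dropped*, and memory (not shown) would persist records across runs entirely outside the message stream.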
## Understand when to upgrade the protocol

This lesson still uses text ReAct, and that is usually the right call. Consider a protocol upgrade only when you need something specific:
- JSON or XML if you need stricter structured output than text ReAct
- Terminus if the agent is driving a live terminal session
- a model-specific parser such as `MiniMaxToolCallParser` if the provider emits native structured tool calls you actually want to preserve
## Run it like an operator and inspect it like a researcher

Run the example, then inspect the trace in qita:

- whether todos appear early and remain coherent
- whether `mode` changes match the intended workflow
- how the prompt and parser still stay in the simple ReAct path
- whether history trimming changes the model’s behavior
- whether the run would benefit from `CompactHistory`
## The right mental model for long-running agents

By this point in the course, you should think in layers:

- state is what the next step definitely needs
- history is what the next model call may need
- compaction is how old history is compressed
- memory is what should outlive the immediate turn structure
## Full example

The full runnable lesson lives at `examples/real/claude_code_agent.py`.

## What lesson 4 adds

Lesson 4 keeps the long-running structure, but changes the domain completely. That means you will learn how to specialize:

- tool composition
- prompt policy
- state semantics
- `reduce()` logic
## Next lesson: Code security audit agent
Turn the same kernel into a defensive review agent with ranked findings and audit-specific traces.
## Related guide: observability
Review the qita board, replay, and export before studying the final audit workflow.
