Lesson 1: ReAct

This is the first complete agent in the course. It is small on purpose, but it is not a toy. You will build a real coding agent with:

a typed state
a real LLM harness (the wiring layer that connects a transport, parser, and protocol into a coherent model-facing configuration)
a real system prompt
a real parser (a component that converts raw model output into a typed Decision)
real tools
a real reduce() loop (reduce() is a function that folds the current observation and decision into the next state)
real qita traces (structured logs of all run events and steps)

You will study examples/patterns/react.py, but the lesson is written so you do not need to reverse-engineer that file to understand why it works.

What you are building

The task is tiny:

open buggy_module.py
fix add(a, b) so it returns a + b
run a verification command

That small task is useful because it lets you see the whole QitOS kernel (the core AgentModule + Engine execution loop) without extra orchestration noise.
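For orientation, the starting point might look like the sketch below. The lesson does not show the contents of buggy_module.py, so the specific bug (subtraction in place of addition) and the helper names add_buggy and verify are assumptions made for illustration. Only add(a, b), the intended a + b behavior, and the add(20, 22) == 42 check come from the lesson.

```python
# Hypothetical contents of buggy_module.py before the fix.
# The real bug in the course repository may differ; subtraction is assumed here.
def add_buggy(a, b):
    return a - b


# The fixed version the agent is expected to produce.
def add(a, b):
    return a + b


def verify(fn) -> bool:
    """Mirror the lesson's verification idea: fn(20, 22) must equal 42."""
    return fn(20, 22) == 42


print(verify(add_buggy))  # check against the assumed buggy version
print(verify(add))        # check against the fixed version
```

The point of such a small check is that success is a single executable condition, which is what makes the run easy to verify and easy to trace.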

The design for this lesson

Design branch	Choice in this lesson	Why this is the right first choice
Task shape	One-file bug fix with one verification command	Easy to verify, easy to trace
State	`scratchpad`, `target_file`, `test_command`	Just enough to influence the next step
Model harness	`OpenAICompatibleModel` returning text	Simple and portable across providers
Prompt contract	`REACT_SYSTEM_PROMPT`	Explicit one-tool-per-turn text protocol
Parser	`ReActTextParser`	Direct match for `Thought:` / `Action:` output
Tools	Compact `CodingToolSet` inside a manual `ToolRegistry`	Learn tool design before presets
Memory	None	The run is too short to justify separate memory
History	Default Engine history behavior	Do not introduce context control before you need it
Traceability	`qita board`	Learn to inspect the kernel from day one

Why we start with the text ReAct harness

The first lesson uses:

OpenAICompatibleModel(...)

with:

model_parser=ReActTextParser()

That gives you the most transparent possible path:

messages -> text model output -> ReAct parser -> Decision -> tool execution

We do not start with native tool calling, XML, JSON, or model-specific harnesses because those add coupling before you understand the core loop.

The system prompt is a contract, not decoration

The lesson uses the canonical ReAct prompt:

You are a reliable ReAct agent.

Rules:
- Use at most one tool call per response.
- Never invent tool names or arguments.
- If a tool result is enough to conclude, output final answer directly.

Output contract (strict):
Thought: <one concise reasoning sentence>
Action: <tool_name>(arg=value, ...)
or
Final Answer: <final answer only>

In code, that is:

def build_system_prompt(self, state: ReactState) -> str | None:
    return render_prompt(
        REACT_SYSTEM_PROMPT,
        {"tool_schema": self.tool_registry.get_tool_descriptions()},
    )

This matters because ReActTextParser is not doing magic. It expects exactly this style of output. The first durable QitOS lesson is:

prompt format and parser choice are one design decision
if you change one, you usually need to change the other
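To make the coupling concrete, here is a minimal sketch of a parser for the output contract above. It is not the ReActTextParser implementation: ParsedStep, parse_react, the regular expressions, and the tool name read_file are all hypothetical stand-ins, and the argument splitting is deliberately naive.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedStep:
    """Hypothetical stand-in for a typed decision; not the QitOS Decision class."""
    thought: str = ""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    final_answer: Optional[str] = None


THOUGHT_RE = re.compile(r"^Thought:\s*(.+)$", re.MULTILINE)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE | re.DOTALL)


def parse_react(text: str) -> ParsedStep:
    step = ParsedStep()
    if m := THOUGHT_RE.search(text):
        step.thought = m.group(1).strip()
    if m := FINAL_RE.search(text):
        step.final_answer = m.group(1).strip()
        return step
    m = ACTION_RE.search(text)
    if m is None:
        # The model did not follow the prompt contract, so there is nothing to parse.
        raise ValueError("output matches neither Action nor Final Answer")
    step.tool = m.group(1)
    # Naive split: breaks on commas inside values, which is acceptable for a sketch.
    for pair in filter(None, (p.strip() for p in m.group(2).split(","))):
        key, _, value = pair.partition("=")
        step.args[key.strip()] = value.strip().strip("'\"")
    return step


ok = parse_react("Thought: I should read the file.\nAction: read_file(path=buggy_module.py)")
print(ok.tool, ok.args)

try:
    # A JSON-style reply violates the text contract and is rejected.
    parse_react('{"tool": "read_file", "path": "buggy_module.py"}')
except ValueError as exc:
    print("rejected:", exc)
```

A parser like this only works because the prompt asks for exactly these line prefixes. If the prompt switched to JSON output, the regular expressions would need to be replaced as well.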

The full model harness for this lesson

The example builds the model like this:

def build_model() -> OpenAICompatibleModel:
    return OpenAICompatibleModel(
        model=MODEL_NAME,
        api_key=api_key,
        base_url=MODEL_BASE_URL,
        temperature=0.2,
        max_tokens=2048,
    )

Why this harness is appropriate here:

it works with OpenAI-compatible endpoints
it keeps the response in plain text
it stays compatible with the approach used by REACT_SYSTEM_PROMPT, where the tool schema is injected into the prompt text
it keeps the lesson portable across research labs and local gateways

You are not choosing the best model here. You are choosing the simplest harness that exposes the kernel clearly.

Design the state around the next step

The state is intentionally small:

@dataclass
class ReactState(StateSchema):
    scratchpad: list[str] = field(default_factory=list)
    target_file: str = "buggy_module.py"
    test_command: str = (
        'python -c "import buggy_module; assert buggy_module.add(20, 22) == 42"'
    )

Why these fields?

scratchpad stores the compressed trajectory (the sequence of observations and decisions across steps) that the next model step can use
target_file keeps the agent grounded in one file
test_command makes the success condition executable

This is your first QitOS habit: add only state that changes future decisions.

Expose a minimal tool surface

The example uses a manual registry so you can see exactly what is being exposed:

registry = ToolRegistry()
registry.include(
    CodingToolSet(
        workspace_root=workspace_root,
        include_notebook=False,
        enable_lsp=False,
        enable_tasks=False,
        enable_web=False,
        expose_modern_names=False,
    )
)

This is important. CodingToolSet is a bundle, but you still control its surface. For lesson 1, the right tool surface is just enough to:

inspect files
edit files
run the verification command

Do not expose a richer toolset until the task requires it.
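The idea of a deliberately small tool surface can be sketched without QitOS at all. In the example below, everything is hypothetical: the tool names read_file, write_file, and run_check, the TOOLS mapping, and the in-memory FILES workspace are illustrative stand-ins and not the CodingToolSet or ToolRegistry API. The assumed bug (subtraction) is also an illustration.

```python
from typing import Callable

# In-memory "workspace" so the sketch runs without touching the filesystem.
FILES = {"buggy_module.py": "def add(a, b):\n    return a - b\n"}


def read_file(path: str) -> str:
    return FILES[path]


def write_file(path: str, content: str) -> str:
    FILES[path] = content
    return f"wrote {len(content)} chars to {path}"


def run_check(path: str) -> dict:
    """Execute the module source and report a returncode-style result."""
    namespace: dict = {}
    exec(FILES[path], namespace)
    ok = namespace["add"](20, 22) == 42
    return {"returncode": 0 if ok else 1}


# The whole tool surface is this explicit mapping: three names, nothing else.
TOOLS: dict[str, Callable[..., object]] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_check": run_check,
}


def call_tool(name: str, **kwargs: object) -> object:
    if name not in TOOLS:
        # An invented tool name fails loudly instead of doing something surprising.
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


before = call_tool("run_check", path="buggy_module.py")
call_tool("write_file", path="buggy_module.py", content="def add(a, b):\n    return a + b\n")
after = call_tool("run_check", path="buggy_module.py")
print(before, after)
```

Three capabilities (inspect, edit, verify) are enough to complete the task, and every additional tool would be one more thing the model could call by mistake.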

Bind the prompt to the parser

The agent constructor pairs the prompt contract and the parser:

super().__init__(
    tool_registry=registry,
    llm=llm,
    model_parser=ReActTextParser(),
)

Read that as one sentence: “This agent asks the model to speak ReAct text, and the Engine parses that text with the ReAct parser.”

In QitOS, this pairing is the harness. Later lessons will change prompts and protocols. For now, keep this pair fixed.

Prepare only the context the next step needs

prepare() curates the current step’s input:

def prepare(self, state: ReactState) -> str:
    lines = [
        f"Task: {state.task}",
        f"Target file: {state.target_file}",
        f"Verification command: {state.test_command}",
        f"Step: {state.current_step}/{state.max_steps}",
    ]
    if state.scratchpad:
        lines.append("Recent trajectory:")
        lines.extend(state.scratchpad[-8:])
    return "\n".join(lines)

This is the second core habit: prepare() is not a state dump; it is a prompt-ready view of state.
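The difference between state and the prompt-ready view can be seen in a small self-contained sketch. DemoState is a hypothetical stand-in: it assumes that task, current_step, and max_steps are plain fields, which in the lesson are used on ReactState without being declared there. The prepare function below copies the shape of the lesson's method.

```python
from dataclasses import dataclass, field


@dataclass
class DemoState:
    """Hypothetical stand-in for ReactState; field defaults are illustrative."""
    task: str = "Fix add(a, b) in buggy_module.py"
    target_file: str = "buggy_module.py"
    test_command: str = "python -c 'import buggy_module'"
    current_step: int = 5
    max_steps: int = 12
    scratchpad: list = field(default_factory=list)


def prepare(state: DemoState) -> str:
    # Same shape as the lesson's prepare(): fixed header plus the last 8 entries.
    lines = [
        f"Task: {state.task}",
        f"Target file: {state.target_file}",
        f"Verification command: {state.test_command}",
        f"Step: {state.current_step}/{state.max_steps}",
    ]
    if state.scratchpad:
        lines.append("Recent trajectory:")
        lines.extend(state.scratchpad[-8:])
    return "\n".join(lines)


# The state holds 20 entries, but the prompt only shows the newest 8.
demo = DemoState(scratchpad=[f"Observation: entry {i}" for i in range(20)])
prompt = prepare(demo)
print(prompt)
```

The state keeps more history than the model sees on any single step. The slice in prepare() is the place where you decide how much of it is worth the tokens.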

Use reduce to define what the agent remembers

A ReAct agent accumulates what it has learned inside reduce():

def reduce(
    self,
    state: ReactState,
    observation: dict[str, Any],
    decision: Decision[Action],
) -> ReactState:
    action_results = (
        observation.get("action_results", [])
        if isinstance(observation, dict)
        else []
    )
    if decision.rationale:
        state.scratchpad.append(f"Thought: {decision.rationale}")
    if decision.actions:
        state.scratchpad.append(f"Action: {format_action(decision.actions[0])}")
    if action_results:
        first = action_results[0]
        state.scratchpad.append(f"Observation: {first}")
        if isinstance(first, dict) and int(first.get("returncode", 1)) == 0:
            state.final_result = "Patch applied and verification passed."
    state.scratchpad = state.scratchpad[-30:]
    return state

Three lessons are in this one function:

not every observation (the environment’s response after an action or reset) belongs in future context
state is where you keep the compressed working memory
final_result is a clean, explicit success signal
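The three points above can be exercised in isolation. The sketch below is a simplified fold, not the lesson's reduce(): MiniState, fold, and the run_command action strings are hypothetical, and the observation is reduced to a single result dict. It keeps the same append, success check, and 30-entry truncation.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MiniState:
    """Hypothetical stand-in with only the fields the fold touches."""
    scratchpad: list = field(default_factory=list)
    final_result: Optional[str] = None


def fold(state: MiniState, thought: str, action: str, result: dict) -> MiniState:
    """Simplified reduce(): append the step, check success, truncate."""
    state.scratchpad.append(f"Thought: {thought}")
    state.scratchpad.append(f"Action: {action}")
    state.scratchpad.append(f"Observation: {result}")
    if int(result.get("returncode", 1)) == 0:
        state.final_result = "Patch applied and verification passed."
    # Keep only the newest 30 entries so the working memory stays bounded.
    state.scratchpad = state.scratchpad[-30:]
    return state


state = MiniState()
# Twelve failing steps append 36 entries in total; older ones are dropped.
for i in range(12):
    fold(state, f"try {i}", f"run_command(step={i})", {"returncode": 1})
print(len(state.scratchpad), state.final_result)

# One passing step sets the explicit success signal.
fold(state, "verify", "run_command(cmd=test)", {"returncode": 0})
print(len(state.scratchpad), state.final_result)
```

Note that in this simplified form any result with returncode 0 counts as success. Whether that is the right condition depends on which tools can return that field.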

Notice what we are not using yet

We do not use:

decide() overrides
explicit planning
memory adapters
custom history implementations
context compaction
model-specific protocol overrides

That is not because QitOS lacks them. Lesson 1 is about seeing the default path clearly before you bend it.

Run the example and inspect the kernel with qita

Run it:

python examples/patterns/react.py

Then inspect it:

qita board --logdir runs

In qita, check:

the exact prompt text sent to the model
whether the parser produced clean Thought and Action fields
whether the tool output made the verification condition obvious
whether final_result is set at the first true success condition

Why there is no separate memory or compaction yet

For this lesson, the right memory choice is “none.” Why:

the run is short
the useful context is already visible in scratchpad
adding retrieval or compaction here would blur the architecture before you understand it

In QitOS, memory is not a badge of sophistication. It is an answer to a concrete long-run problem.

Full example

The full runnable lesson lives at:

examples/patterns/react.py

What lesson 2 changes

Lesson 2 keeps the same model harness and the same execution parser, but introduces a new idea: planning should become explicit state and explicit control flow, not a longer hidden thought.

Next lesson: PlanAct

Add a planner, a cursor, and a decide() gate without changing the core runtime.

Related reference: kit

Review ReActTextParser, prompt templates, and coding tool surfaces used in this lesson.
