This lesson is the first time you deliberately bend the default loop. You are still building an ordinary QitOS agent, but now you will introduce:
  • a planning artifact in state
  • a planner prompt separate from the execution prompt
  • a decide() override that only handles the planning boundary
The key idea is that you still do not introduce a second runtime.

What changes from lesson 1

| Branch | Lesson 1 | Lesson 2 |
| --- | --- | --- |
| Control | Default LLM path every step | decide() intercepts only the planning boundary |
| Prompting | One ReAct system prompt | One planning prompt plus one execution prompt |
| State | Scratchpad + task fields | Add plan_steps and cursor |
| Parser | ReActTextParser | Still ReActTextParser for execution |
| Tools | Compact coding tools | Same compact coding tools |
| Memory and history | None beyond state | Still no separate memory or compaction |
That last row matters. You are adding planning, not context complexity.

The two-prompt architecture

This lesson uses two prompt contracts.

Planner prompt

You are a planning module.
Break the task into 3-7 atomic executable steps.

Constraints:
- Each step must be actionable and verifiable.
- Prefer tool-executable operations over vague reasoning.
- No prose outside the numbered list.
In code, that is PLAN_DRAFT_PROMPT.

Executor prompt

You are the execution module for a Plan-Act agent.

You will receive the global task and one current plan step.
Execute only the current step. Do not jump ahead.

Output contract (strict):
Thought: <one sentence>
Action: <tool_name>(arg=value, ...)
or
Final Answer: <step result>
In code, that is PLAN_EXEC_SYSTEM_PROMPT. The design lesson is:
  • planning and acting can use different prompts
  • but they still flow through the same AgentModule + Engine runtime
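To make the execution contract concrete, here is a minimal sketch of parsing that strict output format. This is not the real ReActTextParser, just an illustration of the contract it enforces; the function name and return shape are hypothetical:

```python
import re

def parse_react_output(text: str) -> dict:
    """Parse the strict Thought/Action or Final Answer contract (illustrative only)."""
    final = re.search(r"Final Answer:\s*(.+)", text, re.DOTALL)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}
    thought = re.search(r"Thought:\s*(.+)", text)
    action = re.search(r"Action:\s*(\w+)\((.*)\)", text)
    if thought and action:
        return {
            "type": "action",
            "thought": thought.group(1).strip(),
            "tool": action.group(1),
            "raw_args": action.group(2),
        }
    raise ValueError("Output does not match the ReAct contract")

result = parse_react_output("Thought: run the tests\nAction: run_shell(cmd='pytest')")
```

A strict contract like this is what makes the executor prompt's "Output contract (strict)" section enforceable: anything the regexes reject is a parse failure, not a silent fallback.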

The parser story in this lesson

The planner path does not use ReActTextParser. Instead:
  • _plan() renders PLAN_DRAFT_PROMPT
  • NumberedPlanBuilder calls the same LLM harness
  • the builder parses a numbered list into list[str]
The execution path does use ReActTextParser. That means lesson 2 already teaches a subtle but important QitOS idea: different phases of the same agent can use different parsing contracts, as long as the control boundary is explicit.
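A numbered-list parser of the kind the builder implies can be sketched in a few lines. This is hypothetical, not the real NumberedPlanBuilder internals, but it shows how a numbered list becomes a list[str] while enforcing the planner prompt's 3-7 step constraint:

```python
import re

def parse_numbered_plan(text: str) -> list[str]:
    """Extract '1. step' or '1) step' lines into an ordered list of step strings."""
    steps = []
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)[.)]\s+(.*\S)", line)
        if m:
            steps.append(m.group(2))
    # Enforce the planner prompt's 3-7 step constraint.
    if not 3 <= len(steps) <= 7:
        raise ValueError(f"expected 3-7 steps, got {len(steps)}")
    return steps

plan = parse_numbered_plan("1. Read the file\n2. Fix the bug\n3. Run the tests")
```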

The model harness stays intentionally boring

Just like lesson 1, the example uses:
OpenAICompatibleModel(...)
Why keep the same harness?
  • so you can isolate the effect of planning
  • so prompt and parser changes are easy to interpret
  • so the new lesson teaches one new idea instead of five
1. Extend state with a plan and a cursor

The state adds only what execution needs:
@dataclass
class PlanActState(StateSchema):
    plan_steps: list[str] = field(default_factory=list)
    cursor: int = 0
    target_file: str = "buggy_module.py"
    test_command: str = TEST_COMMAND
    scratchpad: list[str] = field(default_factory=list)
This is the first time the course makes a hidden reasoning artifact explicit. Why store the plan in state?
  • the trace can show it
  • prepare() can surface it
  • reduce() can advance it
  • your own logic can rewrite it later if needed
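Because the plan lives in ordinary state, any code that can see the state can inspect or mutate it. A minimal stand-in (not the real StateSchema, just the two planning fields) makes that life cycle visible:

```python
from dataclasses import dataclass, field

@dataclass
class MiniPlanState:
    """Stand-in for PlanActState; just the planning fields."""
    plan_steps: list[str] = field(default_factory=list)
    cursor: int = 0

state = MiniPlanState()
state.plan_steps = ["read file", "patch bug", "run tests"]  # what _plan() would do
state.cursor += 1                                           # what reduce() would do
current = state.plan_steps[state.cursor]                    # what prepare() surfaces
```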
2. Use a dedicated plan builder

The planner is initialized once:
self.plan_builder = NumberedPlanBuilder()
And called like this:
prompt = render_prompt(
    PLAN_DRAFT_PROMPT,
    {
        "task": (
            f"{state.task}\n"
            f"Target file: {state.target_file}\n"
            f"Last step must run: {state.test_command}"
        ),
    },
)
plan = self.plan_builder.build(self.llm, prompt)
This is the right QitOS move: planning becomes a named artifact with a dedicated parser, not an unstructured paragraph in the main scratchpad.
3. Use decide() only as the planning gate

The control logic is small:
def decide(self, state: PlanActState, observation: dict[str, Any]):
    if not state.plan_steps or state.cursor >= len(state.plan_steps):
        if not self._plan(state):
            return Decision.final("Failed to build a valid plan.")
        return Decision.wait("plan_ready")
    return None
That return None is the whole point. Once a plan exists, the Engine goes back to its default LLM path: prompt -> ReActTextParser -> Decision -> tool execution. So lesson 2 is not about replacing the runtime. It is about adding one explicit control boundary to it.
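The control flow of that gate can be exercised with stand-in classes (hypothetical stubs, not the real QitOS Decision or state types; the stub pretends _plan() always succeeds):

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    """Stub mirroring the Decision.wait / Decision.final shape from the lesson."""
    kind: str
    payload: str

    @classmethod
    def wait(cls, tag): return cls("wait", tag)
    @classmethod
    def final(cls, msg): return cls("final", msg)

@dataclass
class State:
    plan_steps: list = field(default_factory=list)
    cursor: int = 0

def decide(state):
    # Plan missing or exhausted -> (re)plan; otherwise fall through to the default path.
    if not state.plan_steps or state.cursor >= len(state.plan_steps):
        state.plan_steps = ["step one", "step two"]  # pretend _plan() succeeded
        return Decision.wait("plan_ready")
    return None

s = State()
first = decide(s)   # planning boundary fires
second = decide(s)  # plan exists -> default LLM path (None)
```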
4. Bind execution prompt and parser clearly

Execution still uses:
super().__init__(
    tool_registry=registry,
    llm=llm,
    model_parser=ReActTextParser(),
)
and:
def build_system_prompt(self, state: PlanActState) -> str | None:
    return render_prompt(
        PLAN_EXEC_SYSTEM_PROMPT,
        {
            "current_step": self._current_step_text(state),
            "tool_schema": self.tool_registry.get_tool_descriptions(),
        },
    )
So the planning phase and the execution phase are visibly different:
  • numbered plan builder for planning
  • ReAct text contract for execution
5. Make the plan visible in prepare()

prepare() now renders both the global task and the current plan step:
def prepare(self, state: PlanActState) -> str:
    lines = [
        f"Task: {state.task}",
        f"Plan cursor: {state.cursor}/{len(state.plan_steps)}",
        f"Current plan step: {self._current_step_text(state)}",
        f"Step: {state.current_step}/{state.max_steps}",
    ]
    return "\n".join(lines)
This changes the agent’s working memory shape. Instead of re-reasoning over the entire task every step, the model reasons over:
  • one task
  • one explicit plan
  • one current plan item
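Under hypothetical state values, the rendered observation looks roughly like this (plain stand-in values, not the real StateSchema):

```python
# Stand-in rendering of what prepare() hands the model each step.
task = "Fix the failing test"
cursor, plan_steps = 1, ["read file", "patch bug", "run tests"]
lines = [
    f"Task: {task}",
    f"Plan cursor: {cursor}/{len(plan_steps)}",
    f"Current plan step: {plan_steps[cursor]}",
]
observation = "\n".join(lines)
```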
6. Advance plan progress in reduce()

Progress becomes ordinary state logic:
if isinstance(first, dict) and first.get("status") == "success":
    state.cursor += 1
if isinstance(first, dict) and int(first.get("returncode", 1)) == 0:
    state.final_result = "Verification passed."
    state.cursor = len(state.plan_steps)
The important lesson is not the exact condition. It is the placement: reduce() is where you decide what counts as plan completion.
7. Keep memory and history simple on purpose

Lesson 2 still does not add:
  • a memory adapter
  • HistoryPolicy tuning
  • CompactHistory
Why not? Because the plan itself already compresses the task into a better working form. Introducing context compaction here would blur whether behavior changed because of planning or because of context management.
8. Run it and inspect the planning boundary in qita

Run:
python examples/patterns/planact.py
Inspect:
qita board --logdir runs
In the trace, pay attention to:
  • the step where Decision.wait("plan_ready") appears
  • the moment plan_steps becomes part of state
  • the fact that later execution still uses the same ReAct parser path

Why PlanAct is still the same kernel

Researchers often think adding planning requires:
  • a separate planner service
  • a planner-executor loop outside the framework
  • a second agent runtime
This lesson is showing the opposite design:
  • a planner is just another controlled model call
  • a plan is just another state artifact
  • execution is still the normal Engine path
That is one of the deepest QitOS ideas.

Full example

The full runnable lesson lives at:

What lesson 3 adds

Lesson 3 keeps the same kernel again, but now the agent becomes operationally long-running. That means you will finally introduce:
  • preset toolsets instead of manual wiring
  • a workflow-oriented system prompt
  • explicit history control
  • the point where context compaction and memory become real design questions

Next lesson: Claude Code-style agent

Move from pattern design to a long-running workspace agent with presets, history policy, and qita-driven debugging.

Related guide: memory and history

Review the distinction between state, history, compaction, and memory before the long-running lesson.