Skip to content

feat: stream advisor tool output#25032

Open
ThomasK33 wants to merge 3 commits intomainfrom
codagt-322-advisor-streaming
Open

feat: stream advisor tool output#25032
ThomasK33 wants to merge 3 commits intomainfrom
codagt-322-advisor-streaming

Conversation

@ThomasK33
Copy link
Copy Markdown
Member

Stream advisor output into the advisor tool card while the nested advisor call is still running.

This keeps the advisor implementation intentionally advisor-specific: the parent model still receives the same final structured tool result, while the frontend receives transient tool-result.result_delta parts to render partial advisor text in the expanded card. The final persisted chat history remains unchanged.

Refs CODAGT-322.

Generated by Coder Agents.

Implementation plan
  • Publish advisor text deltas from the nested chatloop.Run via RunAdvisorOptions.OnAdviceDelta.
  • Forward those deltas through chatadvisor.Tool with the parent advisor tool call ID.
  • Emit transient ChatMessagePartTypeToolResult websocket parts with ResultDelta from chatd.
  • Add result_delta to the generated tool-result TypeScript variant.
  • Accumulate tool result deltas in frontend stream state and keep the tool running until the final result arrives.
  • Render streamed advisor advice in the existing advisor card using streaming markdown mode, while retaining the updated advisor UI.

@ThomasK33 ThomasK33 marked this pull request as ready for review May 7, 2026 12:03
Copy link
Copy Markdown
Member Author

/coder-agents-review

1 similar comment
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured cross-cutting streaming feature. The callback chain from nested chatloop through runner/tool/chatd to the websocket is clean, each layer adds exactly one piece of context, and the persistence boundary is correctly maintained (deltas are transient, final result is authoritative). Test coverage spans every layer: unit, integration, visual, and a negative persistence assertion. The generic result_delta / isStreaming infrastructure is reusable if other tools adopt streaming later.

2 P3, 2 P4, 5 Nit. The P3s are about non-happy-path streaming behavior: retry-induced duplicate deltas and missing test coverage for the error-after-partial-deltas path. Everything else is cosmetic or observational.

"'RunAdvisorOptions configures one advisor invocation.' is the kind of comment that would make me flag '// GetUser gets a user' on a hospital chart." (Leorio)


codersdk/chats.go:321-323

Nit [DEREM-6] The StripInternal comment says ResultDelta is produced by "processStepStream in chatloop," but this PR now produces ResultDelta via PublishAdviceDelta in chatd.go:7063. Before this PR the field was never populated so the reference was harmlessly wrong; now it will send someone looking in the wrong file.

Something like: (see processStepStream in chatloop for ArgsDelta, PublishAdviceDelta in chatd for ResultDelta). (Gon)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatadvisor/runner.go Outdated
return nil
},
}
if runOpts.OnAdviceDelta != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [DEREM-1] On chatloop retry, the client accumulates duplicate advisor deltas with no reset signal.

RunAdvisor sets PublishMessagePart to forward text deltas via OnAdviceDelta, but never sets OnRetry on the chatloop options. When chatretry.Retry fires (rate limit, provider error), result is reset server-side (chatloop.go:465) and the step function re-runs, but no signal reaches the client. The second attempt's deltas append to the first attempt's accumulated text, producing garbled output (e.g. "Use small s...Use small steps.") in the advisor card during the retry window.

"The final tool result overwrites the accumulated text when it arrives, because mergeStreamPayload takes the value path when part.result is defined. So this is a transient visual glitch, not data corruption. But during the retry window (which includes a backoff delay), the user stares at garbled advisor text." (Hisoka)

Fix: wire OnRetry in RunAdvisor to either publish a reset event the client can use to clear accumulated delta text, or track and reset accumulated state inside the PublishMessagePart closure. (Hisoka)

🤖

});
});

it("accumulates tool result deltas until a final result arrives", () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [DEREM-2] No test exercises the error-after-partial-streaming-deltas path.

If the advisor streams text deltas then the provider errors, the frontend receives result_delta events followed by a final tool-result with is_error: true. The backend path (partial OnAdviceDelta calls followed by chatloop.Run returning an error, which becomes a ResultTypeError JSON result) and the frontend transition (mergeStreamPayload overrides accumulated deltas, isStreaming clears) are both correct by design but neither has a test. A regression in value-override-delta precedence or isStreaming cleanup would be invisible.

"A regression in the value-override-delta precedence or the isStreaming cleanup would cause partial advice text to render alongside the error UI." (Chopper)

A sibling test that streams partial deltas then delivers an error result (both here and in runner_test.go) would close this gap. (Chopper)

🤖

);
const nextToolName = part.tool_name || existing?.name || "Tool";
const isFinalResult = part.result !== undefined;
const isStreaming = isFinalResult
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P4 [DEREM-3] isStreaming is only cleared when part.result is set; a bare is_error: true part without result would leave the tool stuck in "running" status.

"A tool-result part with is_error: true but no result and no result_delta, arriving after prior deltas set isStreaming: true, would leave the tool in 'running' status forever." (Mafuuu)

No current tool hits this (the advisor always sends a final result alongside errors), so this is about future-proofing. If a future streaming tool terminates with a bare error part, this will surface. Clearing isStreaming when part.is_error is true would close the gap. (Mafuuu)

🤖

Comment thread coderd/x/chatd/chatadvisor/runner.go Outdated
ctx context.Context,
question string,
conversationSnapshot []fantasy.Message,
opts ...RunAdvisorOptions,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P4 [DEREM-4] Variadic opts ...RunAdvisorOptions silently discards all but the first element.

Single internal caller today (tool.go), standard Go optional-parameter pattern, and the practical risk is near zero. Four reviewers flagged it independently; the concern is that the variadic signature implies composability that doesn't exist.

A *RunAdvisorOptions pointer would make the single-config contract explicit and cause a compile error on misuse: func (rt *Runtime) RunAdvisor(ctx, question, snapshot, opts *RunAdvisorOptions), nil check instead of length check. (Kite P3, Pariston Note, Ryosuke Note, Knov Note)

🤖

args: { question: sampleQuestion },
result: "Use the smaller diff while the advisor is still responding.",
},
play: async ({ canvasElement }) => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [DEREM-5] RunningWithStreamedAdvice asserts streamed text is visible and the empty-advice fallback is absent, but doesn't assert the initial placeholder ("Reviewing context and preparing guidance.") disappears.

The component uses a ternary so both can't render simultaneously today, but a refactor that splits the ternary into independent conditionals would pass the story while rendering both. Adding expect(canvas.queryByText("Reviewing context and preparing guidance.")).not.toBeInTheDocument() would guard the transition. (Bisky)

🤖

result?: unknown;
resultRaw?: string;
isError: boolean;
isStreaming?: boolean;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [DEREM-7] isStreaming on StreamToolResult is undocumented. Its semantics are non-obvious: the field distinguishes "accumulating deltas" from "final result received." Other boolean fields like isError are self-explanatory; this one requires knowing that result entries can exist in a pre-completion state.

Suggestion: /** True while result deltas are still accumulating before the final result. */ (Gon)

🤖

Comment thread coderd/x/chatd/chatadvisor/runner.go Outdated
"github.com/coder/coder/v2/codersdk"
)

// RunAdvisorOptions configures one advisor invocation.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [DEREM-8] RunAdvisorOptions configures one advisor invocation. doesn't mention streaming or callbacks, which is the struct's sole purpose.

Suggestion:

// RunAdvisorOptions carries optional streaming callbacks for a
// single RunAdvisor invocation.

(Leorio)

🤖

result: nextResult.value,
resultRaw: nextResult.rawText,
isError: nextIsError,
...(isStreaming ? { isStreaming: true } : {}),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [DEREM-9] The conditional spread ...(isStreaming ? { isStreaming: true } : {}) relies on the entire object being reconstructed from scratch each time so the old isStreaming: true doesn't survive. A future refactor that merges instead of replaces would silently reintroduce the flag. The test at streamState.test.ts:255 guards this with toBeUndefined(), so it's safe today.

Six reviewers independently noted this pattern. An explicit isStreaming: isStreaming || undefined would make the intent clearer. (Kite Nit, Meruem Note, Nami Note, Pariston Note, Knov Note, Ryosuke Note)

🤖

Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 9 R1 findings addressed in b8b160d. Clean fixes across every layer: retry reset mechanism (OnAdviceReset/ResultReset), pointer options, bare-error handling, and documentation. The panel (9 reviewers) verified each fix against the code. One new P3 surfaced.

1 P3. The PublishAdviceReset wiring at the tool layer lacks a test mirroring the existing PublishAdviceDelta test.

"If someone breaks the PublishAdviceReset wiring (removes the if guard, drops call.ID, swaps the callback), no test fails." (Bisky)

🤖 This review was automatically generated with Coder Agents.

Comment on lines +64 to +68
if opts.PublishAdviceReset != nil {
runOpts.OnAdviceReset = func() {
opts.PublishAdviceReset(call.ID)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [DEREM-10] No test for PublishAdviceReset wiring at the tool layer.

TestAdvisorToolPublishesAdviceDeltasWithToolCallID proves call.ID is threaded through the PublishAdviceDelta closure (lines 60-62). This structurally identical PublishAdviceReset closure has no corresponding test. If someone removes the call.ID binding here, the chatd.go guard (toolCallID == "") silently suppresses the reset event, and the frontend retains stale accumulated text on retry. The runner test (TestAdvisorRunResetsAdviceDeltasOnRetry) exercises OnAdviceReset directly but does not test the tool layer's call.ID binding.

Sketch: mirror TestAdvisorToolPublishesAdviceDeltasWithToolCallID in tool_test.go but use a StreamFn that returns a 429 on the first call and succeeds on the second. Wire both PublishAdviceDelta and PublishAdviceReset into the tool options. Assert the reset callback was called with "call-1" between the stale and fresh delta callbacks. (Bisky)

🤖

Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 10 findings from rounds 1-2 are fixed and verified. The panel (4 reviewers) and Netero found no new issues in the current head.

R1: 9 findings (2 P3, 2 P4, 5 Nit) on the initial streaming implementation. All addressed in b8b160d.
R2: 1 new P3 (missing reset wiring test). Addressed in 6e5fab8.
R3: Clean. No new findings.

The streaming pipeline is well-structured: nested chatloop deltas flow through runner/tool/chatd to the websocket, retry resets clear stale accumulated text, bare errors correctly terminate streaming state, and deltas never leak into persistence. Test coverage spans unit, tool-layer, integration, frontend state machine, and visual (story).

"The shrinking finding count (9, 1, 0) suggests the feedback loop is working." (Mafu-san)

🤖 This review was automatically generated with Coder Agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant