chore: changes cost & latency optimization to post-process #198
Open
andrewklatzke wants to merge 4 commits into
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
Reviewed by Cursor Bugbot for commit aa0a77f.

Requirements
Describe the solution you've provided
Moves the cost and latency optimization process to happen as a post-process pass rather than attempting to optimize for everything in each loop.
This reduces the noise the LLM has to deal with in a single loop. The flow is now: optimize for quality -> validate with additional samples -> optimize for meta (latency, cost).
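The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Candidate`, `run_two_phase`, and the three callables are hypothetical names, and the fallback to the Phase 1 winner follows the behavior described in the overview.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    # Hypothetical container for a variation under test.
    instructions: str
    model: str
    quality: float  # judge score, assumed to lie in [0, 1]


def run_two_phase(
    optimize_quality: Callable[[], Optional[Candidate]],
    validate: Callable[[Candidate], bool],
    optimize_meta: Callable[[Candidate], Optional[Candidate]],
    meta_enabled: bool,
) -> Optional[Candidate]:
    """Quality first, then validation, then an optional cost/latency pass."""
    winner = optimize_quality()  # Phase 1: quality only, no cost/latency gates
    if winner is None or not validate(winner):
        return None  # no validated quality winner, so nothing to tune
    if not meta_enabled:
        return winner
    tuned = optimize_meta(winner)  # Phase 2: latency/cost only
    # If no Phase 2 candidate passes, keep the Phase 1 winner.
    return tuned if tuned is not None else winner
```

Keeping the meta pass behind a validated winner means a cost or latency failure can never block a quality improvement; at worst the run ends with the Phase 1 result.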
Describe alternatives you've considered
The ultimate goal here is to move to distinct scorers/criteria that can be ranked. For now, this is a better solution than the all-in-one passes we were doing previously, which could regress.
Note
Medium Risk
Changes when optimizations pass/fail, which model/parameters are committed, and callback timing—behavioral regressions are possible despite extensive test updates.
Overview
Cost and latency are no longer mixed into the main optimization loop. Phase 1 only chases judge/validation quality; duration and cost gates are removed from standard turns, validation, and ground-truth samples. When latency or token optimization is enabled and Phase 1 succeeds, _run_cost_latency_phase runs with instructions frozen, reuses the winner’s input/variables, evaluates each distinct model_choices entry, applies latency/cost gates there, and picks the best passing candidate via normalized duration + cost vs baseline.
Prompting and variation generation split by phase: build_new_variation_prompt no longer takes cost/latency flags; Phase 2 uses new build_token_latency_variation_prompt (content lock, model/param-only changes). LLM instruction edits in Phase 2 are reverted if they drift from the frozen winner. Judge prompts inject latency/cost guidance only while _in_cost_latency_phase.
Run lifecycle and API surface: on_passing_result fires once with the true final context (Phase 2 winner or Phase 1 fallback); _handle_success can suppress that callback during intermediate success. Every agent turn adds a _meta score entry for raw latency/cost telemetry. auto_commit now persists parameters on the created variation. Tests were updated so Phase 1 success no longer depends on duration gates.
Reviewed by Cursor Bugbot for commit 4eb0bb0.
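One way to read "picks the best passing candidate via normalized duration + cost vs baseline" is sketched below. The exact formula is not given in this description, so the equal-weight sum of ratios is an assumption, as are the names `Measured` and `pick_best` and the requirement that baseline duration and cost are both positive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measured:
    # Hypothetical record of one evaluated model choice.
    model: str
    duration_ms: float
    cost_usd: float
    passed_gates: bool  # result of the latency/cost gates for this candidate


def pick_best(candidates: List[Measured], baseline: Measured) -> Optional[Measured]:
    """Return the passing candidate with the lowest normalized duration + cost."""
    passing = [c for c in candidates if c.passed_gates]
    if not passing:
        return None  # caller would fall back to the Phase 1 winner

    def score(c: Measured) -> float:
        # Assumption: each metric is divided by its baseline value and the two
        # ratios are summed with equal weight. Baseline values must be > 0.
        return c.duration_ms / baseline.duration_ms + c.cost_usd / baseline.cost_usd

    return min(passing, key=score)
```

Dividing by the baseline puts milliseconds and dollars on a comparable scale, so a candidate that halves latency but doubles cost scores worse than one that improves both modestly.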