
fix(kosong): clamp max_tokens to max_output_size for OpenAI-compatible providers #1208

Open
slgao wants to merge 1 commit into MoonshotAI:main from slgao:fix/openai-max-output-size

@slgao slgao commented Jun 29, 2026

Related Issue

Resolves #1148

Problem

When a third-party OpenAI-compatible provider (HuggingFace, Ollama, etc.) is configured with a max_output_size lower than CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING (131072), the agent sends a max_tokens value that exceeds the provider's actual limit, and the provider rejects the request with a 400 error.

The root cause is two gaps in the OpenAI provider path:

  1. provider-manager.ts was not forwarding maxOutputSize as maxTokens to the OpenAI provider config (unlike the Anthropic case which already passes defaultMaxTokens).
  2. withMaxCompletionTokens in openai-legacy.ts always applies the generic 128k ceiling, overriding an explicitly configured maxTokens when the budget cap is large.

What changed

packages/kosong/src/providers/openai-legacy.ts

  • Added _explicitMaxTokens: boolean field, set when options.maxTokens is provided at construction.
  • In withMaxCompletionTokens, when _explicitMaxTokens is true, the configured value is used as the ceiling instead of CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING. This mirrors the existing _explicitMaxTokens pattern already used by the Anthropic provider.
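
The ceiling selection described above can be sketched as follows. This is a minimal illustration, not the code in openai-legacy.ts: `SketchProvider`, `SketchOptions`, and `resolveMaxTokens` are hypothetical names, and combining the budget with the ceiling via `Math.min` is an assumption about how the clamp is applied.

```typescript
// Generic ceiling quoted in the PR description.
const CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING = 131072;

// Hypothetical options shape; the real provider options have more fields.
interface SketchOptions {
  maxTokens?: number;
}

// Hypothetical stand-in for the provider; only the clamping logic is modelled.
class SketchProvider {
  private readonly maxTokens: number | undefined;
  // True only when the caller passed maxTokens at construction.
  private readonly explicitMaxTokens: boolean;

  constructor(options: SketchOptions = {}) {
    this.maxTokens = options.maxTokens;
    this.explicitMaxTokens = options.maxTokens !== undefined;
  }

  // Assumed behaviour: the effective cap is the smaller of the requested
  // budget and the ceiling, where the ceiling is the configured value when
  // one was given explicitly and the generic ceiling otherwise.
  resolveMaxTokens(budget: number): number {
    const ceiling =
      this.explicitMaxTokens && this.maxTokens !== undefined
        ? this.maxTokens
        : CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING;
    return Math.min(budget, ceiling);
  }
}
```

Under these assumptions, a provider built with `maxTokens: 65536` resolves a budget of 1_048_576 to 65536, while a provider built without `maxTokens` still falls back to 131072.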

packages/agent-core/src/session/provider-manager.ts

  • Added ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}) to the openai case in toKosongProviderConfig, mirroring the existing defaultMaxTokens spread in the anthropic case.
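
The conditional spread is worth a short illustration, since it omits the `maxTokens` key entirely rather than setting it to `undefined`. The sketch below uses a hypothetical `SketchOpenAIConfig` type and `toSketchOpenAIConfig` helper; the real `toKosongProviderConfig` takes different arguments and returns more fields.

```typescript
// Hypothetical, reduced config shape for illustration only.
interface SketchOpenAIConfig {
  type: 'openai';
  baseUrl: string;
  maxTokens?: number;
}

// Hypothetical helper mirroring the spread quoted in the PR description.
function toSketchOpenAIConfig(
  baseUrl: string,
  maxOutputSize: number | undefined,
): SketchOpenAIConfig {
  return {
    type: 'openai',
    baseUrl,
    // Spreading {} adds no key, so an unset maxOutputSize leaves
    // maxTokens absent instead of present-but-undefined.
    ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),
  };
}
```

Because the key is absent when `maxOutputSize` is unset, a constructor-side check such as `options.maxTokens !== undefined` stays false for providers that never configured a limit.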

packages/kosong/test/openai-legacy.test.ts

  • Added a regression test that constructs a provider with maxTokens: 65536 (simulating a HuggingFace endpoint), calls withMaxCompletionTokens(1_048_576, ...), and asserts the resulting request body has max_tokens: 65536 instead of the old 131072.

packages/agent-core/test/agent/config-state.test.ts

  • Updated the existing 'clamps the LLM completion cap to 128k for openai-compatible providers' test, which was written to document the old buggy behaviour (expect(requestMaxTokens).toBe(131072) for maxOutputSize: 384000). It now asserts 384000, with an updated comment.

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked a related issue, or explained the problem above.
  • I have added tests that prove my feature works.
  • Ran gen-changesets skill, or this PR needs no changeset.
  • Ran gen-docs skill, or this PR needs no doc update.

…e providers

Third-party OpenAI-compatible providers (HuggingFace, Ollama, etc.) can have
output limits below the generic CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING
(131072). When max_output_size is explicitly configured, withMaxCompletionTokens
now honours it as a hard upper bound instead of applying the generic ceiling,
preventing 400 errors from providers whose actual limit is lower.

Fixes MoonshotAI#1148.

changeset-bot Bot commented Jun 29, 2026

🦋 Changeset detected

Latest commit: 52f9427

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name                      Type
@moonshot-ai/kosong       Patch
@moonshot-ai/agent-core   Patch
@moonshot-ai/kimi-code    Patch

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52f942740f

baseUrl: providerValue(provider.baseUrl, provider.env, 'OPENAI_BASE_URL'),
apiKey: providerApiKey(provider),
reasoningKey,
...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),

P2: Honor the max-token env opt-out

When KIMI_MODEL_MAX_COMPLETION_TOKENS=0 or KIMI_MODEL_MAX_TOKENS=0, resolveCompletionBudget returns undefined to disable completion-token clamping, but this line now bakes maxOutputSize into every OpenAI-compatible provider config. OpenAILegacyChatProvider.generate() serializes constructor maxTokens as max_tokens even when applyCompletionBudget is skipped, so any OpenAI-compatible model alias with maxOutputSize will still send a cap despite the documented env opt-out. Keep maxOutputSize in the budget path or avoid wiring it when the opt-out is active.
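
One way to read the second suggestion (avoid wiring `maxOutputSize` when the opt-out is active) is sketched below. `isCompletionCapOptedOut` and `maxTokensSpread` are hypothetical helpers, and treating the opt-out as a trimmed string equal to `'0'` is an assumption; how `resolveCompletionBudget` actually parses these variables is not shown in this thread.

```typescript
// Environment passed in explicitly so the sketch does not depend on process.env.
type Env = Record<string, string | undefined>;

// Hypothetical check: either variable set to "0" is treated as the opt-out.
function isCompletionCapOptedOut(env: Env): boolean {
  const names = ['KIMI_MODEL_MAX_COMPLETION_TOKENS', 'KIMI_MODEL_MAX_TOKENS'];
  return names.some((name) => env[name]?.trim() === '0');
}

// Hypothetical replacement for the unconditional spread: forward
// maxOutputSize as maxTokens only when the opt-out is not active.
function maxTokensSpread(
  maxOutputSize: number | undefined,
  env: Env,
): { maxTokens?: number } {
  if (maxOutputSize === undefined || isCompletionCapOptedOut(env)) {
    return {};
  }
  return { maxTokens: maxOutputSize };
}
```

With this shape, the provider config would carry `maxTokens` for a normally configured alias and omit it whenever either variable is set to `0`, so no cap is serialized into the request in the opt-out case.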

Development

Successfully merging this pull request may close these issues.

bug: max_tokens not clamped to max_output_size for OpenAI-compatible providers — 400 on models with output limit < context window (v0.20.1)