fix(kosong): clamp max_tokens to max_output_size for OpenAI-compatible providers by slgao · Pull Request #1208 · MoonshotAI/kimi-code

slgao · 2026-06-29T13:38:14Z

Related Issue

Resolves #1148

Problem

When using a third-party OpenAI-compatible provider (HuggingFace, Ollama, etc.) with a max_output_size lower than CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING (131072), the agent sends a max_tokens value that exceeds the provider's actual limit, causing a 400 error.

Root cause: two gaps in the OpenAI provider path:

1. provider-manager.ts was not forwarding maxOutputSize as maxTokens to the OpenAI provider config (unlike the Anthropic case, which already passes defaultMaxTokens).
2. withMaxCompletionTokens in openai-legacy.ts always applies the generic 128k ceiling (131072), overriding an explicitly configured maxTokens when the budget cap is large.

What changed

packages/kosong/src/providers/openai-legacy.ts

Added _explicitMaxTokens: boolean field, set when options.maxTokens is provided at construction.
In withMaxCompletionTokens, when _explicitMaxTokens is true, the configured value is used as the ceiling instead of CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING. This mirrors the existing _explicitMaxTokens pattern already used by the Anthropic provider.
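
The ceiling selection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in openai-legacy.ts: SketchProvider, SketchOptions, and resolveMaxTokens are hypothetical names, and combining the budget cap with the ceiling via Math.min is an assumption. Only the 131072 constant and the idea of an explicit-limit flag come from the description.

```typescript
// Hypothetical sketch; names and structure are illustrative only.
// The constant's value is the one quoted in the PR description.
const CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING = 131072;

interface SketchOptions {
  // Assumed to carry the configured max_output_size, if any.
  maxTokens?: number;
}

class SketchProvider {
  private readonly maxTokens: number | undefined;
  private readonly explicitMaxTokens: boolean;

  constructor(options: SketchOptions = {}) {
    this.maxTokens = options.maxTokens;
    // Remember whether a limit was configured at construction time.
    this.explicitMaxTokens = options.maxTokens !== undefined;
  }

  // Returns the max_tokens value that would be sent for a given budget cap.
  // Assumption: the budget cap is clamped with Math.min against the ceiling.
  resolveMaxTokens(budgetCap: number): number {
    const ceiling =
      this.explicitMaxTokens && this.maxTokens !== undefined
        ? this.maxTokens // explicit limit replaces the generic ceiling
        : CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING;
    return Math.min(budgetCap, ceiling);
  }
}

const huggingFaceLike = new SketchProvider({ maxTokens: 65536 });
const unconfigured = new SketchProvider();

console.log(huggingFaceLike.resolveMaxTokens(1048576)); // 65536
console.log(unconfigured.resolveMaxTokens(1048576)); // 131072
```

In this sketch a provider without a configured limit keeps the generic ceiling, so the change only affects providers that were given an explicit maxTokens.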

packages/agent-core/src/session/provider-manager.ts

Added ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}) to the openai case in toKosongProviderConfig, mirroring the existing defaultMaxTokens spread in the anthropic case.
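
For readers unfamiliar with the conditional-spread idiom, the sketch below shows what it does in isolation. OpenAiConfigSketch and toOpenAiConfigSketch are hypothetical names invented for this example, and the real toKosongProviderConfig has more fields and a different signature. The point is only that the maxTokens key is omitted entirely when maxOutputSize is undefined, rather than being present with an undefined value.

```typescript
// Hypothetical, simplified config shape for illustration only.
interface OpenAiConfigSketch {
  baseUrl: string;
  apiKey: string;
  maxTokens?: number;
}

// Builds a config and adds maxTokens only when a limit is configured.
function toOpenAiConfigSketch(
  baseUrl: string,
  apiKey: string,
  maxOutputSize?: number,
): OpenAiConfigSketch {
  return {
    baseUrl,
    apiKey,
    // Spreading {} adds no key, so an unset limit leaves the config untouched.
    ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),
  };
}

const withLimit = toOpenAiConfigSketch("https://example.invalid/v1", "placeholder-key", 65536);
const withoutLimit = toOpenAiConfigSketch("https://example.invalid/v1", "placeholder-key");

console.log(withLimit.maxTokens); // 65536
console.log("maxTokens" in withoutLimit); // false
```

Omitting the key matters if the receiving constructor distinguishes "option provided" from "option absent", as the _explicitMaxTokens flag described above does.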

packages/kosong/test/openai-legacy.test.ts

Added a regression test that constructs a provider with maxTokens: 65536 (simulating a HuggingFace endpoint), calls withMaxCompletionTokens(1_048_576, ...), and asserts the resulting request body has max_tokens: 65536 instead of the old 131072.

packages/agent-core/test/agent/config-state.test.ts

Updated the existing 'clamps the LLM completion cap to 128k for openai-compatible providers' test, which was written to document the old buggy behaviour (expect(requestMaxTokens).toBe(131072) for maxOutputSize: 384000). It now asserts 384000, with an updated comment.

Checklist

I have read the CONTRIBUTING document.
I have linked a related issue, or explained the problem above.
I have added tests that prove my feature works.
Ran gen-changesets skill, or this PR needs no changeset.
Ran gen-docs skill, or this PR needs no doc update.

…e providers Third-party OpenAI-compatible providers (HuggingFace, Ollama, etc.) can have output limits below the generic CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING (131072). When max_output_size is explicitly configured, withMaxCompletionTokens now honours it as a hard upper bound instead of applying the generic ceiling, preventing 400 errors from providers whose actual limit is lower. Fixes MoonshotAI#1148.

changeset-bot · 2026-06-29T13:38:21Z

🦋 Changeset detected

Latest commit: 52f9427

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@moonshot-ai/kosong	Patch
@moonshot-ai/agent-core	Patch
@moonshot-ai/kimi-code	Patch


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52f942740f


chatgpt-codex-connector · 2026-06-29T13:40:41Z

        baseUrl: providerValue(provider.baseUrl, provider.env, 'OPENAI_BASE_URL'),
        apiKey: providerApiKey(provider),
        reasoningKey,
+        ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),


Honor the max-token env opt-out

When KIMI_MODEL_MAX_COMPLETION_TOKENS=0 or KIMI_MODEL_MAX_TOKENS=0, resolveCompletionBudget returns undefined to disable completion-token clamping, but this line now bakes maxOutputSize into every OpenAI-compatible provider config. OpenAILegacyChatProvider.generate() serializes constructor maxTokens as max_tokens even when applyCompletionBudget is skipped, so any OpenAI-compatible model alias with maxOutputSize will still send a cap despite the documented env opt-out. Keep maxOutputSize in the budget path or avoid wiring it when the opt-out is active.
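
One way to read the second suggestion (avoid wiring the value when the opt-out is active) is sketched below. This is not the project's code: isCapOptedOut and buildOpenAiConfigSketch are hypothetical helpers, and treating the literal string "0" in either variable as the opt-out is an assumption about how the environment is parsed. Only the two variable names and the "=0" convention come from the review comment.

```typescript
// Environment variable names as quoted in the review comment.
const OPT_OUT_VARS = ["KIMI_MODEL_MAX_COMPLETION_TOKENS", "KIMI_MODEL_MAX_TOKENS"];

type EnvSketch = Record<string, string | undefined>;

// Assumption: the opt-out is active when either variable is exactly "0".
function isCapOptedOut(env: EnvSketch): boolean {
  return OPT_OUT_VARS.some((name) => env[name]?.trim() === "0");
}

// Hypothetical config builder: forwards maxOutputSize as maxTokens
// only when a limit is configured and the opt-out is not active.
function buildOpenAiConfigSketch(
  env: EnvSketch,
  maxOutputSize?: number,
): { maxTokens?: number } {
  const shouldWire = maxOutputSize !== undefined && !isCapOptedOut(env);
  return {
    ...(shouldWire ? { maxTokens: maxOutputSize } : {}),
  };
}

const optedOut = buildOpenAiConfigSketch({ KIMI_MODEL_MAX_TOKENS: "0" }, 65536);
const normal = buildOpenAiConfigSketch({}, 65536);

console.log("maxTokens" in optedOut); // false
console.log(normal.maxTokens); // 65536
```

The alternative the review mentions, keeping maxOutputSize in the budget path, would leave the provider config untouched and apply the limit where resolveCompletionBudget already handles the opt-out.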

slgao wants to merge 1 commit into MoonshotAI:main from slgao:fix/openai-max-output-size
