Fix create_completion capping all batch prompts to the shortest context's max_tokens #3867
Open
Chessing234 wants to merge 1 commit into
Bug
In `create_completion`, a pre-generation loop validates context length for each prompt: `check_length` returns `min(request.max_tokens, context_len - token_num)`, the maximum number of tokens that can be generated given that prompt's context usage. The mutation `request.max_tokens = max_tokens` then carries this reduced value into every subsequent iteration.
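The snippet below is a minimal, synchronous sketch of that pattern, not the actual code: the real loop is async and queries a model worker, and `CONTEXT_LEN`, `count_tokens`, and the `Request` stub are stand-ins introduced here for illustration. Only the `min()` logic in `check_length` and the `request.max_tokens` write-back mirror the description above.

```python
CONTEXT_LEN = 2048  # assumed context window for this sketch

def count_tokens(prompt: str) -> int:
    # Stand-in tokenizer for illustration only.
    return len(prompt.split())

class Request:
    def __init__(self, prompts: list[str], max_tokens: int):
        self.prompt = prompts
        self.max_tokens = max_tokens

def check_length(request: Request, prompt: str) -> int:
    token_num = count_tokens(prompt)
    if token_num >= CONTEXT_LEN:
        raise ValueError("prompt exceeds the context window")
    # Max tokens this prompt can still generate.
    return min(request.max_tokens, CONTEXT_LEN - token_num)

def create_completion(request: Request) -> int:
    for prompt in request.prompt:
        max_tokens = check_length(request, prompt)
        # BUG: the per-prompt cap is written back onto the shared request,
        # so every later prompt is checked against the smallest cap so far.
        request.max_tokens = max_tokens
    return request.max_tokens  # generation then uses this for ALL prompts
```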
Root cause

If prompt A is long (leaving 148 tokens available), `request.max_tokens` is reduced from 1000 to 148. Prompt B (short, leaving 1948 tokens available) is then checked against the already-reduced 148, and the generation loop uses 148 for both. Prompt B is silently limited to 148 tokens even though the user requested 1000 and its context easily fits them.
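A few lines are enough to trace those numbers through the min-and-mutate pattern. The 2048-token context implied here is an assumption chosen so the two prompts leave 148 and 1948 tokens respectively:

```python
requested = 1000
available = {"A": 148, "B": 1948}  # context_len - token_num per prompt

max_tokens = requested
for name in ("A", "B"):
    max_tokens = min(max_tokens, available[name])  # mutated value carries over
    print(name, max_tokens)
# A 148
# B 148  <- B inherits A's cap instead of getting min(1000, 1948) = 1000
```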
Why the fix is correct

Each prompt's available tokens are a function of its own length, not of the other prompts in the batch. `check_length` already raises an error if a prompt overflows the context window, so the mutation adds no validation of its own. Without it, generation uses `request.max_tokens` (the user's requested value) for all prompts, and model workers naturally cap output at the available context, producing correct results independently for each prompt.
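Reusing the `Request` and `check_length` stubs from the first sketch, the fix amounts to keeping the validation call and dropping the write-back:

```python
def create_completion_fixed(request: Request) -> int:
    for prompt in request.prompt:
        # Still raises if a prompt overflows the context window; the
        # returned per-prompt cap is no longer written back to the request.
        check_length(request, prompt)
    # Every prompt keeps the user's requested max_tokens; each worker caps
    # output at its own prompt's remaining context.
    return request.max_tokens
```

With the example batch, the buggy version returns 148 for both prompts, while the fixed version leaves `request.max_tokens` at 1000 and only prompt A is capped to 148, at the worker, by its own context usage.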