diff --git a/.changeset/sampling-options-to-model-options.md b/.changeset/sampling-options-to-model-options.md new file mode 100644 index 000000000..72fa073d5 --- /dev/null +++ b/.changeset/sampling-options-to-model-options.md @@ -0,0 +1,31 @@ +--- +'@tanstack/ai': minor +'@tanstack/openai-base': minor +'@tanstack/ai-openai': minor +'@tanstack/ai-anthropic': minor +'@tanstack/ai-gemini': minor +'@tanstack/ai-grok': minor +'@tanstack/ai-groq': minor +'@tanstack/ai-ollama': minor +'@tanstack/ai-openrouter': minor +--- + +**BREAKING:** Sampling options (`temperature`, `topP`, `maxTokens`) have moved off the root of `chat()` / `ai()` / `generate()` and into provider-native `modelOptions`. There is no longer a generic root-level sampling surface — each provider accepts its own native keys, fully typed per model: + +- OpenAI (Responses): `modelOptions: { temperature, top_p, max_output_tokens }` +- Anthropic: `modelOptions: { temperature, top_p, max_tokens }` +- Gemini: `modelOptions: { temperature, topP, maxOutputTokens }` +- Grok: `modelOptions: { temperature, top_p, max_tokens }` +- Groq: `modelOptions: { temperature, top_p, max_completion_tokens }` +- Ollama: `modelOptions: { options: { temperature, top_p, num_predict } }` (nested) +- OpenRouter (chat): `modelOptions: { temperature, topP, maxCompletionTokens }` + +Middleware no longer sees `temperature`/`topP`/`maxTokens` as first-class fields on `ChatMiddlewareConfig`; mutate `config.modelOptions` (with the provider-native keys above) instead. `metadata` is unaffected and stays at the root. + +Migrate automatically with the codemod, which resolves the provider from the adapter and rewrites the keys for you: + +```bash +pnpm codemod:move-sampling-to-model-options "src/**/*.{ts,tsx}" +``` + +See the [Sampling Options migration guide](https://tanstack.com/ai/latest/docs/migration/sampling-options-to-model-options) for details. 
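The per-provider key mapping listed in the changeset above can be sketched as a small lookup table. This is only an illustration: `toModelOptions`, `RootSampling`, and `RENAME_SKETCH` are hypothetical names, not exports of `@tanstack/ai`, and the sketch works on plain objects rather than real `chat()` options.

```typescript
// Hypothetical helper, NOT part of @tanstack/ai: maps the removed root
// sampling props onto the provider-native keys listed in the changeset.
type Provider =
  | 'openai'
  | 'anthropic'
  | 'gemini'
  | 'grok'
  | 'groq'
  | 'openrouter'
  | 'ollama'

interface RootSampling {
  temperature?: number
  topP?: number
  maxTokens?: number
}

const RENAME_SKETCH: Record<Provider, Record<keyof RootSampling, string>> = {
  openai: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_output_tokens' },
  anthropic: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_tokens' },
  gemini: { temperature: 'temperature', topP: 'topP', maxTokens: 'maxOutputTokens' },
  grok: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_tokens' },
  groq: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_completion_tokens' },
  openrouter: { temperature: 'temperature', topP: 'topP', maxTokens: 'maxCompletionTokens' },
  ollama: { temperature: 'temperature', topP: 'top_p', maxTokens: 'num_predict' },
}

function toModelOptions(
  provider: Provider,
  sampling: RootSampling,
): Record<string, unknown> {
  const renamed: Record<string, number> = {}
  const keys = Object.keys(RENAME_SKETCH[provider]) as Array<keyof RootSampling>
  for (const key of keys) {
    const value = sampling[key]
    // Only keys that were actually set are carried over.
    if (value !== undefined) renamed[RENAME_SKETCH[provider][key]] = value
  }
  // ollama is the one provider whose keys sit under a nested `options` object.
  return provider === 'ollama' ? { options: renamed } : renamed
}
```

For instance, `toModelOptions('ollama', { temperature: 0.7, maxTokens: 200 })` yields the same nested shape the changeset describes for Ollama, while the other providers get a flat object.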
diff --git a/codemods/README.md b/codemods/README.md index 087be4867..c207f2ea4 100644 --- a/codemods/README.md +++ b/codemods/README.md @@ -6,9 +6,10 @@ Each codemod lives in its own subdirectory and is named after the migration it c ## Available codemods -| Codemod | Migrates | -| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| [`ag-ui-compliance`](./ag-ui-compliance) | Client-side renames introduced by the AG-UI client/server compliance release: `body` → `forwardedProps` on `useChat` / `ChatClient` / `updateOptions`, Svelte's `updateBody` → `updateForwardedProps`, and `chat({ conversationId })` → `chat({ threadId })`. | +| Codemod | Migrates | +| -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [`ag-ui-compliance`](./ag-ui-compliance) | Client-side renames introduced by the AG-UI client/server compliance release: `body` → `forwardedProps` on `useChat` / `ChatClient` / `updateOptions`, Svelte's `updateBody` → `updateForwardedProps`, and `chat({ conversationId })` → `chat({ threadId })`. | +| [`move-sampling-to-model-options`](./move-sampling-to-model-options) | Moves root `temperature` / `topP` / `maxTokens` off `chat()` / `ai()` / `generate()` / `createChatOptions()` into provider-native `modelOptions`, renamed per provider (with ollama nesting under `options`). 
| ## Running a codemod diff --git a/codemods/move-sampling-to-model-options/README.md b/codemods/move-sampling-to-model-options/README.md new file mode 100644 index 000000000..a56a87eb6 --- /dev/null +++ b/codemods/move-sampling-to-model-options/README.md @@ -0,0 +1,83 @@ +# `move-sampling-to-model-options` + +Moves the root-level convenience sampling props — `temperature`, `topP`, and +`maxTokens` — off `chat()` / `ai()` / `generate()` / `createChatOptions()` +calls (imported from `@tanstack/ai`) and into the provider-native +`modelOptions` object, renaming each one to its provider's canonical option +name. + +This is a **breaking change**: the root-level props have been removed, so run +this codemod to migrate existing call sites onto the new `modelOptions` shape. + +## What it changes + +The provider is resolved from the `adapter:` property's factory call (e.g. +`openaiText('gpt-4o')` → `openai`). Each present root prop is moved into +`modelOptions` under its provider-specific name: + +| Root prop | openai | anthropic | gemini | grok | groq | openrouter | ollama (nested) | +| ------------- | ------------------- | ------------- | ----------------- | ------------- | ----------------------- | --------------------- | --------------------- | +| `temperature` | `temperature` | `temperature` | `temperature` | `temperature` | `temperature` | `temperature` | `options.temperature` | +| `topP` | `top_p` | `top_p` | `topP` | `top_p` | `top_p` | `topP` | `options.top_p` | +| `maxTokens` | `max_output_tokens` | `max_tokens` | `maxOutputTokens` | `max_tokens` | `max_completion_tokens` | `maxCompletionTokens` | `options.num_predict` | + +For **ollama**, the renamed keys are nested inside an `options` object **within** +`modelOptions` (e.g. `modelOptions: { options: { temperature, num_predict } }`). 
+ +### Example (openai) + +```ts +// before +chat({ + adapter: openaiText('gpt-4o'), + messages, + temperature: 0.3, + maxTokens: 100, +}) + +// after +chat({ + adapter: openaiText('gpt-4o'), + messages, + modelOptions: { + temperature: 0.3, + max_output_tokens: 100, + }, +}) +``` + +If `modelOptions` already exists (as an object literal), the renamed keys are +merged into it. Original value expressions are preserved, and shorthand props +(`{ temperature }`) are expanded to `key: temperature`. + +## Running it + +From this repo: + +```bash +pnpm codemod:move-sampling-to-model-options "src/**/*.{ts,tsx}" +``` + +Or directly against the published transform — no clone needed: + +```bash +npx jscodeshift \ + --parser=tsx \ + -t https://raw.githubusercontent.com/TanStack/ai/main/codemods/move-sampling-to-model-options/transform.ts \ + src/**/*.{ts,tsx} +``` + +Add `--dry --print` to preview the rewrite without modifying files. + +## Report / skip behavior + +The codemod never partially transforms a single call. It leaves the call +untouched and emits an `api.report(...)` message in these cases: + +- **Unresolvable adapter** — no `adapter` prop, the adapter value isn't a + recognized provider-factory call (e.g. `makeAdapter()`), or it's + dynamic/spread. +- **`modelOptions` is not a plain object literal** — e.g. a spread or an + identifier reference. +- **Key conflict** — a target renamed key already exists in `modelOptions` + (or in `modelOptions.options` for ollama). Resolve these by hand. 
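The all-or-nothing behavior described under "Report / skip behavior" can be modeled on plain objects as a validate-then-merge step. This is a simplified sketch under stated assumptions: `mergeRenamed` and `MergeResult` are hypothetical names, and the real transform operates on AST nodes and reports through `api.report(...)` rather than returning a result value.

```typescript
// Hypothetical model of the skip rules on plain objects. The real codemod
// inspects AST nodes; this only mirrors the order of the checks.
type MergeResult =
  | { ok: true; modelOptions: Record<string, unknown> }
  | { ok: false; reason: string }

function mergeRenamed(
  existing: unknown,
  renamed: Record<string, unknown>,
): MergeResult {
  // No modelOptions yet: create one from the renamed keys.
  if (existing === undefined) {
    return { ok: true, modelOptions: { ...renamed } }
  }
  // Stand-in for "modelOptions is not a plain object literal".
  if (typeof existing !== 'object' || existing === null || Array.isArray(existing)) {
    return { ok: false, reason: 'modelOptions is not a plain object; left alone' }
  }
  const current = existing as Record<string, unknown>
  // Check every target key BEFORE building the result, so a conflict never
  // leaves a half-merged object behind.
  const clash = Object.keys(renamed).find((key) => key in current)
  if (clash !== undefined) {
    return { ok: false, reason: `target key "${clash}" already exists; left alone` }
  }
  // Existing keys come first, matching the merge order in the fixtures.
  return { ok: true, modelOptions: { ...current, ...renamed } }
}
```

Running all checks before producing any output is what keeps a skipped call byte-for-byte unchanged; the input object is never mutated in either branch.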
diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.input.ts new file mode 100644 index 000000000..28568a5d7 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.input.ts @@ -0,0 +1,12 @@ +import { chat } from '@tanstack/ai' +import { anthropicText } from '@tanstack/ai-anthropic' + +export function run(messages: Array) { + const temperature = 0.5 + return chat({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + modelOptions: { top_k: 40 }, + temperature, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.output.ts new file mode 100644 index 000000000..9fe6fdc20 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/anthropic-merge.output.ts @@ -0,0 +1,15 @@ +import { chat } from '@tanstack/ai' +import { anthropicText } from '@tanstack/ai-anthropic' + +export function run(messages: Array) { + const temperature = 0.5 + return chat({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + + modelOptions: { + top_k: 40, + temperature: temperature, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/conflict.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/conflict.input.ts new file mode 100644 index 000000000..ad6771055 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/conflict.input.ts @@ -0,0 +1,14 @@ +// Conflict case: root `temperature` AND `modelOptions.temperature` are +// both present. The codemod must leave the whole call alone and report. 
+ +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + return chat({ + adapter: openaiText('gpt-4o'), + messages, + modelOptions: { temperature: 0.9 }, + temperature: 0.3, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/conflict.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/conflict.output.ts new file mode 100644 index 000000000..ad6771055 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/conflict.output.ts @@ -0,0 +1,14 @@ +// Conflict case: root `temperature` AND `modelOptions.temperature` are +// both present. The codemod must leave the whole call alone and report. + +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + return chat({ + adapter: openaiText('gpt-4o'), + messages, + modelOptions: { temperature: 0.9 }, + temperature: 0.3, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.input.ts new file mode 100644 index 000000000..bc8a93c29 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.input.ts @@ -0,0 +1,8 @@ +import { createChatOptions } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export const options = createChatOptions({ + adapter: openaiText('gpt-4o'), + temperature: 0.2, + topP: 0.8, +}) diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.output.ts new file mode 100644 index 000000000..b7b86cbff --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/create-chat-options.output.ts @@ -0,0 +1,11 @@ +import { createChatOptions } from '@tanstack/ai' +import { openaiText } 
from '@tanstack/ai-openai' + +export const options = createChatOptions({ + adapter: openaiText('gpt-4o'), + + modelOptions: { + temperature: 0.2, + top_p: 0.8, + }, +}) diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.input.ts new file mode 100644 index 000000000..63b6c3939 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.input.ts @@ -0,0 +1,11 @@ +import { chat } from '@tanstack/ai' +import { geminiText } from '@tanstack/ai-gemini' + +export function run(messages: Array) { + return chat({ + adapter: geminiText('gemini-1.5-pro'), + messages, + topP: 0.9, + maxTokens: 512, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.output.ts new file mode 100644 index 000000000..8d53ee554 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/gemini-rename.output.ts @@ -0,0 +1,14 @@ +import { chat } from '@tanstack/ai' +import { geminiText } from '@tanstack/ai-gemini' + +export function run(messages: Array) { + return chat({ + adapter: geminiText('gemini-1.5-pro'), + messages, + + modelOptions: { + topP: 0.9, + maxOutputTokens: 512, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.input.ts new file mode 100644 index 000000000..da8453ece --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.input.ts @@ -0,0 +1,18 @@ +import { ai, generate } from '@tanstack/ai' +import { anthropicText } from '@tanstack/ai-anthropic' + +export function viaAi(messages: Array) { + return ai({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + maxTokens: 64, + }) +} + +export function viaGenerate(messages: 
Array) { + return generate({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + topP: 0.95, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.output.ts new file mode 100644 index 000000000..156f635b0 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/generate-and-ai.output.ts @@ -0,0 +1,24 @@ +import { ai, generate } from '@tanstack/ai' +import { anthropicText } from '@tanstack/ai-anthropic' + +export function viaAi(messages: Array) { + return ai({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + + modelOptions: { + max_tokens: 64, + }, + }) +} + +export function viaGenerate(messages: Array) { + return generate({ + adapter: anthropicText('claude-3-5-sonnet-latest'), + messages, + + modelOptions: { + top_p: 0.95, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.input.ts new file mode 100644 index 000000000..2be6ebbad --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.input.ts @@ -0,0 +1,10 @@ +import { chat } from '@tanstack/ai' +import { groqText } from '@tanstack/ai-groq' + +export function run(messages: Array) { + return chat({ + adapter: groqText('llama-3.1-70b'), + messages, + maxTokens: 256, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.output.ts new file mode 100644 index 000000000..386d29b6c --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/groq-maxtokens.output.ts @@ -0,0 +1,13 @@ +import { chat } from '@tanstack/ai' +import { groqText } from '@tanstack/ai-groq' + +export function run(messages: Array) { + return chat({ + adapter: 
groqText('llama-3.1-70b'), + messages, + + modelOptions: { + max_completion_tokens: 256, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/no-import.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/no-import.input.ts new file mode 100644 index 000000000..f2d83aa0e --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/no-import.input.ts @@ -0,0 +1,13 @@ +// Negative case: no `@tanstack/ai` import. A local `chat` helper that +// happens to share the name and use `temperature`/`maxTokens` must be +// left completely untouched. + +function chat(opts: { temperature?: number; maxTokens?: number }) { + return opts +} + +export const result = chat({ + adapter: 'whatever', + temperature: 0.3, + maxTokens: 100, +}) diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/no-import.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/no-import.output.ts new file mode 100644 index 000000000..f2d83aa0e --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/no-import.output.ts @@ -0,0 +1,13 @@ +// Negative case: no `@tanstack/ai` import. A local `chat` helper that +// happens to share the name and use `temperature`/`maxTokens` must be +// left completely untouched. 
+ +function chat(opts: { temperature?: number; maxTokens?: number }) { + return opts +} + +export const result = chat({ + adapter: 'whatever', + temperature: 0.3, + maxTokens: 100, +}) diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.input.ts new file mode 100644 index 000000000..f8c34e363 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.input.ts @@ -0,0 +1,11 @@ +import { chat } from '@tanstack/ai' +import { ollamaText } from '@tanstack/ai-ollama' + +export function run(messages: Array) { + return chat({ + adapter: ollamaText('llama3'), + messages, + temperature: 0.7, + maxTokens: 200, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.output.ts new file mode 100644 index 000000000..d6d4a0bbb --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/ollama-nested.output.ts @@ -0,0 +1,16 @@ +import { chat } from '@tanstack/ai' +import { ollamaText } from '@tanstack/ai-ollama' + +export function run(messages: Array) { + return chat({ + adapter: ollamaText('llama3'), + messages, + + modelOptions: { + options: { + temperature: 0.7, + num_predict: 200, + }, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.input.ts new file mode 100644 index 000000000..960cb4cd9 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.input.ts @@ -0,0 +1,11 @@ +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + return chat({ + adapter: openaiText('gpt-4o'), + messages, + temperature: 0.3, + maxTokens: 100, + }) +} diff --git 
a/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.output.ts new file mode 100644 index 000000000..55ed5fcaa --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/openai-basic.output.ts @@ -0,0 +1,14 @@ +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + return chat({ + adapter: openaiText('gpt-4o'), + messages, + + modelOptions: { + temperature: 0.3, + max_output_tokens: 100, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.input.ts new file mode 100644 index 000000000..49ee42d53 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.input.ts @@ -0,0 +1,10 @@ +import { chat } from '@tanstack/ai' +import { openRouterText } from '@tanstack/ai-openrouter' + +export function run(messages: Array) { + return chat({ + adapter: openRouterText('anthropic/claude-3.5-sonnet'), + messages, + maxTokens: 1024, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.output.ts new file mode 100644 index 000000000..48280dfec --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/openrouter-maxtokens.output.ts @@ -0,0 +1,13 @@ +import { chat } from '@tanstack/ai' +import { openRouterText } from '@tanstack/ai-openrouter' + +export function run(messages: Array) { + return chat({ + adapter: openRouterText('anthropic/claude-3.5-sonnet'), + messages, + + modelOptions: { + maxCompletionTokens: 1024, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.input.ts 
b/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.input.ts new file mode 100644 index 000000000..bf84a42a5 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.input.ts @@ -0,0 +1,15 @@ +// Shorthand-key edge case: `temperature` is a local variable passed via +// shorthand. After the move, modelOptions must still reference the +// `temperature` identifier, not a literal. + +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + const temperature = 0.5 + return chat({ + adapter: openaiText('gpt-4o'), + messages, + temperature, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.output.ts new file mode 100644 index 000000000..b76f1aaa5 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/shorthand.output.ts @@ -0,0 +1,18 @@ +// Shorthand-key edge case: `temperature` is a local variable passed via +// shorthand. After the move, modelOptions must still reference the +// `temperature` identifier, not a literal. + +import { chat } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +export function run(messages: Array) { + const temperature = 0.5 + return chat({ + adapter: openaiText('gpt-4o'), + messages, + + modelOptions: { + temperature: temperature, + }, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.input.ts b/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.input.ts new file mode 100644 index 000000000..5114f26d0 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.input.ts @@ -0,0 +1,13 @@ +// Unresolvable adapter: `makeAdapter()` is not a known provider factory. +// The codemod must leave the call alone and report. 
+ +import { chat } from '@tanstack/ai' +import { makeAdapter } from './my-adapter' + +export function run(messages: Array) { + return chat({ + adapter: makeAdapter(), + messages, + temperature: 0.3, + }) +} diff --git a/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.output.ts b/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.output.ts new file mode 100644 index 000000000..5114f26d0 --- /dev/null +++ b/codemods/move-sampling-to-model-options/__testfixtures__/unresolvable-adapter.output.ts @@ -0,0 +1,13 @@ +// Unresolvable adapter: `makeAdapter()` is not a known provider factory. +// The codemod must leave the call alone and report. + +import { chat } from '@tanstack/ai' +import { makeAdapter } from './my-adapter' + +export function run(messages: Array) { + return chat({ + adapter: makeAdapter(), + messages, + temperature: 0.3, + }) +} diff --git a/codemods/move-sampling-to-model-options/transform.test.ts b/codemods/move-sampling-to-model-options/transform.test.ts new file mode 100644 index 000000000..b16c44f6e --- /dev/null +++ b/codemods/move-sampling-to-model-options/transform.test.ts @@ -0,0 +1,125 @@ +import { readFileSync } from 'node:fs' +import { dirname, resolve } from 'node:path' +import { fileURLToPath } from 'node:url' +import jscodeshift from 'jscodeshift' +import { describe, expect, it } from 'vitest' +import transform from './transform' + +const __filename = fileURLToPath(import.meta.url) +const __dirname = dirname(__filename) +const FIXTURES = resolve(__dirname, '__testfixtures__') + +function read(name: string): string { + return readFileSync(resolve(FIXTURES, name), 'utf-8') +} + +function runTransform( + fixtureBaseName: string, + ext: 'ts' | 'tsx', +): { output: string; reports: Array<string> } { + const source = read(`${fixtureBaseName}.input.${ext}`) + const reports: Array<string> = [] + const j = jscodeshift.withParser('tsx') + const result = transform( + { path: `${fixtureBaseName}.input.${ext}`, 
source }, + { + jscodeshift: j, + j, + stats: () => {}, + report: (msg: string) => { + reports.push(msg) + }, + }, + {}, + ) + if (typeof result !== 'string') { + throw new Error( + `transform returned ${typeof result} for ${fixtureBaseName}.input.${ext}; expected a string`, + ) + } + return { output: result, reports } +} + +// Normalize line endings — fixtures may be saved as CRLF on Windows +// while jscodeshift emits LF, which would make a string compare +// fail despite identical content. +function normalize(s: string): string { + return s.replace(/\r\n/g, '\n').trim() +} + +function expectFixture( + name: string, + ext: 'ts' | 'tsx' = 'ts', +): { reports: Array<string> } { + const expected = read(`${name}.output.${ext}`) + const { output, reports } = runTransform(name, ext) + expect(normalize(output)).toBe(normalize(expected)) + return { reports } +} + +describe('move-sampling-to-model-options codemod', () => { + it('moves openai temperature/maxTokens into modelOptions (renamed)', () => { + const { reports } = expectFixture('openai-basic') + expect(reports).toEqual([]) + }) + + it('renames gemini topP/maxTokens to topP/maxOutputTokens', () => { + expectFixture('gemini-rename') + }) + + it('renames groq maxTokens to max_completion_tokens', () => { + expectFixture('groq-maxtokens') + }) + + it('renames openrouter maxTokens to maxCompletionTokens', () => { + expectFixture('openrouter-maxtokens') + }) + + it('nests ollama sampling options inside modelOptions.options', () => { + expectFixture('ollama-nested') + }) + + it('merges into an existing modelOptions object literal', () => { + expectFixture('anthropic-merge') + }) + + it('expands a shorthand sampling prop to `key: identifier`', () => { + expectFixture('shorthand') + }) + + it('transforms createChatOptions() calls', () => { + expectFixture('create-chat-options') + }) + + it('transforms ai() and generate() callee variants', () => { + expectFixture('generate-and-ai') + }) + + it('leaves files without a @tanstack/ai 
helper import untouched', () => { + const { reports } = expectFixture('no-import') + expect(reports).toEqual([]) + }) + + it('leaves a call alone and reports when a target key already exists in modelOptions', () => { + const { reports } = expectFixture('conflict') + expect(reports.length).toBeGreaterThan(0) + expect( + reports.some( + (r) => + r.includes('a target key already exists') && r.includes('left alone'), + ), + ).toBe(true) + }) + + it('leaves a call alone and reports when the adapter is unresolvable', () => { + const { reports } = expectFixture('unresolvable-adapter') + expect(reports.length).toBeGreaterThan(0) + expect( + reports.some( + (r) => + r.includes('could not resolve a known provider adapter') && + r.includes('left alone'), + ), + ).toBe(true) + }) +}) diff --git a/codemods/move-sampling-to-model-options/transform.ts b/codemods/move-sampling-to-model-options/transform.ts new file mode 100644 index 000000000..0d87992ce --- /dev/null +++ b/codemods/move-sampling-to-model-options/transform.ts @@ -0,0 +1,374 @@ +/** + * jscodeshift transform: move root sampling options into provider-native + * `modelOptions`. + * + * The root-level `temperature` / `topP` / `maxTokens` convenience props on + * `chat()` / `ai()` / `generate()` / `createChatOptions()` are being moved + * into the provider-native `modelOptions` object. Each provider names these + * options differently, so the rename is provider-specific and resolved from + * the `adapter:` property's factory call (e.g. `openaiText('gpt-4o')`). 
+ * + * Per-provider rename of the three root keys: + * + * openai: temperature → temperature, topP → top_p, maxTokens → max_output_tokens + * anthropic: temperature → temperature, topP → top_p, maxTokens → max_tokens + * gemini: temperature → temperature, topP → topP, maxTokens → maxOutputTokens + * grok: temperature → temperature, topP → top_p, maxTokens → max_tokens + * groq: temperature → temperature, topP → top_p, maxTokens → max_completion_tokens + * openrouter: temperature → temperature, topP → topP, maxTokens → maxCompletionTokens + * ollama: NESTED inside an `options` object — + * temperature → options.temperature, topP → options.top_p, maxTokens → options.num_predict + * + * This is a breaking change: the root props have been removed, so this + * codemod migrates existing call sites onto the new `modelOptions` shape. + * + * Skip + report (via `api.report`) behavior, never partially transforming a + * single call: + * - the adapter can't be resolved to a known provider factory (missing + * `adapter`, dynamic/spread adapter, or unknown callee); + * - `modelOptions` exists but isn't a plain object literal; + * - a target renamed key already exists in `modelOptions` (or in + * `modelOptions.options` for ollama). + */ + +import type { + API, + ASTPath, + CallExpression, + Collection, + FileInfo, + Identifier, + ImportDeclaration, + JSCodeshift, + ObjectExpression, + Property, +} from 'jscodeshift' + +const CORE_PACKAGE = '@tanstack/ai' + +/** Helper names (imported from `@tanstack/ai`) we operate on. */ +const TARGET_CALLEES = new Set(['chat', 'ai', 'generate', 'createChatOptions']) + +/** The three root sampling props we relocate. */ +const ROOT_SAMPLING_KEYS = ['temperature', 'topP', 'maxTokens'] as const +type RootSamplingKey = (typeof ROOT_SAMPLING_KEYS)[number] + +type Provider = + | 'openai' + | 'anthropic' + | 'gemini' + | 'grok' + | 'groq' + | 'openrouter' + | 'ollama' + +/** + * Map a provider text-adapter factory name → provider. 
Factory names were + * confirmed against each provider package's `index.ts` exports. Note the + * OpenRouter factory is `openRouterText` (capital R); `openrouterText` is + * accepted defensively but is not a real export. + */ +const FACTORY_TO_PROVIDER: Record<string, Provider> = { + openaiText: 'openai', + anthropicText: 'anthropic', + geminiText: 'gemini', + grokText: 'grok', + groqText: 'groq', + openRouterText: 'openrouter', + openrouterText: 'openrouter', + ollamaText: 'ollama', +} + +/** + * Per-provider rename of each root key → its `modelOptions` key name. For + * ollama the renamed keys live inside a nested `options` object (handled + * separately), so the names here are the keys *within* that nested object. + */ +const RENAME: Record<Provider, Record<RootSamplingKey, string>> = { + openai: { + temperature: 'temperature', + topP: 'top_p', + maxTokens: 'max_output_tokens', + }, + anthropic: { + temperature: 'temperature', + topP: 'top_p', + maxTokens: 'max_tokens', + }, + gemini: { + temperature: 'temperature', + topP: 'topP', + maxTokens: 'maxOutputTokens', + }, + grok: { + temperature: 'temperature', + topP: 'top_p', + maxTokens: 'max_tokens', + }, + groq: { + temperature: 'temperature', + topP: 'top_p', + maxTokens: 'max_completion_tokens', + }, + openrouter: { + temperature: 'temperature', + topP: 'topP', + maxTokens: 'maxCompletionTokens', + }, + ollama: { + temperature: 'temperature', + topP: 'top_p', + maxTokens: 'num_predict', + }, +} + +/** ollama nests the renamed keys inside a `modelOptions.options` object. */ +const PROVIDERS_WITH_NESTED_OPTIONS = new Set(['ollama']) + +interface ImportFacts { + /** Whether any helper we transform is imported from `@tanstack/ai`. 
*/ + hasCoreHelper: boolean +} + +function collectImportFacts(j: JSCodeshift, root: Collection): ImportFacts { + const facts: ImportFacts = { hasCoreHelper: false } + + root.find(j.ImportDeclaration).forEach((path: ASTPath) => { + const source = path.node.source.value + if (source !== CORE_PACKAGE) return + const specifiers = path.node.specifiers ?? [] + for (const spec of specifiers) { + if ( + spec.type === 'ImportSpecifier' && + TARGET_CALLEES.has(spec.imported.name) + ) { + facts.hasCoreHelper = true + } + } + }) + + return facts +} + +/** + * Find a non-computed, non-spread Property by its Identifier key name. + * Handles both `Property` (ESTree) and `ObjectProperty` (Babel) node types. + */ +function findKey(obj: ObjectExpression, name: string): Property | undefined { + for (const prop of obj.properties) { + if (prop.type !== 'Property' && prop.type !== 'ObjectProperty') continue + if ((prop as Property).computed) continue + const key = (prop as Property).key + if (key.type === 'Identifier' && key.name === name) { + return prop as Property + } + } + return undefined +} + +/** + * Extract the value expression of a property, expanding shorthand. For a + * shorthand `{ temperature }`, the value is the `temperature` Identifier, + * which is exactly the reference we want to preserve when relocating. + */ +function valueOf(prop: Property): Property['value'] { + return prop.value +} + +/** The first object-literal argument of a call, or undefined. */ +function firstObjectArg(node: CallExpression): ObjectExpression | undefined { + return node.arguments.find( + (a): a is ObjectExpression => a.type === 'ObjectExpression', + ) +} + +/** + * Resolve the provider from a call's first object argument by inspecting its + * `adapter` property. Returns the provider, or `null` if it can't be + * resolved (no adapter, non-call adapter, spread, or unknown factory). 
+ */ +function resolveProvider(obj: ObjectExpression): Provider | null { + const adapterProp = findKey(obj, 'adapter') + if (!adapterProp) return null + const value = adapterProp.value + if (!value || value.type !== 'CallExpression') return null + const callee = (value as CallExpression).callee + if (callee.type !== 'Identifier') return null + const provider = FACTORY_TO_PROVIDER[(callee as Identifier).name] + return provider ?? null +} + +interface Conflict { + filePath: string + line?: number + reason: string +} + +export default function transform( + file: FileInfo, + api: API, +): string | null | undefined { + const j = api.jscodeshift + const root = j(file.source) + const facts = collectImportFacts(j, root) + + // Bail out early if none of the target helpers are imported from + // `@tanstack/ai` — keeps the codemod a no-op on unrelated files that + // happen to use a `temperature`/`topP`/`maxTokens` key. + if (!facts.hasCoreHelper) { + return file.source + } + + let changed = 0 + const conflicts: Array = [] + + root + .find(j.CallExpression) + .filter((path) => { + const callee = path.node.callee + return callee.type === 'Identifier' && TARGET_CALLEES.has(callee.name) + }) + .forEach((path: ASTPath) => { + const obj = firstObjectArg(path.node) + if (!obj) return + + // Only act if at least one root sampling prop is present. + const presentKeys = ROOT_SAMPLING_KEYS.filter((k) => findKey(obj, k)) + if (presentKeys.length === 0) return + + const line = path.node.loc?.start.line + const calleeName = + path.node.callee.type === 'Identifier' ? path.node.callee.name : 'call' + + // Resolve the provider from the adapter. Skip + report if we can't. 
+ const provider = resolveProvider(obj) + if (!provider) { + conflicts.push({ + filePath: file.path, + line, + reason: `${calleeName}(): could not resolve a known provider adapter from the \`adapter\` property; left alone.`, + }) + return + } + + const renameMap = RENAME[provider] + const nested = PROVIDERS_WITH_NESTED_OPTIONS.has(provider) + + // Validate / locate modelOptions before mutating anything, so a + // conflict aborts the WHOLE call without a partial transform. + let modelOptionsObj: ObjectExpression | null = null + const modelOptionsProp = findKey(obj, 'modelOptions') + if (modelOptionsProp) { + if (modelOptionsProp.value.type !== 'ObjectExpression') { + conflicts.push({ + filePath: file.path, + line, + reason: `${calleeName}(): \`modelOptions\` exists but is not a plain object literal; left alone. Merge by hand.`, + }) + return + } + modelOptionsObj = modelOptionsProp.value as ObjectExpression + } + + // For ollama, locate/validate the nested `options` object too. + let nestedOptionsObj: ObjectExpression | null = null + if (nested && modelOptionsObj) { + const optionsProp = findKey(modelOptionsObj, 'options') + if (optionsProp) { + if (optionsProp.value.type !== 'ObjectExpression') { + conflicts.push({ + filePath: file.path, + line, + reason: `${calleeName}(): \`modelOptions.options\` exists but is not a plain object literal; left alone. Merge by hand.`, + }) + return + } + nestedOptionsObj = optionsProp.value as ObjectExpression + } + } + + // Conflict check: would any renamed target key collide with an + // existing key in the destination object? + const destForCheck = nested ? nestedOptionsObj : modelOptionsObj + let hasConflict = false + for (const key of presentKeys) { + const renamed = renameMap[key] + if (destForCheck && findKey(destForCheck, renamed)) { + hasConflict = true + break + } + } + if (hasConflict) { + conflicts.push({ + filePath: file.path, + line, + reason: `${calleeName}(): a target key already exists in ${ + nested ? 
'`modelOptions.options`' : '`modelOptions`' + }; left alone. Merge by hand.`, + }) + return + } + + // ---- All validations passed; perform the move. ---- + + // Build the renamed properties, preserving original value expressions. + const movedProps: Array = presentKeys.map((key) => { + const original = findKey(obj, key)! + const renamed = renameMap[key] + return j.property('init', j.identifier(renamed), valueOf(original)) + }) + + // Ensure the destination object exists. + if (!modelOptionsObj) { + modelOptionsObj = j.objectExpression([]) + obj.properties.push( + j.property('init', j.identifier('modelOptions'), modelOptionsObj), + ) + } + + if (nested) { + if (!nestedOptionsObj) { + nestedOptionsObj = j.objectExpression([]) + modelOptionsObj.properties.push( + j.property('init', j.identifier('options'), nestedOptionsObj), + ) + } + nestedOptionsObj.properties.push(...movedProps) + } else { + modelOptionsObj.properties.push(...movedProps) + } + + // Remove the moved root props from the call's first arg. + const movedSet = new Set(presentKeys) + obj.properties = obj.properties.filter((prop) => { + if (prop.type !== 'Property' && prop.type !== 'ObjectProperty') { + return true + } + if ((prop as Property).computed) return true + const key = (prop as Property).key + if ( + key.type === 'Identifier' && + movedSet.has(key.name as RootSamplingKey) + ) { + return false + } + return true + }) + + changed++ + }) + + for (const conflict of conflicts) { + const where = + conflict.line !== undefined + ? `${conflict.filePath}:${conflict.line}` + : conflict.filePath + api.report(`[move-sampling-to-model-options] ${where} — ${conflict.reason}`) + } + + return changed > 0 ? root.toSource() : file.source +} + +// jscodeshift inspects `.parser` on the default export to choose its +// AST flavor. We support both .ts and .tsx out of the box. 
+;(transform as unknown as { parser: string }).parser = 'tsx' diff --git a/codemods/package.json b/codemods/package.json index 5bee64e8f..f9d2803b8 100644 --- a/codemods/package.json +++ b/codemods/package.json @@ -7,7 +7,8 @@ "scripts": { "test": "vitest run", "test:dev": "vitest", - "ag-ui-compliance": "node ./run.mjs ag-ui-compliance" + "ag-ui-compliance": "node ./run.mjs ag-ui-compliance", + "move-sampling-to-model-options": "node ./run.mjs move-sampling-to-model-options" }, "devDependencies": { "@types/jscodeshift": "^17.1.1", diff --git a/docs/adapters/anthropic.md b/docs/adapters/anthropic.md index 677028d70..0fe7bdd74 100644 --- a/docs/adapters/anthropic.md +++ b/docs/adapters/anthropic.md @@ -109,7 +109,7 @@ const stream = chat({ ## Model Options -Anthropic supports various provider-specific options: +Anthropic supports various provider-specific options. Sampling parameters live here too — `temperature`, `top_p`, and `max_tokens` — rather than as root-level props on `chat()`: ```typescript const stream = chat({ @@ -125,6 +125,8 @@ const stream = chat({ }); ``` +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ### Thinking (Extended Thinking) Enable extended thinking with a token budget. This allows Claude to show its reasoning process, which is streamed as `thinking` chunks: diff --git a/docs/adapters/gemini.md b/docs/adapters/gemini.md index 2c1d05f08..f7ff43954 100644 --- a/docs/adapters/gemini.md +++ b/docs/adapters/gemini.md @@ -304,7 +304,7 @@ for await (const chunk of stream) { ## Model Options -Gemini supports various model-specific options: +Gemini supports various model-specific options. 
Sampling parameters live here too — `temperature`, `topP`, and `maxOutputTokens` — rather than as root-level props on `chat()`: ```typescript const stream = chat({ @@ -320,6 +320,8 @@ const stream = chat({ }); ``` +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ### Thinking Enable thinking for models that support it: diff --git a/docs/adapters/grok.md b/docs/adapters/grok.md index 3f6db60b8..528226903 100644 --- a/docs/adapters/grok.md +++ b/docs/adapters/grok.md @@ -106,13 +106,16 @@ const stream = chat({ ## Model Options -Grok supports various provider-specific options: +Grok supports various provider-specific options. Sampling parameters live here too — `temperature`, `top_p`, and `max_tokens` — rather than as root-level props on `chat()`: ```typescript const stream = chat({ adapter: grokText("grok-4"), messages, modelOptions: { + temperature: 0.7, + top_p: 0.9, + max_tokens: 1024, frequency_penalty: 0.5, presence_penalty: 0.5, stop: ["END"], @@ -120,6 +123,8 @@ const stream = chat({ }); ``` +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ## Summarization Summarize long text content: diff --git a/docs/adapters/groq.md b/docs/adapters/groq.md index 1c4c81644..232dd4606 100644 --- a/docs/adapters/groq.md +++ b/docs/adapters/groq.md @@ -110,7 +110,7 @@ const stream = chat({ ## Model Options -Groq supports various provider-specific options: +Groq supports various provider-specific options. 
Sampling parameters live here too — `temperature`, `top_p`, and `max_completion_tokens` (Groq's token-limit key) — rather than as root-level props on `chat()`: ```typescript const stream = chat({ @@ -124,6 +124,8 @@ const stream = chat({ }); ``` +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ### Reasoning Enable reasoning for models that support it (e.g., `openai/gpt-oss-120b`, `qwen/qwen3-32b`). This allows the model to show its reasoning process, which is streamed as `thinking` chunks: diff --git a/docs/adapters/ollama.md b/docs/adapters/ollama.md index 1dc0a0458..d7e6ec18e 100644 --- a/docs/adapters/ollama.md +++ b/docs/adapters/ollama.md @@ -131,55 +131,63 @@ const stream = chat({ ## Model Options -Ollama supports various provider-specific options: +Ollama supports various provider-specific options. Unlike the other providers, Ollama nests its sampling and runner parameters inside an `options` object **within** `modelOptions` — `temperature`, `top_p`, and `num_predict` (the token-limit key) all live under `modelOptions.options`: ```typescript const stream = chat({ adapter: ollamaText("llama3"), messages, modelOptions: { - temperature: 0.7, - top_p: 0.9, - top_k: 40, - num_predict: 1000, // Max tokens to generate - repeat_penalty: 1.1, - num_ctx: 4096, // Context window size - num_gpu: -1, // GPU layers (-1 = auto) + options: { + temperature: 0.7, + top_p: 0.9, + top_k: 40, + num_predict: 1000, // Max tokens to generate + repeat_penalty: 1.1, + num_ctx: 4096, // Context window size + num_gpu: -1, // GPU layers (-1 = auto) + }, }, }); ``` +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, note that for Ollama they map to `modelOptions.options.temperature`, `modelOptions.options.top_p`, and `modelOptions.options.num_predict`. 
See [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ### Advanced Options +All sampling and runner parameters are nested under `modelOptions.options`: + ```typescript modelOptions: { - // Sampling - temperature: 0.7, - top_p: 0.9, - top_k: 40, - min_p: 0.05, - typical_p: 1.0, - - // Generation - num_predict: 1000, - repeat_penalty: 1.1, - repeat_last_n: 64, - penalize_newline: false, - - // Performance - num_ctx: 4096, - num_batch: 512, - num_gpu: -1, - num_thread: 0, // 0 = auto - - // Memory - use_mmap: true, - use_mlock: false, - - // Mirostat sampling - mirostat: 0, // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0 - mirostat_tau: 5.0, - mirostat_eta: 0.1, + options: { + // Sampling + temperature: 0.7, + top_p: 0.9, + top_k: 40, + min_p: 0.05, + typical_p: 1.0, + + // Generation + num_predict: 1000, + repeat_penalty: 1.1, + repeat_last_n: 64, + penalize_newline: false, + + // Performance + num_ctx: 4096, + num_batch: 512, + num_gpu: -1, + num_thread: 0, // 0 = auto + + // Memory + use_mmap: true, + use_mlock: false, + + // Mirostat sampling + mirostat: 0, // 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0 + mirostat_tau: 5.0, + mirostat_eta: 0.1, + }, } ``` diff --git a/docs/adapters/openai.md b/docs/adapters/openai.md index e780a9a0e..ecdd7be9a 100644 --- a/docs/adapters/openai.md +++ b/docs/adapters/openai.md @@ -155,7 +155,7 @@ const stream = chat({ ## Model Options -OpenAI supports various provider-specific options: +OpenAI supports various provider-specific options. 
Sampling parameters live here too — `temperature`, `top_p`, and `max_output_tokens` (the Responses API token-limit key) — rather than as root-level props on `chat()`: ```typescript const stream = chat({ @@ -163,7 +163,7 @@ const stream = chat({ messages, modelOptions: { temperature: 0.7, - max_tokens: 1000, + max_output_tokens: 1000, top_p: 0.9, frequency_penalty: 0.5, presence_penalty: 0.5, @@ -172,6 +172,8 @@ const stream = chat({ }); ``` +> The `openaiChatCompletions` adapter targets `/v1/chat/completions`, where the token-limit key is `max_tokens` (not `max_output_tokens`). If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). + ### Reasoning Enable reasoning for models that support it (e.g., GPT-5, O3). This allows the model to show its reasoning process, which is streamed as `thinking` chunks: diff --git a/docs/adapters/openrouter.md b/docs/adapters/openrouter.md index 27a9cce09..3b86fc328 100644 --- a/docs/adapters/openrouter.md +++ b/docs/adapters/openrouter.md @@ -134,6 +134,24 @@ const stream = chat({ }); ``` +## Model Options + +OpenRouter supports various provider-specific options. Sampling parameters live here too — `temperature`, `topP`, and `maxCompletionTokens` (OpenRouter's token-limit key for the chat adapter) — rather than as root-level props on `chat()`: + +```typescript +const stream = chat({ + adapter: openRouterText("openai/gpt-5"), + messages, + modelOptions: { + temperature: 0.7, + topP: 0.9, + maxCompletionTokens: 1024, + }, +}); +``` + +> If you previously passed `temperature` / `topP` / `maxTokens` at the root of `chat()`, see [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). 
+ ## Chat Completions vs Responses (beta) OpenRouter exposes two OpenAI-compatible wire formats, and the adapter diff --git a/docs/advanced/middleware.md b/docs/advanced/middleware.md index 3d7a7b02f..7de02005f 100644 --- a/docs/advanced/middleware.md +++ b/docs/advanced/middleware.md @@ -127,15 +127,27 @@ const dynamicTemperature: ChatMiddleware = { } if (ctx.phase === "beforeModel" && ctx.iteration > 0) { - // Increase temperature on retries — other fields stay unchanged + // Increase temperature on retries. Sampling params live in the + // provider-native modelOptions object — `temperature` is universal, + // so it's the same key across providers. Spread the existing + // modelOptions so other model options stay unchanged. + const current = + typeof config.modelOptions?.temperature === "number" + ? config.modelOptions.temperature + : 0.7; return { - temperature: Math.min((config.temperature ?? 0.7) + 0.1, 1.0), + modelOptions: { + ...config.modelOptions, + temperature: Math.min(current + 0.1, 1.0), + }, }; } }, }; ``` +> Sampling parameters (`temperature`, `top_p` / `topP`, the various `max*Tokens` keys) live inside `modelOptions` under each provider's native name — they are no longer root config fields. `temperature` happens to be spelled the same across every provider, so the example above is provider-agnostic; if you mutate a token limit instead, use the provider-native key (e.g. `max_output_tokens` for OpenAI, `num_predict` nested under `modelOptions.options` for Ollama). See [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). 
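As a rough illustration of the note above, the sketch below sets a token limit under the provider-native key and nests it under `options` for Ollama. `ProviderName`, `TOKEN_LIMIT_KEY`, and `withTokenLimit` are hypothetical names invented for this example, not `@tanstack/ai` exports. Only the key names themselves come from the provider key reference in the migration guide.

```typescript
// Hypothetical helper, not part of @tanstack/ai. The key names mirror the
// provider key reference in the migration guide.
type ProviderName =
  | 'openai'
  | 'anthropic'
  | 'gemini'
  | 'grok'
  | 'groq'
  | 'openrouter'
  | 'ollama'

type ModelOptions = Record<string, unknown>

const TOKEN_LIMIT_KEY: Record<ProviderName, string> = {
  openai: 'max_output_tokens',
  anthropic: 'max_tokens',
  gemini: 'maxOutputTokens',
  grok: 'max_tokens',
  groq: 'max_completion_tokens',
  openrouter: 'maxCompletionTokens',
  ollama: 'num_predict',
}

function isPlainObject(value: unknown): value is ModelOptions {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

/** Returns a new modelOptions object with the token limit set; the input is not mutated. */
function withTokenLimit(
  provider: ProviderName,
  modelOptions: ModelOptions | undefined,
  limit: number,
): ModelOptions {
  const key = TOKEN_LIMIT_KEY[provider]
  const current = modelOptions ?? {}
  if (provider === 'ollama') {
    // Ollama keeps sampling keys one level down, under `modelOptions.options`.
    const existing = current['options']
    const nested = isPlainObject(existing) ? existing : {}
    return { ...current, options: { ...nested, [key]: limit } }
  }
  // Every other provider takes the key directly on modelOptions.
  return { ...current, [key]: limit }
}
```

Inside `onConfig`, a middleware could then return something like `{ modelOptions: withTokenLimit('openai', config.modelOptions, 2048) }`, assuming it already knows which provider the request targets.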
+ **Config fields you can transform:** | Field | Type | Description | @@ -143,11 +155,8 @@ const dynamicTemperature: ChatMiddleware = { | `messages` | `ModelMessage[]` | Conversation history | | `systemPrompts` | `string[]` | System prompts | | `tools` | `Tool[]` | Available tools | -| `temperature` | `number` | Sampling temperature | -| `topP` | `number` | Nucleus sampling | -| `maxTokens` | `number` | Token limit | | `metadata` | `Record` | Request metadata | -| `modelOptions` | `Record` | Provider-specific options | +| `modelOptions` | `Record` | Provider-native options — this is where sampling params (`temperature`, `top_p` / `topP`, the provider's `max*Tokens` key) now live, alongside every other model-specific knob. See [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). | When multiple middleware define `onConfig`, the config is **piped** through them in order — each receives the merged config from the previous middleware. @@ -180,11 +189,8 @@ const injectDefs: ChatMiddleware = { |-------|------|-------------| | `messages` | `ModelMessage[]` | Conversation history sent to the final call | | `systemPrompts` | `SystemPrompt[]` | System prompts on the final call | -| `temperature` | `number` | Sampling temperature | -| `topP` | `number` | Nucleus sampling | -| `maxTokens` | `number` | Token limit | | `metadata` | `Record` | Request metadata | -| `modelOptions` | `Record` | Provider-specific options | +| `modelOptions` | `Record` | Provider-native options — this is where sampling params (`temperature`, `top_p` / `topP`, the provider's `max*Tokens` key) now live, alongside every other model-specific knob. See [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). 
| | `outputSchema` | `JSONSchema` | JSON Schema being sent to the provider for structured output | **Ordering at the structured-output boundary:** diff --git a/docs/advanced/typed-options.md b/docs/advanced/typed-options.md index df849bb0c..c9c4392d0 100644 --- a/docs/advanced/typed-options.md +++ b/docs/advanced/typed-options.md @@ -28,9 +28,12 @@ import { openaiText } from '@tanstack/ai-openai' const chatOptions = createChatOptions({ adapter: openaiText('gpt-5.2'), - // modelOptions, temperature, systemPrompts, tools — all type-checked - // against the adapter+model pair above. + // modelOptions, systemPrompts, tools — all type-checked against the + // adapter+model pair above. Sampling params (temperature, top_p, + // max_output_tokens, …) live inside modelOptions, under each provider's + // native key. modelOptions: { + temperature: 0.3, reasoning: { effort: 'medium' }, }, }) diff --git a/docs/api/ai.md b/docs/api/ai.md index 0f50d74cd..af2cdfdbe 100644 --- a/docs/api/ai.md +++ b/docs/api/ai.md @@ -46,7 +46,7 @@ const stream = chat({ - `systemPrompts?` - System prompts to prepend to messages - `agentLoopStrategy?` - Strategy for agent loops (default: `maxIterations(5)`) - `abortController?` - AbortController for cancellation -- `modelOptions?` - Model-specific options (renamed from `providerOptions`) +- `modelOptions?` - Provider-native model options. This is where sampling parameters live — `temperature`, `top_p`/`topP`, and the provider's token-limit key (`max_output_tokens`, `max_tokens`, `maxOutputTokens`, …) — under each provider's canonical name, rather than as generic root-level props. See [Moving Sampling Options into modelOptions](../migration/sampling-options-to-model-options). (Renamed from `providerOptions`.) 
- `threadId?` - AG-UI thread identifier propagated into `RUN_STARTED` events for run correlation - `runId?` - AG-UI run identifier (auto-generated if omitted) - `parentRunId?` - AG-UI parent run identifier for nested runs diff --git a/docs/config.json b/docs/config.json index eb9f7a97c..404ccb1ab 100644 --- a/docs/config.json +++ b/docs/config.json @@ -248,6 +248,10 @@ { "label": "AG-UI Client Compliance", "to": "migration/ag-ui-compliance" + }, + { + "label": "Sampling Options to modelOptions", + "to": "migration/sampling-options-to-model-options" } ] }, diff --git a/docs/migration/migration.md b/docs/migration/migration.md index 72c077fe7..d9da89f01 100644 --- a/docs/migration/migration.md +++ b/docs/migration/migration.md @@ -197,6 +197,8 @@ These options are now available at the top level: - `maxTokens` - Maximum tokens to generate - `metadata` - Additional metadata to attach +> **Heads up — sampling has since moved (breaking).** In a later release, the sampling props (`temperature`, `topP`, `maxTokens`) were removed from the root of `chat()` and now live in provider-native `modelOptions`. Passing them at the root no longer type-checks or takes effect. See [Moving Sampling Options into modelOptions](./sampling-options-to-model-options) for the codemod and provider-native key names. `metadata` stays at the root. + ## 3. `providerOptions` → `modelOptions` The `providerOptions` parameter has been renamed to `modelOptions` for clarity. This parameter contains model-specific options that vary by provider and model. diff --git a/docs/migration/sampling-options-to-model-options.md b/docs/migration/sampling-options-to-model-options.md new file mode 100644 index 000000000..05efdf27c --- /dev/null +++ b/docs/migration/sampling-options-to-model-options.md @@ -0,0 +1,211 @@ +--- +title: Moving Sampling Options into modelOptions +--- + +# Moving Sampling Options into `modelOptions` + +> **TL;DR:** This is a **breaking change**. 
The root-level convenience sampling props on `chat()` / `ai()` / `generate()` — `temperature`, `topP`, and `maxTokens` — have been **removed** and now live inside provider-native `modelOptions` instead. Passing them at the root no longer type-checks and has no effect at runtime. Move each one into `modelOptions` under its provider's canonical name (e.g. OpenAI's `max_output_tokens`, Anthropic's `max_tokens`, Gemini's `maxOutputTokens`, Ollama's nested `options.num_predict`). A provider-aware codemod does the rewrite for you. `metadata` is unaffected and stays at the root. + +## What changed + +Previously, `chat()` accepted three generic sampling props directly at the root of its options: + +```typescript +chat({ + adapter: openaiText('gpt-4o'), + messages, + temperature: 0.3, + topP: 0.9, + maxTokens: 100, +}) +``` + +These were a convenience layer that the runtime mapped onto whatever the underlying provider expected. That generic mapping is now gone. Sampling parameters live where every other model-specific knob already lives — inside the provider-native `modelOptions` object — under each provider's own canonical key name. + +```typescript +chat({ + adapter: openaiText('gpt-4o'), + messages, + modelOptions: { + temperature: 0.3, + top_p: 0.9, + max_output_tokens: 100, + }, +}) +``` + +## Why it changed + +- **Provider-native, single source of truth.** Every provider names these parameters differently — OpenAI's Responses API wants `max_output_tokens`, Anthropic wants `max_tokens`, Gemini wants `maxOutputTokens`, Ollama nests them under `options`. A single generic `maxTokens` prop had to guess the target per provider. Putting them in `modelOptions` means there is exactly one place sampling lives, and it matches the provider's own API surface. +- **Typed.** `modelOptions` is already typed per adapter+model, so moving sampling there gives you autocomplete and compile-time checking for the exact keys a given model accepts — instead of three loosely-typed root props. 
+- **No generic mapping.** Reasoning models in particular do not treat these parameters uniformly (some ignore `temperature`, some reject `max_tokens` below the thinking budget, etc.). A generic root-level mapping papered over those differences; provider-native `modelOptions` lets each adapter handle them honestly. + +## Before / after by provider + +The root prop names are the same everywhere (`temperature`, `topP`, `maxTokens`). The `modelOptions` target key differs per provider — use the exact key your provider expects. + +### OpenAI + +```typescript +// Before +chat({ + adapter: openaiText('gpt-4o'), + messages, + temperature: 0.3, + topP: 0.9, + maxTokens: 100, +}) + +// After +chat({ + adapter: openaiText('gpt-4o'), + messages, + modelOptions: { + temperature: 0.3, + top_p: 0.9, + max_output_tokens: 100, + }, +}) +``` + +### Anthropic + +```typescript +// Before +chat({ + adapter: anthropicText('claude-sonnet-4-5'), + messages, + temperature: 0.3, + topP: 0.9, + maxTokens: 1024, +}) + +// After +chat({ + adapter: anthropicText('claude-sonnet-4-5'), + messages, + modelOptions: { + temperature: 0.3, + top_p: 0.9, + max_tokens: 1024, + }, +}) +``` + +### Gemini + +```typescript +// Before +chat({ + adapter: geminiText('gemini-3.1-pro-preview'), + messages, + temperature: 0.3, + topP: 0.9, + maxTokens: 2048, +}) + +// After +chat({ + adapter: geminiText('gemini-3.1-pro-preview'), + messages, + modelOptions: { + temperature: 0.3, + topP: 0.9, + maxOutputTokens: 2048, + }, +}) +``` + +### Ollama (nested under `options`) + +Ollama is the one provider where sampling parameters are **nested** inside an `options` object within `modelOptions`, and the token limit is named `num_predict`: + +```typescript +// Before +chat({ + adapter: ollamaText('llama3'), + messages, + temperature: 0.3, + topP: 0.9, + maxTokens: 1000, +}) + +// After +chat({ + adapter: ollamaText('llama3'), + messages, + modelOptions: { + options: { + temperature: 0.3, + top_p: 0.9, + num_predict: 1000, 
+ }, + }, +}) +``` + +## Provider key reference + +| Root prop | OpenAI | Anthropic | Gemini | Grok | Groq | OpenRouter | Ollama (nested under `options`) | +| ------------- | ------------------- | ------------ | ----------------- | ------------ | ----------------------- | --------------------- | ------------------------------- | +| `temperature` | `temperature` | `temperature`| `temperature` | `temperature`| `temperature` | `temperature` | `options.temperature` | +| `topP` | `top_p` | `top_p` | `topP` | `top_p` | `top_p` | `topP` | `options.top_p` | +| `maxTokens` | `max_output_tokens` | `max_tokens` | `maxOutputTokens` | `max_tokens` | `max_completion_tokens` | `maxCompletionTokens` | `options.num_predict` | + +## Automated codemod + +A jscodeshift codemod moves the root sampling props into `modelOptions` for you, renaming each one to the correct provider-native key. It resolves the provider from the `adapter:` factory call (e.g. `openaiText('gpt-4o')` → OpenAI), so the rewrite is provider-aware. Run it from the repo: + +```bash +pnpm codemod:move-sampling-to-model-options "src/**/*.{ts,tsx}" +``` + +Or run the published transform directly — no clone needed: + +```bash +npx jscodeshift \ + --parser=tsx \ + -t https://raw.githubusercontent.com/TanStack/ai/main/codemods/move-sampling-to-model-options/transform.ts \ + "src/**/*.{ts,tsx}" +``` + +Add `--dry --print` to preview the rewrite without modifying files. + +**What it does:** + +- Targets `chat()`, `ai()`, `generate()`, and `createChatOptions()` calls imported from `@tanstack/ai`. +- Resolves the provider from the `adapter:` factory call and renames each present root prop to that provider's canonical key. +- For Ollama, nests the renamed keys inside `modelOptions.options`. +- Merges into an existing `modelOptions` object literal when present; preserves the original value expressions and expands shorthand props (`{ temperature }` → `temperature: temperature`). 
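The rewrite described in the list above can be sketched on plain objects. This is only an approximation: the real codemod operates on source ASTs through jscodeshift, and this version does not reproduce its conflict checks, so a colliding key is simply overwritten here. `moveSampling`, `RootKey`, and `Options` are hypothetical names for the sketch. The rename table follows the provider key reference.

```typescript
// Illustrative only: a plain-object analogue of the codemod's move + rename.
// All names below are hypothetical and exist just for this sketch.
type Provider =
  | 'openai'
  | 'anthropic'
  | 'gemini'
  | 'grok'
  | 'groq'
  | 'openrouter'
  | 'ollama'
type RootKey = 'temperature' | 'topP' | 'maxTokens'
type Options = Record<string, unknown>

const ROOT_KEYS: ReadonlyArray<RootKey> = ['temperature', 'topP', 'maxTokens']

const RENAME: Record<Provider, Record<RootKey, string>> = {
  openai: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_output_tokens' },
  anthropic: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_tokens' },
  gemini: { temperature: 'temperature', topP: 'topP', maxTokens: 'maxOutputTokens' },
  grok: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_tokens' },
  groq: { temperature: 'temperature', topP: 'top_p', maxTokens: 'max_completion_tokens' },
  openrouter: { temperature: 'temperature', topP: 'topP', maxTokens: 'maxCompletionTokens' },
  ollama: { temperature: 'temperature', topP: 'top_p', maxTokens: 'num_predict' },
}

function moveSampling(provider: Provider, options: Options): Options {
  const present = ROOT_KEYS.filter((k) => k in options)
  if (present.length === 0) return options

  // Copy every root key except the sampling keys being relocated.
  const rest: Options = {}
  for (const [k, v] of Object.entries(options)) {
    if (!ROOT_KEYS.some((r) => r === k)) rest[k] = v
  }

  // Rename each present key to its provider-native spelling.
  const moved: Options = {}
  for (const k of present) moved[RENAME[provider][k]] = options[k]

  // Merge into an existing modelOptions object when there is one.
  const existing = (rest['modelOptions'] ?? {}) as Options
  if (provider === 'ollama') {
    // Ollama nests the renamed keys under `modelOptions.options`.
    const nested = (existing['options'] ?? {}) as Options
    rest['modelOptions'] = { ...existing, options: { ...nested, ...moved } }
  } else {
    rest['modelOptions'] = { ...existing, ...moved }
  }
  return rest
}
```

For example, `moveSampling('openai', { messages: [], maxTokens: 100 })` yields an object whose `modelOptions` holds `max_output_tokens: 100` and whose root no longer carries `maxTokens`.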
+ +**Report + skip (never partial):** the codemod never partially transforms a call. It leaves the call untouched and emits an `api.report(...)` message when it can't safely proceed: + +- **Unresolvable adapter** — no `adapter` prop, the adapter isn't a recognized provider-factory call (e.g. `makeAdapter()`), or it's dynamic/spread. +- **`modelOptions` is not a plain object literal** — e.g. a spread or an identifier reference. +- **Key conflict** — a target renamed key already exists in `modelOptions` (or in `modelOptions.options` for Ollama). Resolve these by hand. + +See [`codemods/move-sampling-to-model-options/README.md`](https://github.com/TanStack/ai/blob/main/codemods/move-sampling-to-model-options/README.md) for the full transform details and limitations. + +## What stays at the root + +`metadata` is **not** a sampling parameter and is unaffected — it stays at the root of `chat()`: + +```typescript +chat({ + adapter: openaiText('gpt-4o'), + messages, + metadata: { requestId: 'abc-123' }, // ← still at the root + modelOptions: { + temperature: 0.3, + max_output_tokens: 100, + }, +}) +``` + +## Need Help? + +- [Per-Model Type Safety](../advanced/per-model-type-safety) — how the adapter+model pair drives `modelOptions` inference. +- [API Reference](../api/ai) — complete `chat()` signature. +- See your provider's adapter page ([OpenAI](../adapters/openai), [Anthropic](../adapters/anthropic), [Gemini](../adapters/gemini), [Ollama](../adapters/ollama)) for the full list of `modelOptions` it accepts. 
+ + diff --git a/package.json b/package.json index 53669156c..924d53b10 100644 --- a/package.json +++ b/package.json @@ -29,6 +29,7 @@ "test:e2e": "pnpm --filter @tanstack/ai-e2e test:e2e", "test:e2e:ui": "pnpm --filter @tanstack/ai-e2e test:e2e:ui", "codemod:ag-ui-compliance": "pnpm --filter @tanstack/ai-codemods exec node ./run.mjs ag-ui-compliance", + "codemod:move-sampling-to-model-options": "pnpm --filter @tanstack/ai-codemods exec node ./run.mjs move-sampling-to-model-options", "build": "nx affected --skip-nx-cache --targets=build --exclude=examples/**,testing/**", "build:all": "nx run-many --targets=build --exclude=examples/**,testing/**", "watch": "pnpm run build:all && env NX_DAEMON=true nx watch --all -- pnpm run build:all", diff --git a/packages/ai-anthropic/src/adapters/text.ts b/packages/ai-anthropic/src/adapters/text.ts index f6b5b9d2d..cde22a244 100644 --- a/packages/ai-anthropic/src/adapters/text.ts +++ b/packages/ai-anthropic/src/adapters/text.ts @@ -295,9 +295,7 @@ export class AnthropicTextAdapter< private mapCommonOptionsToAnthropic( options: TextOptions, ) { - const modelOptions = options.modelOptions as - | InternalTextProviderOptions - | undefined + const modelOptions = options.modelOptions const formattedMessages = this.formatMessages(options.messages) const tools = options.tools @@ -306,7 +304,7 @@ export class AnthropicTextAdapter< const validProviderOptions: Partial = {} if (modelOptions) { - const validKeys: Array = [ + const validKeys: Array = [ 'container', 'context_management', 'effort', @@ -317,10 +315,17 @@ export class AnthropicTextAdapter< 'thinking', 'tool_choice', 'top_k', + 'temperature', + 'top_p', ] - const validKeySet = new Set(validKeys) + // `max_tokens` is a legitimate public modelOptions field, but it is read + // via a dedicated path (defaultMaxTokens below) rather than copied into + // validProviderOptions. 
Exempt it from the dropped-key warning here so a + // correct `modelOptions: { max_tokens }` call doesn't log a spurious + // "dropped unknown key" error, while keeping it out of the copy loop. + const droppedKeyExemptSet = new Set([...validKeys, 'max_tokens']) const droppedKeys = Object.keys(modelOptions).filter( - (key) => !validKeySet.has(key), + (key) => !droppedKeyExemptSet.has(key), ) if (droppedKeys.length > 0) { // Reachable when callers cast around the public type (e.g. @@ -357,7 +362,7 @@ export class AnthropicTextAdapter< validProviderOptions.thinking?.type === 'enabled' ? validProviderOptions.thinking.budget_tokens : undefined - const defaultMaxTokens = options.maxTokens || 1024 + const defaultMaxTokens = modelOptions?.max_tokens || 1024 const maxTokens = thinkingBudget && thinkingBudget >= defaultMaxTokens ? thinkingBudget + 1 @@ -402,18 +407,13 @@ export class AnthropicTextAdapter< } : undefined - // `InternalTextProviderOptions` declares `temperature`, `top_p`, - // and `tools` as `T?: ...` (no `| undefined`), so spread them - // conditionally rather than passing explicit `undefined` from the - // optional common `TextOptions` fields under - // exactOptionalPropertyTypes. + // `temperature`/`top_p` arrive via `...validProviderOptions` (sourced from + // `modelOptions`). `InternalTextProviderOptions` declares `system` and + // `tools` as `T?: ...` (no `| undefined`), so spread them conditionally + // rather than passing explicit `undefined` under exactOptionalPropertyTypes. 
const requestParams: InternalTextProviderOptions = { model: options.model, max_tokens: maxTokens, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.topP !== undefined && { top_p: options.topP }), messages: formattedMessages, ...(systemBlocks !== undefined && { system: systemBlocks }), ...(tools !== undefined && { tools }), diff --git a/packages/ai-anthropic/src/text/text-provider-options.ts b/packages/ai-anthropic/src/text/text-provider-options.ts index 4d7b506e8..c75ab26f7 100644 --- a/packages/ai-anthropic/src/text/text-provider-options.ts +++ b/packages/ai-anthropic/src/text/text-provider-options.ts @@ -206,6 +206,24 @@ Recommended for advanced use cases only. You usually only need to use temperatur Required range: x >= 0 */ top_k?: number + /** + * Amount of randomness injected into the response. + * Either use this or top_p, but not both. + * Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. + * @default 1.0 + */ + temperature?: number + /** + * Use nucleus sampling. + * + * In nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. You should either alter temperature or top_p, but not both. + */ + top_p?: number + /** + * The maximum number of tokens to generate before stopping. This parameter only specifies the absolute maximum number of tokens to generate. Required by the API; the adapter defaults to 1024 when omitted. + * Range x >= 1. + */ + max_tokens?: number } export type ExternalTextProviderOptions = AnthropicContainerOptions & @@ -244,13 +262,6 @@ export interface InternalTextProviderOptions extends ExternalTextProviderOptions * such as specifying a particular goal or role. 
*/ system?: string | Array - /** - * Amount of randomness injected into the response. - * Either use this or top_p, but not both. - * Defaults to 1.0. Ranges from 0.0 to 1.0. Use temperature closer to 0.0 for analytical / multiple choice, and closer to 1.0 for creative and generative tasks. - * @default 1.0 - */ - temperature?: number tools?: Array @@ -276,13 +287,6 @@ export interface InternalTextProviderOptions extends ExternalTextProviderOptions schema: Record } } - - /** - * Use nucleus sampling. - -In nucleus sampling, we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by top_p. You should either alter temperature or top_p, but not both. - */ - top_p?: number } const validateTopPandTemperature = (options: InternalTextProviderOptions) => { diff --git a/packages/ai-anthropic/tests/anthropic-adapter.test.ts b/packages/ai-anthropic/tests/anthropic-adapter.test.ts index aad33ee67..73fe952ba 100644 --- a/packages/ai-anthropic/tests/anthropic-adapter.test.ts +++ b/packages/ai-anthropic/tests/anthropic-adapter.test.ts @@ -287,6 +287,8 @@ describe('Anthropic adapter option mapping', () => { stop_sequences: [''], thinking: { type: 'enabled', budget_tokens: 1500 }, top_k: 5, + max_tokens: 3000, + temperature: 0.4, } satisfies AnthropicTextProviderOptions const adapter = createAdapter('claude-3-7-sonnet') @@ -311,8 +313,6 @@ describe('Anthropic adapter option mapping', () => { { role: 'tool', toolCallId: 'call_weather', content: '{"temp":72}' }, ], tools: [weatherTool], - maxTokens: 3000, - temperature: 0.4, modelOptions: providerOptions, })) { chunks.push(chunk) @@ -369,6 +369,97 @@ describe('Anthropic adapter option mapping', () => { }) }) + it('sources temperature and max_tokens from modelOptions', async () => { + mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok')) + + const adapter = createAdapter('claude-3-7-sonnet') 
+ + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'Hi' }], + modelOptions: { + temperature: 0.4, + max_tokens: 2048, + } satisfies AnthropicTextProviderOptions, + })) { + // consume stream + } + + const [payload] = mocks.betaMessagesCreate.mock.calls[0]! + expect(payload.temperature).toBe(0.4) + expect(payload.max_tokens).toBe(2048) + }) + + it('does not warn about dropped keys when max_tokens is passed via modelOptions', async () => { + mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok')) + + const adapter = createAdapter('claude-3-7-sonnet') + + const logger = { + debug: vi.fn(), + info: vi.fn(), + warn: vi.fn(), + error: vi.fn(), + } + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'Hi' }], + modelOptions: { + max_tokens: 2048, + } satisfies AnthropicTextProviderOptions, + debug: { logger, errors: true }, + })) { + // consume stream + } + + // max_tokens is read via the dedicated defaultMaxTokens path; it must not + // be flagged as an unknown/dropped modelOptions key. + const droppedKeyError = logger.error.mock.calls.find((call) => + String(call[0]).includes('dropped unknown modelOptions key'), + ) + expect(droppedKeyError).toBeUndefined() + + const [payload] = mocks.betaMessagesCreate.mock.calls[0]! + expect(payload.max_tokens).toBe(2048) + }) + + it('sources top_p from modelOptions', async () => { + // top_p is mutually exclusive with temperature, so exercise it alone. + mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok')) + + const adapter = createAdapter('claude-3-7-sonnet') + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'Hi' }], + modelOptions: { + top_p: 0.7, + } satisfies AnthropicTextProviderOptions, + })) { + // consume stream + } + + const [payload] = mocks.betaMessagesCreate.mock.calls[0]! 
+ expect(payload.top_p).toBe(0.7) + }) + + it('defaults max_tokens to 1024 when not provided via modelOptions', async () => { + mocks.betaMessagesCreate.mockResolvedValueOnce(createTextStream('ok')) + + const adapter = createAdapter('claude-3-7-sonnet') + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'Hi' }], + })) { + // consume stream + } + + const [payload] = mocks.betaMessagesCreate.mock.calls[0]! + expect(payload.max_tokens).toBe(1024) + }) + it('native combined mode (#605): wires outputSchema into output_format alongside tools on Claude 4.5+', async () => { // Final-turn JSON the model emits when output_format is in play. const finalJson = JSON.stringify({ city: 'Berlin', temp: 18 }) diff --git a/packages/ai-code-mode/models-eval/judge.ts b/packages/ai-code-mode/models-eval/judge.ts index 98da67583..5996af306 100644 --- a/packages/ai-code-mode/models-eval/judge.ts +++ b/packages/ai-code-mode/models-eval/judge.ts @@ -63,7 +63,7 @@ export async function judgeReports(params: { ], systemPrompts: [JUDGE_SYSTEM_PROMPT], outputSchema: judgeSchema, - maxTokens: 512, + modelOptions: { max_tokens: 512 }, }) return result as JudgeResult diff --git a/packages/ai-code-mode/models-eval/run-eval.ts b/packages/ai-code-mode/models-eval/run-eval.ts index 4191772b4..ae59fa187 100644 --- a/packages/ai-code-mode/models-eval/run-eval.ts +++ b/packages/ai-code-mode/models-eval/run-eval.ts @@ -210,6 +210,31 @@ function getTextAdapter( } } +/** + * The max-output-tokens cap lives in provider-native `modelOptions` keys (the + * root `maxTokens` field was removed from core `TextOptions`). The key name + * differs per provider. Ollama also keeps its larger context window via + * `num_ctx`. 
+ */ +function maxTokensModelOptions( + provider: EvalProvider, + maxTokens: number, +): Record { + switch (provider) { + case 'ollama': + return { options: { num_predict: maxTokens, num_ctx: 32768 } } + case 'openai': + return { max_output_tokens: maxTokens } + case 'anthropic': + case 'grok': + return { max_tokens: maxTokens } + case 'gemini': + return { maxOutputTokens: maxTokens } + case 'groq': + return { max_completion_tokens: maxTokens } + } +} + interface EvalRow { name: string /** Full `provider:model` id */ @@ -787,14 +812,10 @@ async function main() { tools, systemPrompts, agentLoopStrategy: maxIterations(15), - maxTokens: 8192, - ...(provider === 'ollama' - ? { - modelOptions: { - num_ctx: 32768, - }, - } - : {}), + // Sampling now lives in provider-native `modelOptions` keys rather + // than the removed root `maxTokens`. The max-output-tokens key name + // differs per provider. + modelOptions: maxTokensModelOptions(provider, 8192), }) await processor.process( teeStream(stream, (chunk) => { diff --git a/packages/ai-gemini/src/adapters/text.ts b/packages/ai-gemini/src/adapters/text.ts index 8f9ee636a..86cb2569d 100644 --- a/packages/ai-gemini/src/adapters/text.ts +++ b/packages/ai-gemini/src/adapters/text.ts @@ -885,13 +885,6 @@ export class GeminiTextAdapter< contents: this.formatMessages(options.messages), config: { ...modelOpts, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.topP !== undefined && { topP: options.topP }), - ...(options.maxTokens !== undefined && { - maxOutputTokens: options.maxTokens, - }), ...(mappedThinkingConfig !== undefined && { thinkingConfig: mappedThinkingConfig, }), diff --git a/packages/ai-gemini/src/experimental/text-interactions/adapter.ts b/packages/ai-gemini/src/experimental/text-interactions/adapter.ts index 9267f5fd4..b27feaa1a 100644 --- a/packages/ai-gemini/src/experimental/text-interactions/adapter.ts +++ b/packages/ai-gemini/src/experimental/text-interactions/adapter.ts @@
-471,15 +471,6 @@ function buildInteractionsRequest( const generationConfig: Interactions.GenerationConfig = { ...modelOpts?.generation_config, } - if (options.temperature !== undefined) { - generationConfig.temperature = options.temperature - } - if (options.topP !== undefined) { - generationConfig.top_p = options.topP - } - if (options.maxTokens !== undefined) { - generationConfig.max_output_tokens = options.maxTokens - } const hasGenerationConfig = Object.keys(generationConfig).length > 0 diff --git a/packages/ai-gemini/src/text/text-provider-options.ts b/packages/ai-gemini/src/text/text-provider-options.ts index 6c69f3a21..e571896e0 100644 --- a/packages/ai-gemini/src/text/text-provider-options.ts +++ b/packages/ai-gemini/src/text/text-provider-options.ts @@ -50,6 +50,12 @@ Gemini models use Top-p (nucleus) sampling or a combination of Top-k and nucleus Note: The default value varies by Model and is specified by theModel.top_p attribute returned from the getModel function. An empty topK attribute indicates that the model doesn't apply top-k sampling and doesn't allow setting topK on requests. */ topK?: number + /** Controls randomness, range [0.0, 2.0]. Higher = more random. Use this or topP, not both. */ + temperature?: number + /** Nucleus sampling probability mass, range (0.0, 1.0]. */ + topP?: number + /** Maximum number of tokens to generate in the response. */ + maxOutputTokens?: number /** * Seed used in decoding. If not set, the request uses a randomly generated seed. */ diff --git a/packages/ai-gemini/tests/gemini-adapter.test.ts b/packages/ai-gemini/tests/gemini-adapter.test.ts index 5e022ee81..b4b8673d6 100644 --- a/packages/ai-gemini/tests/gemini-adapter.test.ts +++ b/packages/ai-gemini/tests/gemini-adapter.test.ts @@ -103,10 +103,10 @@ describe('GeminiAdapter through AI', () => { messages: [{ role: 'user', content: 'How is the weather in Madrid?' 
}], modelOptions: { topK: 9, + temperature: 0.4, + topP: 0.8, + maxOutputTokens: 256, }, - temperature: 0.4, - topP: 0.8, - maxTokens: 256, tools: [weatherTool], })) { /* consume stream */ @@ -132,6 +132,42 @@ describe('GeminiAdapter through AI', () => { ]) }) + it('reads sampling options (temperature, topP, maxOutputTokens) from modelOptions', async () => { + const streamChunks = [ + { + candidates: [ + { + content: { parts: [{ text: 'ok' }] }, + finishReason: 'STOP', + }, + ], + usageMetadata: { totalTokenCount: 1 }, + }, + ] + + mocks.generateContentStreamSpy.mockResolvedValue(createStream(streamChunks)) + + const adapter = createTextAdapter() + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'hi' }], + modelOptions: { + temperature: 0.6, + topP: 0.95, + maxOutputTokens: 512, + }, + })) { + /* consume stream */ + } + + expect(mocks.generateContentStreamSpy).toHaveBeenCalledTimes(1) + const [payload] = mocks.generateContentStreamSpy.mock.calls[0]! + expect(payload.config.temperature).toBe(0.6) + expect(payload.config.topP).toBe(0.95) + expect(payload.config.maxOutputTokens).toBe(512) + }) + it('joins object-form systemPrompts into systemInstruction and drops foreign metadata', async () => { const streamChunks = [ { @@ -220,6 +256,9 @@ describe('GeminiAdapter through AI', () => { const providerOptions: GeminiTextProviderOptions = { safetySettings, + temperature: 0.61, + topP: 0.37, + maxOutputTokens: 512, stopSequences: ['', '###'], responseMimeType: 'application/json', responseSchema, @@ -256,9 +295,6 @@ describe('GeminiAdapter through AI', () => { for await (const _ of chat({ adapter, messages: [{ role: 'user', content: 'Provide structured response' }], - temperature: 0.61, - topP: 0.37, - maxTokens: 512, systemPrompts: ['Stay concise', 'Return JSON'], modelOptions: providerOptions, })) { @@ -335,8 +371,8 @@ describe('GeminiAdapter through AI', () => { messages: [{ role: 'user', content: 'Tell me a joke' }], modelOptions: { topK: 
3, + temperature: 0.2, }, - temperature: 0.2, })) { received.push(chunk) } diff --git a/packages/ai-gemini/tests/text-interactions-adapter.test.ts b/packages/ai-gemini/tests/text-interactions-adapter.test.ts index 15f455743..1bc235610 100644 --- a/packages/ai-gemini/tests/text-interactions-adapter.test.ts +++ b/packages/ai-gemini/tests/text-interactions-adapter.test.ts @@ -229,6 +229,46 @@ describe('GeminiTextInteractionsAdapter', () => { ]) }) + it('reads sampling from modelOptions.generation_config into the request generation_config', async () => { + mocks.interactionsCreateSpy.mockResolvedValue( + mkStream([ + { + event_type: 'interaction.start', + interaction: { id: 'int_gen', status: 'in_progress' }, + }, + { + event_type: 'interaction.complete', + interaction: { id: 'int_gen', status: 'completed' }, + }, + ]), + ) + + const adapter = createAdapter() + const providerOptions: GeminiTextInteractionsProviderOptions = { + generation_config: { + temperature: 0.6, + top_p: 0.95, + max_output_tokens: 512, + }, + } + + await collectChunks( + chat({ + adapter, + messages: [{ role: 'user', content: 'hi' }], + modelOptions: providerOptions, + }), + ) + + expect(mocks.interactionsCreateSpy).toHaveBeenCalledTimes(1) + const [payload] = mocks.interactionsCreateSpy.mock.calls[0]! 
+ expect(payload.generation_config).toMatchObject({ + temperature: 0.6, + top_p: 0.95, + max_output_tokens: 512, + }) + }) + it('includes trailing tool result when chaining with previous_interaction_id', async () => { mocks.interactionsCreateSpy.mockResolvedValue( mkStream([ diff --git a/packages/ai-grok/tests/grok-adapter.test.ts b/packages/ai-grok/tests/grok-adapter.test.ts index 962d3b960..cf148c3e1 100644 --- a/packages/ai-grok/tests/grok-adapter.test.ts +++ b/packages/ai-grok/tests/grok-adapter.test.ts @@ -5,6 +5,7 @@ import { createGrokImage, grokImage } from '../src/adapters/image' import { createGrokSummarize, grokSummarize } from '../src/adapters/summarize' import { EventType } from '@tanstack/ai' import type { StreamChunk, Tool } from '@tanstack/ai' +import type { GrokTextProviderOptions } from '../src/index' // Test helper: a silent logger for test chatStream calls. const testLogger = resolveDebugOption(false) @@ -124,6 +125,42 @@ describe('Grok adapters', () => { expect(grok3.supportsCombinedToolsAndSchema()).toBe(false) expect(grok3Mini.supportsCombinedToolsAndSchema()).toBe(false) }) + + it('forwards sampling options from modelOptions with Grok wire names', async () => { + const streamChunks = [ + { + id: 'chatcmpl-sampling', + model: 'grok-3', + choices: [{ delta: {}, finish_reason: 'stop' }], + usage: { prompt_tokens: 1, completion_tokens: 0, total_tokens: 1 }, + }, + ] + + const adapter = createGrokText('grok-3', 'test-api-key') + const mockCreate = injectMockClient(adapter, streamChunks) + + const modelOptions: GrokTextProviderOptions = { + temperature: 0.5, + top_p: 0.8, + max_tokens: 128, + } + + for await (const _ of adapter.chatStream({ + model: 'grok-3', + messages: [{ role: 'user', content: 'Hello' }], + modelOptions, + logger: testLogger, + })) { + // consume stream + } + + expect(mockCreate).toHaveBeenCalledTimes(1) + expect(mockCreate.mock.calls[0]?.[0]).toMatchObject({ + temperature: 0.5, + top_p: 0.8, + max_tokens: 128, + }) + }) }) 
describe('Image adapter', () => { diff --git a/packages/ai-groq/tests/groq-adapter.test.ts b/packages/ai-groq/tests/groq-adapter.test.ts index e53e85533..0058d2652 100644 --- a/packages/ai-groq/tests/groq-adapter.test.ts +++ b/packages/ai-groq/tests/groq-adapter.test.ts @@ -14,6 +14,7 @@ import { } from '../src/adapters/text' import { EventType } from '@tanstack/ai' import type { StreamChunk, Tool } from '@tanstack/ai' +import type { GroqTextProviderOptions } from '../src/index' // Test helper: a silent logger for test chatStream calls. const testLogger = resolveDebugOption(false) @@ -148,6 +149,44 @@ describe('Groq adapters', () => { // `response_format` + `tools` + `stream`. expect(adapter.supportsCombinedToolsAndSchema()).toBe(false) }) + + it('forwards sampling options from modelOptions with Groq wire names', async () => { + const streamChunks = [ + { + id: 'chatcmpl-sampling', + model: 'llama-3.3-70b-versatile', + choices: [{ delta: {}, finish_reason: 'stop' }], + x_groq: { + usage: { prompt_tokens: 1, completion_tokens: 0, total_tokens: 1 }, + }, + }, + ] + + const mockCreate = setupMockSdkClient(streamChunks) + const adapter = createGroqText('llama-3.3-70b-versatile', 'test-api-key') + + const modelOptions: GroqTextProviderOptions = { + temperature: 0.5, + top_p: 0.8, + max_completion_tokens: 128, + } + + for await (const _ of adapter.chatStream({ + model: 'llama-3.3-70b-versatile', + messages: [{ role: 'user', content: 'Hello' }], + modelOptions, + logger: testLogger, + })) { + // consume stream + } + + expect(mockCreate).toHaveBeenCalledTimes(1) + expect(mockCreate.mock.calls[0]?.[0]).toMatchObject({ + temperature: 0.5, + top_p: 0.8, + max_completion_tokens: 128, + }) + }) }) }) diff --git a/packages/ai-ollama/src/adapters/text.ts b/packages/ai-ollama/src/adapters/text.ts index 16aa0f1ed..73fed1f0c 100644 --- a/packages/ai-ollama/src/adapters/text.ts +++ b/packages/ai-ollama/src/adapters/text.ts @@ -38,52 +38,6 @@ type ResolveModelOptions = ? 
OllamaChatModelOptionsByName[TModel] : ChatRequest -/** - * Ollama-specific provider options - */ -export interface OllamaTextProviderOptions { - /** Number of tokens to keep from the prompt */ - num_keep?: number - /** Number of tokens from context to consider for next token prediction */ - top_k?: number - /** Minimum probability for nucleus sampling */ - min_p?: number - /** Tail-free sampling parameter */ - tfs_z?: number - /** Typical probability sampling parameter */ - typical_p?: number - /** Number of previous tokens to consider for repetition penalty */ - repeat_last_n?: number - /** Penalty for repeating tokens */ - repeat_penalty?: number - /** Enable Mirostat sampling (0=disabled, 1=Mirostat, 2=Mirostat 2.0) */ - mirostat?: number - /** Target entropy for Mirostat */ - mirostat_tau?: number - /** Learning rate for Mirostat */ - mirostat_eta?: number - /** Enable penalize_newline */ - penalize_newline?: boolean - /** Enable NUMA support */ - numa?: boolean - /** Context window size */ - num_ctx?: number - /** Batch size for prompt processing */ - num_batch?: number - /** Number of GQA groups (for some models) */ - num_gqa?: number - /** Number of GPU layers to use */ - num_gpu?: number - /** GPU to use for inference */ - main_gpu?: number - /** Use memory-mapped model */ - use_mmap?: boolean - /** Use memory-locked model */ - use_mlock?: boolean - /** Number of threads to use */ - num_thread?: number -} - export interface OllamaTextAdapterOptions { model?: OllamaTextModel host?: string @@ -142,7 +96,9 @@ export class OllamaTextAdapter extends BaseTextAdapter< } } - async *chatStream(options: TextOptions): AsyncIterable { + async *chatStream( + options: TextOptions>, + ): AsyncIterable { const mappedOptions = this.mapCommonOptionsToOllama(options) const { logger } = options try { @@ -550,22 +506,11 @@ export class OllamaTextAdapter extends BaseTextAdapter< }) } - private mapCommonOptionsToOllama(options: TextOptions): ChatRequest { + private 
mapCommonOptionsToOllama( + options: TextOptions>, + ): ChatRequest { const model = options.model - const modelOptions = options.modelOptions as - | OllamaTextProviderOptions - | undefined - - const ollamaOptions = { - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.topP !== undefined && { top_p: options.topP }), - ...(options.maxTokens !== undefined && { - num_predict: options.maxTokens, - }), - ...modelOptions, - } + const modelOptions = options.modelOptions const formattedMessages = this.formatMessages(options.messages) @@ -581,8 +526,34 @@ export class OllamaTextAdapter extends BaseTextAdapter< return { model, - options: ollamaOptions, messages: formattedMessages, + // Sampling and runner params (temperature, top_p, num_predict, top_k, + // seed, penalties, etc.) live under the nested `options` key — the same + // shape the Ollama SDK's ChatRequest.options expects. Spreading a fresh + // object avoids aliasing the caller's modelOptions.options. + options: { ...modelOptions?.options }, + // Request-level fields the nested modelOptions surface exposes + // (OllamaChatRequest): format / keep_alive / logprobs / top_logprobs, plus + // `think` for models whose options type includes OllamaChatRequestThinking. + // Read structurally and only forwarded when present. `stream` is set by + // the call sites (chatStream / structuredOutput), so it is not forwarded. + ...(modelOptions?.format !== undefined && { + format: modelOptions.format, + }), + ...(modelOptions?.keep_alive !== undefined && { + keep_alive: modelOptions.keep_alive, + }), + ...(modelOptions?.logprobs !== undefined && { + logprobs: modelOptions.logprobs, + }), + ...(modelOptions?.top_logprobs !== undefined && { + top_logprobs: modelOptions.top_logprobs, + }), + ...(modelOptions && + 'think' in modelOptions && + modelOptions.think !== undefined + ? 
{ think: modelOptions.think } + : {}), ...(convertedTools !== undefined && { tools: convertedTools }), } } diff --git a/packages/ai-ollama/src/index.ts b/packages/ai-ollama/src/index.ts index 05c8d0e81..79b0d5d67 100644 --- a/packages/ai-ollama/src/index.ts +++ b/packages/ai-ollama/src/index.ts @@ -9,7 +9,6 @@ export { ollamaText, type OllamaTextAdapterOptions, type OllamaTextModel, - type OllamaTextProviderOptions, } from './adapters/text' export { OLLAMA_TEXT_MODELS as OllamaTextModels } from './model-meta' diff --git a/packages/ai-ollama/src/meta/models-meta.ts b/packages/ai-ollama/src/meta/models-meta.ts index e343eff1e..e27f44863 100644 --- a/packages/ai-ollama/src/meta/models-meta.ts +++ b/packages/ai-ollama/src/meta/models-meta.ts @@ -29,6 +29,8 @@ interface OllamaOptions { num_keep: number seed: number num_predict: number + temperature: number + top_p: number top_k: number tfs_z: number typical_p: number diff --git a/packages/ai-ollama/tests/text-adapter.test.ts b/packages/ai-ollama/tests/text-adapter.test.ts index 6bdb0222e..3efc80753 100644 --- a/packages/ai-ollama/tests/text-adapter.test.ts +++ b/packages/ai-ollama/tests/text-adapter.test.ts @@ -328,6 +328,139 @@ describe('OllamaTextAdapter.structuredOutput', () => { }) }) +describe('OllamaTextAdapter modelOptions (nested options contract)', () => { + it('reads sampling params from nested modelOptions.options and forwards them under ChatRequest.options', async () => { + chatMock.mockResolvedValueOnce( + asyncIterable([ + { + message: { role: 'assistant', content: 'ok' }, + done: true, + done_reason: 'stop', + }, + ]), + ) + + const adapter = createOllamaChat('llama3.2') + await collectStream( + adapter.chatStream({ + logger: testLogger, + model: 'llama3.2', + messages: [{ role: 'user', content: 'hi' }], + modelOptions: { + options: { temperature: 0.2, top_p: 0.6, num_predict: 200 }, + }, + // The nested modelOptions surface is resolved per-model; the test + // exercises the runtime mapping rather than 
the per-model type, so a + // narrow cast keeps the harness model-agnostic. + // eslint-disable-next-line @typescript-eslint/no-explicit-any + } as any), + ) + + const call = chatMock.mock.calls[0]![0] as { + options?: Record + } + expect(call.options).toMatchObject({ + temperature: 0.2, + top_p: 0.6, + num_predict: 200, + }) + }) + + it('does not double-nest options (request.options.options is undefined)', async () => { + chatMock.mockResolvedValueOnce( + asyncIterable([ + { + message: { role: 'assistant', content: 'ok' }, + done: true, + done_reason: 'stop', + }, + ]), + ) + + const adapter = createOllamaChat('llama3.2') + await collectStream( + adapter.chatStream({ + logger: testLogger, + model: 'llama3.2', + messages: [{ role: 'user', content: 'hi' }], + modelOptions: { + options: { temperature: 0.5 }, + }, + // eslint-disable-next-line @typescript-eslint/no-explicit-any + } as any), + ) + + const call = chatMock.mock.calls[0]![0] as { + options?: Record + } + expect(call.options).toBeDefined() + expect((call.options as { options?: unknown }).options).toBeUndefined() + }) + + it('emits an empty options object when no modelOptions are provided', async () => { + chatMock.mockResolvedValueOnce( + asyncIterable([ + { + message: { role: 'assistant', content: 'ok' }, + done: true, + done_reason: 'stop', + }, + ]), + ) + + const adapter = createOllamaChat('llama3.2') + await collectStream( + adapter.chatStream({ + logger: testLogger, + model: 'llama3.2', + messages: [{ role: 'user', content: 'hi' }], + }), + ) + + const call = chatMock.mock.calls[0]![0] as { + options?: Record + } + expect(call.options).toEqual({}) + }) + + it('forwards request-level fields (format, keep_alive, think) outside of options', async () => { + chatMock.mockResolvedValueOnce( + asyncIterable([ + { + message: { role: 'assistant', content: 'ok' }, + done: true, + done_reason: 'stop', + }, + ]), + ) + + const adapter = createOllamaChat('llama3.2') + await collectStream( + adapter.chatStream({ + 
logger: testLogger, + model: 'llama3.2', + messages: [{ role: 'user', content: 'hi' }], + modelOptions: { + options: { temperature: 0.3 }, + keep_alive: '10m', + think: true, + }, + // eslint-disable-next-line @typescript-eslint/no-explicit-any + } as any), + ) + + const call = chatMock.mock.calls[0]![0] as { + options?: Record + keep_alive?: unknown + think?: unknown + } + expect(call.keep_alive).toBe('10m') + expect(call.think).toBe(true) + // Request-level fields must not leak into the sampling options bag. + expect(call.options).toEqual({ temperature: 0.3 }) + }) +}) + describe('OllamaTextAdapter system prompts', () => { it('prepends mixed string + object-form systemPrompts as a single role:system message and drops foreign metadata', async () => { chatMock.mockResolvedValueOnce( diff --git a/packages/ai-openai/src/text/text-provider-options.ts b/packages/ai-openai/src/text/text-provider-options.ts index e3e8be740..bab75aef3 100644 --- a/packages/ai-openai/src/text/text-provider-options.ts +++ b/packages/ai-openai/src/text/text-provider-options.ts @@ -14,8 +14,28 @@ import type { ToolChoice } from '../tools/tool-choice' import type { WebSearchPreviewTool } from '../tools/web-search-preview-tool' import type { WebSearchTool } from '../tools/web-search-tool' +/** Sampling controls shared by all Responses-API models. */ +export interface OpenAISamplingOptions { + /** + * Sampling temperature, 0–2. Higher = more random. Recommend altering this or top_p, not both. + * Note: OpenAI reasoning models (o-series, GPT-5 reasoning) reject temperature/top_p. + * https://platform.openai.com/docs/api-reference/responses/create#responses_create-temperature + */ + temperature?: number + /** + * Nucleus sampling. 0.1 = only the top 10% probability mass is considered. + * https://platform.openai.com/docs/api-reference/responses/create#responses_create-top_p + */ + top_p?: number + /** + * Upper bound on generated tokens (visible output + reasoning tokens). 
+ * https://platform.openai.com/docs/api-reference/responses/create#responses_create-max_output_tokens + */ + max_output_tokens?: number +} + // Core, always-available options for Responses API -export interface OpenAIBaseOptions { +export interface OpenAIBaseOptions extends OpenAISamplingOptions { /** Whether to run the model response in the background. Learn more here: @@ -255,12 +275,6 @@ When using along with previous_response_id, the instructions from a previous res https://platform.openai.com/docs/api-reference/responses/create#responses_create-instructions */ instructions?: string - /** - * An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens. - * (Responses API name: max_output_tokens) - * https://platform.openai.com/docs/api-reference/responses/create#responses_create-max_output_tokens - */ - max_output_tokens?: number /** * The model name (e.g. "gpt-4o", "gpt-5", "gpt-4.1-mini", etc). @@ -275,16 +289,6 @@ https://platform.openai.com/docs/api-reference/responses/create#responses_create */ stream?: boolean - /** - * What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. - * https://platform.openai.com/docs/api-reference/responses/create#responses_create-temperature - */ - temperature?: number - /** - * An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. - * https://platform.openai.com/docs/api-reference/responses/create#responses_create-top_p - */ - top_p?: number /** * Tools the model may call (functions, web_search, etc). 
* Function tool example: diff --git a/packages/ai-openai/tests/chat-per-model-type-safety.test.ts b/packages/ai-openai/tests/chat-per-model-type-safety.test.ts index 4117453ce..521b56eb6 100644 --- a/packages/ai-openai/tests/chat-per-model-type-safety.test.ts +++ b/packages/ai-openai/tests/chat-per-model-type-safety.test.ts @@ -318,4 +318,35 @@ describe('OpenAI provider options shape assertions', () => { expectTypeOf().not.toHaveProperty('text') }) }) + + describe('sampling options are available on every model regardless of feature set', () => { + it('exposes temperature/top_p/max_output_tokens on a base model (gpt-4o)', () => { + type Options = OpenAIChatModelProviderOptionsByName['gpt-4o'] + expectTypeOf().toHaveProperty('temperature') + expectTypeOf().toHaveProperty('top_p') + expectTypeOf().toHaveProperty('max_output_tokens') + }) + + it('exposes sampling options on a reasoning model (o3)', () => { + // Reasoning models reject temperature/top_p at the API, but the type + // surface still carries them (documented caveat) so callers tuning a + // single backend get a uniform sampling shape. 
+ type Options = OpenAIChatModelProviderOptionsByName['o3'] + expectTypeOf().toHaveProperty('temperature') + expectTypeOf().toHaveProperty('top_p') + expectTypeOf().toHaveProperty('max_output_tokens') + }) + + it('accepts sampling options through chat() modelOptions', () => { + chat({ + adapter: openaiText('gpt-4o'), + messages: [{ role: 'user', content: 'hi' }], + modelOptions: { + temperature: 0.3, + top_p: 0.9, + max_output_tokens: 256, + }, + }) + }) + }) }) diff --git a/packages/ai-openai/tests/openai-adapter.test.ts b/packages/ai-openai/tests/openai-adapter.test.ts index 90b805c8a..116247fcf 100644 --- a/packages/ai-openai/tests/openai-adapter.test.ts +++ b/packages/ai-openai/tests/openai-adapter.test.ts @@ -75,8 +75,13 @@ describe('OpenAI adapter option mapping', () => { }, } + // Sampling options now live exclusively in `modelOptions` (provider-native + // wire names) rather than the root `temperature`/`topP`/`maxTokens` fields. const modelOptions: OpenAITextProviderOptions = { tool_choice: 'required', + temperature: 0.25, + top_p: 0.6, + max_output_tokens: 1024, } const chunks: StreamChunk[] = [] @@ -99,9 +104,6 @@ describe('OpenAI adapter option mapping', () => { { role: 'tool', toolCallId: 'call_weather', content: '{"temp":72}' }, ], tools: [weatherTool], - temperature: 0.25, - topP: 0.6, - maxTokens: 1024, metadata: { requestId: 'req-42' }, modelOptions, })) { @@ -173,4 +175,49 @@ describe('OpenAI adapter option mapping', () => { const [payload] = responsesCreate.mock.calls[0]! 
expect(payload.instructions).toBe('Plain string.\nStructured.') }) + + it('forwards sampling options from modelOptions with OpenAI wire names', async () => { + const mockStream = createMockChatCompletionsStream([ + { + type: 'response.created', + response: { + id: 'resp-sampling', + model: 'gpt-4o-mini', + status: 'in_progress', + created_at: 1234567890, + }, + }, + { + type: 'response.completed', + response: { + id: 'resp-sampling', + status: 'completed', + usage: { input_tokens: 1, output_tokens: 0 }, + }, + }, + ]) + + const responsesCreate = vi.fn().mockResolvedValueOnce(mockStream) + const adapter = createAdapter('gpt-4o-mini') + ;(adapter as any).client = { responses: { create: responsesCreate } } + + const modelOptions: OpenAITextProviderOptions = { + temperature: 0.3, + top_p: 0.9, + max_output_tokens: 256, + } + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'hi' }], + modelOptions, + })) { + // consume stream + } + + const [payload] = responsesCreate.mock.calls[0]! + expect(payload.temperature).toBe(0.3) + expect(payload.top_p).toBe(0.9) + expect(payload.max_output_tokens).toBe(256) + }) }) diff --git a/packages/ai-openrouter/src/adapters/responses-text.ts b/packages/ai-openrouter/src/adapters/responses-text.ts index edea58f92..5ac694919 100644 --- a/packages/ai-openrouter/src/adapters/responses-text.ts +++ b/packages/ai-openrouter/src/adapters/responses-text.ts @@ -1529,12 +1529,12 @@ export class OpenRouterResponsesTextAdapter< } } - const modelOptions = options.modelOptions as - | (Partial & { variant?: string }) - | undefined - const variantSuffix = modelOptions?.variant - ? `:${modelOptions.variant}` - : '' + // `variant` is OpenRouter metadata used only to build the `:variant` model + // suffix — it is not part of the wire `ResponsesRequest`, so strip it out + // of the spread body (mirrors the chat-completions adapter). + const { variant, ...modelOptions } = (options.modelOptions ?? 
+ {}) as Partial & { variant?: string } + const variantSuffix = variant ? `:${variant}` : '' const input = this.convertMessagesToInput(options.messages) @@ -1565,13 +1565,6 @@ export class OpenRouterResponsesTextAdapter< > = { ...modelOptions, model: options.model + variantSuffix, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.maxTokens !== undefined && { - maxOutputTokens: options.maxTokens, - }), - ...(options.topP !== undefined && { topP: options.topP }), ...(options.metadata !== undefined && { metadata: options.metadata }), ...(() => { const prompts = normalizeSystemPrompts(options.systemPrompts) diff --git a/packages/ai-openrouter/src/adapters/text.ts b/packages/ai-openrouter/src/adapters/text.ts index ee5a3a5c2..754c3ca6b 100644 --- a/packages/ai-openrouter/src/adapters/text.ts +++ b/packages/ai-openrouter/src/adapters/text.ts @@ -1119,12 +1119,11 @@ export class OpenRouterTextAdapter< protected mapOptionsToRequest( options: TextOptions>, ): Omit { - const modelOptions = options.modelOptions as - | (Record & { variant?: string }) - | undefined - const variantSuffix = modelOptions?.variant - ? `:${modelOptions.variant}` - : '' + // `variant` is OpenRouter metadata used only to build the `:variant` model + // suffix — it must NOT be spread into the request body. Destructure it out + // so the remaining sampling/provider options flow through `...restModelOptions`. + const { variant, ...restModelOptions } = options.modelOptions ?? {} + const variantSuffix = variant ? `:${variant}` : '' const messages: Array = [] const systemPrompts = normalizeSystemPrompts(options.systemPrompts) @@ -1142,20 +1141,14 @@ export class OpenRouterTextAdapter< ? convertToolsToProviderFormat(options.tools) : undefined - // Spread modelOptions first so explicit top-level options (set below) win - // when defined but `undefined` doesn't clobber values the caller set in - // modelOptions. 
+ // `modelOptions` is the sole sampling surface: callers set provider-native + // wire names (`temperature`, `topP`, `maxCompletionTokens`, etc.) there and + // they flow through the spread below. The root `temperature`/`topP`/ + // `maxTokens` fields are intentionally NOT read here. const request: Omit = { - ...modelOptions, + ...restModelOptions, model: options.model + variantSuffix, messages, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.maxTokens !== undefined && { - maxCompletionTokens: options.maxTokens, - }), - ...(options.topP !== undefined && { topP: options.topP }), ...(tools && tools.length > 0 && { tools }), } return request diff --git a/packages/ai-openrouter/tests/openrouter-adapter.test.ts b/packages/ai-openrouter/tests/openrouter-adapter.test.ts index 02a5edb15..f122723a6 100644 --- a/packages/ai-openrouter/tests/openrouter-adapter.test.ts +++ b/packages/ai-openrouter/tests/openrouter-adapter.test.ts @@ -113,8 +113,13 @@ describe('OpenRouter adapter option mapping', () => { const adapter = createAdapter() + // Sampling options now live exclusively in `modelOptions` (provider-native + // camelCase wire names) rather than the root temperature/topP/maxTokens. 
const modelOptions: OpenRouterTextModelOptions = { toolChoice: 'auto', + temperature: 0.25, + topP: 0.6, + maxCompletionTokens: 1024, } const chunks: Array = [] @@ -137,9 +142,6 @@ describe('OpenRouter adapter option mapping', () => { { role: 'tool', toolCallId: 'call_weather', content: '{"temp":72}' }, ], tools: [weatherTool], - temperature: 0.25, - topP: 0.6, - maxTokens: 1024, modelOptions, })) { chunks.push(chunk) @@ -1383,6 +1385,58 @@ describe('OpenRouter modelOptions pass-through', () => { expect(params.responseFormat).toEqual({ type: 'json_object' }) }) + it('forwards temperature/topP/maxCompletionTokens from modelOptions', async () => { + setupMockSdkClient(minimalStreamChunks) + const adapter = createAdapter() + + // Sampling now flows exclusively through `modelOptions` using the SDK's + // camelCase wire names — the root temperature/topP/maxTokens fields are no + // longer read by the adapter. + const modelOptions: OpenRouterTextModelOptions = { + temperature: 0.1, + topP: 0.5, + maxCompletionTokens: 64, + } + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'test' }], + modelOptions, + })) { + // consume + } + + const [rawParams] = mockSend.mock.calls[0]! + const params = rawParams.chatRequest + expect(params.temperature).toBe(0.1) + expect(params.topP).toBe(0.5) + expect(params.maxCompletionTokens).toBe(64) + }) + + it('uses variant only for the model suffix and never sends it in the request body', async () => { + setupMockSdkClient(minimalStreamChunks) + const adapter = createAdapter() + + for await (const _ of chat({ + adapter, + messages: [{ role: 'user', content: 'test' }], + modelOptions: { + variant: 'free', + temperature: 0.2, + } as OpenRouterTextModelOptions, + })) { + // consume + } + + const [rawParams] = mockSend.mock.calls[0]! 
+ const params = rawParams.chatRequest + // `variant` is OpenRouter metadata used purely to build the model suffix; + // it must not leak into the request body alongside the real sampling field. + expect(params.model).toBe('openai/gpt-4o-mini:free') + expect(params).not.toHaveProperty('variant') + expect(params.temperature).toBe(0.2) + }) + it('forwards common options (provider, plugins, etc.) to the SDK request', async () => { setupMockSdkClient(minimalStreamChunks) const adapter = createAdapter() @@ -1417,16 +1471,15 @@ describe('OpenRouter modelOptions pass-through', () => { expect(params.sessionId).toBe('session-abc') }) - it('does not allow modelOptions to override top-level temperature/topP/maxTokens', async () => { + it('reads sampling from modelOptions; modelOptions is the sole sampling source', async () => { setupMockSdkClient(minimalStreamChunks) const adapter = createAdapter() for await (const _ of chat({ adapter, messages: [{ role: 'user', content: 'test' }], - temperature: 0.5, - topP: 0.8, - maxTokens: 500, + // Root sampling fields no longer exist on TextOptions — only the + // provider-native modelOptions values reach the request. modelOptions: { temperature: 0.9, topP: 0.1, @@ -1438,10 +1491,9 @@ describe('OpenRouter modelOptions pass-through', () => { const [rawParams] = mockSend.mock.calls[0]! 
const params = rawParams.chatRequest - // Top-level values should win because modelOptions has those keys Omitted - expect(params.temperature).toBe(0.5) - expect(params.topP).toBe(0.8) - expect(params.maxCompletionTokens).toBe(500) + expect(params.temperature).toBe(0.9) + expect(params.topP).toBe(0.1) + expect(params.maxCompletionTokens).toBe(9999) }) it('appends variant to model name instead of passing it as a separate property', async () => { diff --git a/packages/ai-openrouter/tests/openrouter-responses-adapter.test.ts b/packages/ai-openrouter/tests/openrouter-responses-adapter.test.ts index 4e1ac11ca..e15b5d2fb 100644 --- a/packages/ai-openrouter/tests/openrouter-responses-adapter.test.ts +++ b/packages/ai-openrouter/tests/openrouter-responses-adapter.test.ts @@ -88,10 +88,12 @@ describe('OpenRouter responses adapter — request shape', () => { systemPrompts: ['Stay concise'], messages: [{ role: 'user', content: 'How is the weather?' }], tools: [weatherTool], - temperature: 0.25, - topP: 0.6, - maxTokens: 1024, - modelOptions: { toolChoice: 'auto' as any }, + modelOptions: { + temperature: 0.25, + topP: 0.6, + maxOutputTokens: 1024, + toolChoice: 'auto' as any, + }, })) { // consume } diff --git a/packages/ai/skills/ai-core/adapter-configuration/SKILL.md b/packages/ai/skills/ai-core/adapter-configuration/SKILL.md index 621bd1c2c..ded9d357a 100644 --- a/packages/ai/skills/ai-core/adapter-configuration/SKILL.md +++ b/packages/ai/skills/ai-core/adapter-configuration/SKILL.md @@ -42,8 +42,10 @@ import { openaiText } from '@tanstack/ai-openai' const stream = chat({ adapter: openaiText('gpt-5.2'), messages, - temperature: 0.7, - maxTokens: 1000, + modelOptions: { + temperature: 0.7, + max_output_tokens: 1000, + }, }) return toServerSentEventsResponse(stream) @@ -53,6 +55,11 @@ The adapter factory function takes the model name as a string literal and an optional config object (API key, base URL, etc.). The model name is passed into the factory, not into `chat()`. 
+Sampling options (`temperature`, token limits, `top_p`/`topP`, etc.) live +inside `modelOptions` using each provider's native key — they are **not** +top-level options on `chat()`. See the per-provider table in +[Configuring Sampling](#5-configuring-sampling) below. + ## Core Patterns ### 1. Adapter Selection @@ -155,11 +162,11 @@ const openaiStream = chat({ const anthropicStream = chat({ adapter: anthropicText('claude-sonnet-4-6'), messages, - maxTokens: 16000, modelOptions: { + max_tokens: 16000, thinking: { type: 'enabled', - budget_tokens: 8000, // must be >= 1024 and < maxTokens + budget_tokens: 8000, // must be >= 1024 and < max_tokens }, }, }) @@ -168,8 +175,8 @@ const anthropicStream = chat({ const adaptiveStream = chat({ adapter: anthropicText('claude-sonnet-4-6'), messages, - maxTokens: 16000, modelOptions: { + max_tokens: 16000, thinking: { type: 'adaptive', }, @@ -221,7 +228,61 @@ const custom = myOpenai('ft:gpt-5.2:my-org:custom-model:abc123') At runtime, `extendAdapter` simply passes through to the original factory. The `_customModels` parameter is only used for type inference. -### 5. Capability Flag: `supportsCombinedToolsAndSchema` +### 5. Configuring Sampling + +Sampling controls (`temperature`, token limits, nucleus sampling) are passed +inside `modelOptions` using each provider's **native** key. They are not +top-level fields on `chat()`/`ai()`/`generate()`. 
+ +```typescript +// OpenAI — native keys +chat({ + adapter: openaiText('gpt-5.2'), + messages, + modelOptions: { temperature: 0.7, top_p: 0.9, max_output_tokens: 1000 }, +}) + +// Anthropic (set `temperature` or `top_p`, not both; setting both throws) +chat({ + adapter: anthropicText('claude-sonnet-4-6'), + messages, + modelOptions: { temperature: 0.7, max_tokens: 1000 }, +}) + +// Gemini — camelCase +chat({ + adapter: geminiText('gemini-2.5-pro'), + messages, + modelOptions: { temperature: 0.7, topP: 0.9, maxOutputTokens: 1000 }, +}) + +// Ollama — NESTED under modelOptions.options +chat({ + adapter: ollamaText('llama3.3'), + messages, + modelOptions: { + options: { temperature: 0.7, top_p: 0.9, num_predict: 1000 }, + }, +}) +``` + +Per-provider sampling keys (all live inside `modelOptions`): + +| Provider | Temperature | Nucleus | Max output tokens | +| ----------------- | ------------- | ------- | ----------------------------------- | +| OpenAI | `temperature` | `top_p` | `max_output_tokens` | +| Anthropic | `temperature` | `top_p` | `max_tokens` | +| Gemini | `temperature` | `topP` | `maxOutputTokens` | +| Grok (xAI) | `temperature` | `top_p` | `max_tokens` | +| Groq | `temperature` | `top_p` | `max_completion_tokens` | +| OpenRouter (chat) | `temperature` | `topP` | `maxCompletionTokens` | +| Ollama | `temperature` | `top_p` | `num_predict` (nested in `options`) | + +`temperature` is the one key every provider names identically; token limits and +some sampling options use provider-native names. Ollama nests all sampling under +`modelOptions.options`. + +### 6.
Capability Flag: `supportsCombinedToolsAndSchema` Adapters can declare an optional capability method: diff --git a/packages/ai/skills/ai-core/adapter-configuration/references/anthropic-adapter.md b/packages/ai/skills/ai-core/adapter-configuration/references/anthropic-adapter.md index be4903432..1a6d9f334 100644 --- a/packages/ai/skills/ai-core/adapter-configuration/references/anthropic-adapter.md +++ b/packages/ai/skills/ai-core/adapter-configuration/references/anthropic-adapter.md @@ -39,12 +39,15 @@ Note: Model IDs use the format `claude-opus-4-6`, `claude-sonnet-4-6`, etc. chat({ adapter: anthropicText('claude-sonnet-4-6'), messages, - maxTokens: 16000, modelOptions: { + // Sampling + temperature: 0.7, + top_p: 0.9, // cannot be combined with temperature + max_tokens: 16000, // Extended thinking (budget-based) thinking: { type: 'enabled', - budget_tokens: 8000, // must be >= 1024 and < maxTokens + budget_tokens: 8000, // must be >= 1024 and < max_tokens }, // Adaptive thinking (claude-sonnet-4-6, claude-opus-4-6+) thinking: { @@ -89,7 +92,7 @@ ANTHROPIC_API_KEY ## Gotchas -- `thinking.budget_tokens` must be >= 1024 AND less than `maxTokens`. +- `thinking.budget_tokens` must be >= 1024 AND less than `modelOptions.max_tokens`. Failing either check throws a validation error. - Cannot set both `top_p` and `temperature` at the same time (throws error). - `claude-3-5-haiku` and `claude-3-haiku` do NOT support extended thinking. 
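The first two Anthropic gotchas above (the `budget_tokens` bounds and the `temperature`/`top_p` exclusivity) can be checked before a request is sent. The following is a minimal sketch under stated assumptions, not the adapter's actual validation: `AnthropicSamplingSketch` and `checkAnthropicOptions` are hypothetical names introduced here for illustration, and only the rules stated in the gotchas are encoded.

```typescript
// Hypothetical, trimmed-down shape: only the fields the gotchas mention.
// This is NOT the real modelOptions type exported by @tanstack/ai-anthropic.
interface AnthropicSamplingSketch {
  temperature?: number
  top_p?: number
  max_tokens?: number
  thinking?: { type: 'enabled'; budget_tokens: number } | { type: 'adaptive' }
}

// Hypothetical helper: returns a list of human-readable problems
// (an empty list means none of the documented rules is violated).
function checkAnthropicOptions(opts: AnthropicSamplingSketch): string[] {
  const problems: string[] = []

  // Gotcha: `top_p` and `temperature` cannot both be set.
  if (opts.temperature !== undefined && opts.top_p !== undefined) {
    problems.push('temperature and top_p cannot both be set')
  }

  // Gotcha: budget_tokens must be >= 1024 and less than max_tokens.
  const thinking = opts.thinking
  if (thinking !== undefined && thinking.type === 'enabled') {
    if (thinking.budget_tokens < 1024) {
      problems.push('thinking.budget_tokens must be >= 1024')
    }
    // The upper bound can only be checked when max_tokens is supplied.
    if (
      opts.max_tokens !== undefined &&
      thinking.budget_tokens >= opts.max_tokens
    ) {
      problems.push('thinking.budget_tokens must be less than max_tokens')
    }
  }

  return problems
}
```

Returning a list instead of throwing on the first failure is a design choice of this sketch: it lets a caller report every violated rule at once. The adapter itself is documented as throwing.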
diff --git a/packages/ai/skills/ai-core/adapter-configuration/references/gemini-adapter.md b/packages/ai/skills/ai-core/adapter-configuration/references/gemini-adapter.md index 55e5d012c..0cd038f10 100644 --- a/packages/ai/skills/ai-core/adapter-configuration/references/gemini-adapter.md +++ b/packages/ai/skills/ai-core/adapter-configuration/references/gemini-adapter.md @@ -72,6 +72,9 @@ chat({ // Response modalities responseModalities: ['TEXT'], // Sampling + temperature: 0.7, + topP: 0.9, + maxOutputTokens: 1000, topK: 40, seed: 42, presencePenalty: 0.5, diff --git a/packages/ai/skills/ai-core/adapter-configuration/references/ollama-adapter.md b/packages/ai/skills/ai-core/adapter-configuration/references/ollama-adapter.md index 6ae4462f1..148ebd376 100644 --- a/packages/ai/skills/ai-core/adapter-configuration/references/ollama-adapter.md +++ b/packages/ai/skills/ai-core/adapter-configuration/references/ollama-adapter.md @@ -40,6 +40,9 @@ Models must be pulled first: `ollama pull llama3.3` Ollama models use a generic options type. Provider options vary by the underlying model. The adapter passes options through to the Ollama API. +Sampling options are **nested** under `modelOptions.options` (this matches +Ollama's own request shape) — `temperature`, `top_p`, and `num_predict` +(max output tokens) all live there. 
```typescript import { chat } from '@tanstack/ai' @@ -48,7 +51,13 @@ import { ollamaText } from '@tanstack/ai-ollama' const stream = chat({ adapter: ollamaText('llama3.3'), messages, - temperature: 0.7, + modelOptions: { + options: { + temperature: 0.7, + top_p: 0.9, + num_predict: 1000, // max output tokens + }, + }, // Ollama-specific options are limited compared to cloud providers }) ``` diff --git a/packages/ai/skills/ai-core/adapter-configuration/references/openai-adapter.md b/packages/ai/skills/ai-core/adapter-configuration/references/openai-adapter.md index 87a4d793d..8a092820a 100644 --- a/packages/ai/skills/ai-core/adapter-configuration/references/openai-adapter.md +++ b/packages/ai/skills/ai-core/adapter-configuration/references/openai-adapter.md @@ -44,6 +44,10 @@ chat({ adapter: openaiText('gpt-5.4'), messages, modelOptions: { + // Sampling + temperature: 0.7, + top_p: 0.9, + max_output_tokens: 1000, // Reasoning (effort levels: none, minimal, low, medium, high) reasoning: { effort: 'high', diff --git a/packages/ai/skills/ai-core/chat-experience/SKILL.md b/packages/ai/skills/ai-core/chat-experience/SKILL.md index 918fdec1b..23bb293cf 100644 --- a/packages/ai/skills/ai-core/chat-experience/SKILL.md +++ b/packages/ai/skills/ai-core/chat-experience/SKILL.md @@ -137,8 +137,10 @@ import { anthropicText } from '@tanstack/ai-anthropic' const stream = chat({ adapter: anthropicText('claude-sonnet-4-5'), messages, - temperature: 0.7, - maxTokens: 2000, + modelOptions: { + temperature: 0.7, + max_tokens: 2000, // Anthropic-native key + }, systemPrompts: ['You are a helpful assistant.'], abortController, }) @@ -377,17 +379,32 @@ chat({ adapter: openaiText('gpt-5.2'), messages }) The model is passed to the adapter factory, not to `chat()`. -### f. HIGH: Nesting temperature/maxTokens in options object +### f. 
HIGH: Passing sampling options at the root of chat() + +Sampling options (`temperature`, token limits, `top_p`/`topP`) are **not** +top-level fields on `chat()`. They live inside `modelOptions` using the +provider's native key. ```typescript -// WRONG +// WRONG — temperature/maxTokens are not root options +chat({ adapter, messages, temperature: 0.7, maxTokens: 1000 }) + +// WRONG — there is no `options` field either chat({ adapter, messages, options: { temperature: 0.7, maxTokens: 1000 } }) -// CORRECT -chat({ adapter, messages, temperature: 0.7, maxTokens: 1000 }) +// CORRECT — inside modelOptions, provider-native keys (OpenAI shown) +chat({ + adapter, + messages, + modelOptions: { temperature: 0.7, max_output_tokens: 1000 }, +}) ``` -All parameters are top-level on the `chat()` options object. +`temperature` is universal across providers; token limits use provider-native +keys (`max_output_tokens` for OpenAI, `max_tokens` for Anthropic/Grok, +`maxOutputTokens` for Gemini, `max_completion_tokens` for Groq, +`maxCompletionTokens` for OpenRouter, and `num_predict` nested under +`modelOptions.options` for Ollama). See ai-core/adapter-configuration/SKILL.md. ### g. HIGH: Using providerOptions instead of modelOptions diff --git a/packages/ai/skills/ai-core/middleware/SKILL.md b/packages/ai/skills/ai-core/middleware/SKILL.md index da1778905..f5884a3d3 100644 --- a/packages/ai/skills/ai-core/middleware/SKILL.md +++ b/packages/ai/skills/ai-core/middleware/SKILL.md @@ -68,6 +68,13 @@ Every hook receives a `ChatMiddlewareContext` as its first argument, which provi Terminal hooks (`onFinish`, `onAbort`, `onError`) are **mutually exclusive** -- exactly one fires per `chat()` invocation. +> **Sampling in `onConfig`:** `temperature`, `topP`, and `maxTokens` are **not** +> first-class fields on `ChatMiddlewareConfig`. To adjust sampling from +> middleware, return a partial that mutates `config.modelOptions` using the +> provider's native key (e.g. 
OpenAI `temperature` / `max_output_tokens`, +> Anthropic `max_tokens`, Ollama nested `options.num_predict`). Returning a +> top-level `temperature`/`maxTokens` has no effect. + ### Phase values `ctx.phase` is one of: @@ -304,6 +311,10 @@ const configTransform: ChatMiddleware = { if (ctx.phase === 'init') { return { systemPrompts: [...config.systemPrompts, 'Always respond in JSON.'], + // Sampling options are NOT first-class config fields — mutate them + // through `config.modelOptions` using the provider's native key. + // (e.g. OpenAI `temperature` / `max_output_tokens`.) + modelOptions: { ...config.modelOptions, temperature: 0.2 }, } } }, diff --git a/packages/ai/src/activities/chat/index.ts b/packages/ai/src/activities/chat/index.ts index 3962c0f1d..d8922ff72 100644 --- a/packages/ai/src/activities/chat/index.ts +++ b/packages/ai/src/activities/chat/index.ts @@ -134,12 +134,6 @@ export interface TextActivityOptions< | ProviderTool > | undefined - /** Controls the randomness of the output. Higher values make output more random. Range: [0.0, 2.0] */ - temperature?: TextOptions['temperature'] - /** Nucleus sampling parameter. The model considers tokens with topP probability mass. */ - topP?: TextOptions['topP'] - /** The maximum number of tokens to generate in the response. */ - maxTokens?: TextOptions['maxTokens'] /** Additional metadata to attach to the request. 
*/ metadata?: TextOptions['metadata'] /** Model-specific provider options (type comes from adapter) */ @@ -733,13 +727,10 @@ class TextEngine< private beforeRun(): void { this.streamStartTime = Date.now() - const { tools, temperature, topP, maxTokens, metadata } = this.params + const { tools, metadata } = this.params // Gather flattened options into an object for context const options: Record = {} - if (temperature !== undefined) options.temperature = temperature - if (topP !== undefined) options.topP = topP - if (maxTokens !== undefined) options.maxTokens = maxTokens if (metadata !== undefined) options.metadata = metadata this.eventOptions = Object.keys(options).length > 0 ? options : undefined @@ -786,7 +777,7 @@ class TextEngine< } private async *streamModelResponse(): AsyncGenerator { - const { temperature, topP, maxTokens, metadata, modelOptions } = this.params + const { metadata, modelOptions } = this.params const tools = this.tools // Convert tool schemas to JSON Schema before passing to adapter @@ -830,9 +821,6 @@ class TextEngine< model: this.params.model, messages: this.messages, tools: toolsWithJsonSchemas, - temperature, - topP, - maxTokens, metadata, request: this.effectiveRequest, modelOptions, @@ -1737,9 +1725,6 @@ class TextEngine< chatOptions: { model: this.params.model, messages: this.messages, - temperature: postOnConfig.temperature, - topP: postOnConfig.topP, - maxTokens: postOnConfig.maxTokens, metadata: postOnConfig.metadata, modelOptions: postOnConfig.modelOptions, systemPrompts: postOnConfig.systemPrompts, @@ -2219,9 +2204,6 @@ class TextEngine< messages: this.messages, systemPrompts: [...this.systemPrompts], tools: [...this.tools], - temperature: this.params.temperature, - topP: this.params.topP, - maxTokens: this.params.maxTokens, metadata: this.params.metadata, modelOptions: this.params.modelOptions, } @@ -2233,9 +2215,6 @@ class TextEngine< this.tools = config.tools this.params = { ...this.params, - temperature: config.temperature, - 
topP: config.topP, - maxTokens: config.maxTokens, metadata: config.metadata, modelOptions: config.modelOptions, } diff --git a/packages/ai/src/activities/chat/middleware/types.ts b/packages/ai/src/activities/chat/middleware/types.ts index 8e507bd70..d6760bcdf 100644 --- a/packages/ai/src/activities/chat/middleware/types.ts +++ b/packages/ai/src/activities/chat/middleware/types.ts @@ -90,7 +90,7 @@ export interface ChatMiddlewareContext { systemPrompts: Array /** Names of configured tools, if any */ toolNames?: Array - /** Flattened generation options (temperature, topP, maxTokens, metadata) */ + /** Flattened generation options (metadata) */ options?: Record | undefined /** Provider-specific model options */ modelOptions?: Record | undefined @@ -130,9 +130,6 @@ export interface ChatMiddlewareConfig { messages: Array systemPrompts: Array tools: Array - temperature?: number - topP?: number - maxTokens?: number metadata?: Record | undefined modelOptions?: Record | undefined } diff --git a/packages/ai/src/activities/summarize/chat-stream-summarize.ts b/packages/ai/src/activities/summarize/chat-stream-summarize.ts index 1f4cbf457..2c7904c06 100644 --- a/packages/ai/src/activities/summarize/chat-stream-summarize.ts +++ b/packages/ai/src/activities/summarize/chat-stream-summarize.ts @@ -23,6 +23,82 @@ export interface ChatStreamCapable { chatStream: (options: TextOptions) => AsyncIterable } +/** + * Provider-native max-output-tokens key per text-adapter `name`. summarize is + * provider-agnostic and forwards `modelOptions` opaquely to the wrapped text + * adapter, so `maxLength` must be written under the exact key the underlying + * provider reads — no adapter reads a generic `maxTokens`. A value of `null` + * marks a nested shape (handled specially below for Ollama). 
+ * + * Keep in sync with each adapter's wire mapping: + * - OpenAI (Responses): `max_output_tokens` + * - Anthropic / Grok: `max_tokens` + * - Groq: `max_completion_tokens` + * - Gemini: `maxOutputTokens` + * - OpenRouter: `maxCompletionTokens` + * - Ollama: nested `options.num_predict` + */ +const MAX_TOKENS_KEY_BY_ADAPTER: Record = { + openai: 'max_output_tokens', + anthropic: 'max_tokens', + grok: 'max_tokens', + groq: 'max_completion_tokens', + gemini: 'maxOutputTokens', + openrouter: 'maxCompletionTokens', +} + +/** + * Every flat key any supported provider uses to cap output tokens, plus the + * camelCase variants. Used to detect a caller-supplied token limit so the + * summarize default never overrides an explicit caller value. + */ +const KNOWN_MAX_TOKENS_KEYS = [ + 'max_output_tokens', + 'max_tokens', + 'max_completion_tokens', + 'maxOutputTokens', + 'maxCompletionTokens', + 'maxTokens', +] as const + +/** + * Resolve `maxLength` to the provider-native max-output-tokens key for the + * given text-adapter `name` and merge it into a working copy of the caller's + * `modelOptions`. The caller always wins: if they already set any recognised + * token-limit key (flat or, for Ollama, nested `options.num_predict`), the + * default is left untouched. Unknown/unrecognised adapter names fall back to + * NOT setting a token key (the prompt hint still asks the model to stay under + * `maxLength`) rather than writing a dead key no provider reads. + */ +function applyMaxLength( + adapterName: string, + maxLength: number, + modelOptions: Record, +): Record { + const merged: Record = { ...modelOptions } + + if (adapterName === 'ollama') { + const existing = + merged.options && typeof merged.options === 'object' + ? 
(merged.options as Record) + : undefined + if (existing && typeof existing.num_predict === 'number') return merged + merged.options = { num_predict: maxLength, ...existing } + return merged + } + + const key = MAX_TOKENS_KEY_BY_ADAPTER[adapterName] + if (key === undefined) return merged + + const callerSetLimit = KNOWN_MAX_TOKENS_KEYS.some( + (k) => typeof merged[k] === 'number', + ) + if (callerSetLimit) return merged + + merged[key] = maxLength + return merged +} + /** * Extract the per-model `modelOptions` type a text adapter accepts. Used by * provider summarize factories so their `modelOptions` IntelliSense matches @@ -195,13 +271,27 @@ export class ChatStreamSummarizeAdapter< options: SummarizationOptions, systemPrompt: string, ): TextOptions { + // Sampling knobs now live in provider-native `modelOptions`. Apply the + // low-temperature default underneath any caller-supplied `modelOptions` so + // callers can still override it. + let working: Record = { + temperature: 0.3, + ...(options.modelOptions as Record | undefined), + } + // `maxLength` must reach the wire under the wrapped adapter's provider- + // native token key (it differs per provider, and no adapter reads a + // generic `maxTokens`). Resolve it from the text adapter's `name`, never + // overriding a caller-supplied token limit. 
+ if (options.maxLength !== undefined) { + working = applyMaxLength(this.name, options.maxLength, working) + } + const modelOptions = working as TProviderOptions + return { model: options.model, messages: [{ role: 'user', content: options.text }], systemPrompts: [systemPrompt], - maxTokens: options.maxLength, - temperature: 0.3, - modelOptions: options.modelOptions, + modelOptions, logger: options.logger, } } diff --git a/packages/ai/src/middlewares/otel.ts b/packages/ai/src/middlewares/otel.ts index 16240a5b4..4bf749884 100644 --- a/packages/ai/src/middlewares/otel.ts +++ b/packages/ai/src/middlewares/otel.ts @@ -162,6 +162,17 @@ function messageEventName(role: string): string { } } +/** + * Return the first candidate that is a finite `number`, or `undefined`. Used to + * pick a sampling attribute from among the several provider-native spellings. + */ +function firstNumber(...candidates: Array): number | undefined { + for (const candidate of candidates) { + if (typeof candidate === 'number' && Number.isFinite(candidate)) return candidate + } + return undefined +} + function errorMessage(err: unknown): string | undefined { if (err instanceof Error) return err.message if (typeof err === 'string') return err @@ -333,12 +344,39 @@ export function otelMiddleware(options: OtelMiddlewareOptions): ChatMiddleware { 'gen_ai.request.model': ctx.model, 'tanstack.ai.iteration': ctx.iteration, } - if (config.temperature !== undefined) - baseAttrs['gen_ai.request.temperature'] = config.temperature - if (config.topP !== undefined) - baseAttrs['gen_ai.request.top_p'] = config.topP - if (config.maxTokens !== undefined) - baseAttrs['gen_ai.request.max_tokens'] = config.maxTokens + // Sampling options now live in provider-native `modelOptions`, and + // providers spell them differently (e.g. `max_output_tokens`, + // `max_completion_tokens`, `maxOutputTokens`, `num_predict`).
Read the + // first numeric value among the known spellings — including Ollama's + // nested `options` — so gen_ai attributes populate across providers. + const sampling = config.modelOptions ?? {} + const nestedOptions = + sampling['options'] && typeof sampling['options'] === 'object' + ? (sampling['options'] as Record) + : undefined + const samplingTemperature = firstNumber( + sampling['temperature'], + nestedOptions?.['temperature'], + ) + const samplingTopP = firstNumber( + sampling['top_p'], + sampling['topP'], + nestedOptions?.['top_p'], + ) + const samplingMaxTokens = firstNumber( + sampling['max_tokens'], + sampling['max_output_tokens'], + sampling['maxOutputTokens'], + sampling['max_completion_tokens'], + sampling['maxCompletionTokens'], + nestedOptions?.['num_predict'], + ) + if (samplingTemperature !== undefined) + baseAttrs['gen_ai.request.temperature'] = samplingTemperature + if (samplingTopP !== undefined) + baseAttrs['gen_ai.request.top_p'] = samplingTopP + if (samplingMaxTokens !== undefined) + baseAttrs['gen_ai.request.max_tokens'] = samplingMaxTokens const baseOptions: SpanOptions = { kind: SpanKind.CLIENT, diff --git a/packages/ai/src/types.ts b/packages/ai/src/types.ts index c04adfa84..49e28ab80 100644 --- a/packages/ai/src/types.ts +++ b/packages/ai/src/types.ts @@ -749,41 +749,6 @@ export interface TextOptions< */ systemPrompts?: Array agentLoopStrategy?: AgentLoopStrategy - /** - * Controls the randomness of the output. - * Higher values (e.g., 0.8) make output more random, lower values (e.g., 0.2) make it more focused and deterministic. - * Range: [0.0, 2.0] - * - * Note: Generally recommended to use either temperature or topP, but not both. - * - * Provider usage: - * - OpenAI: `temperature` (number) - in text.top_p field - * - Anthropic: `temperature` (number) - ranges from 0.0 to 1.0, default 1.0 - * - Gemini: `generationConfig.temperature` (number) - ranges from 0.0 to 2.0 - */ - temperature?: number - /** - * Nucleus sampling parameter. 
An alternative to temperature sampling. - * The model considers the results of tokens with topP probability mass. - * For example, 0.1 means only tokens comprising the top 10% probability mass are considered. - * - * Note: Generally recommended to use either temperature or topP, but not both. - * - * Provider usage: - * - OpenAI: `text.top_p` (number) - * - Anthropic: `top_p` (number | null) - * - Gemini: `generationConfig.topP` (number) - */ - topP?: number - /** - * The maximum number of tokens to generate in the response. - * - * Provider usage: - * - OpenAI: `max_output_tokens` (number) - includes visible output and reasoning tokens - * - Anthropic: `max_tokens` (number, required) - range x >= 1 - * - Gemini: `generationConfig.maxOutputTokens` (number) - */ - maxTokens?: number /** * Additional metadata to attach to the request. * Can be used for tracking, debugging, or passing custom information. diff --git a/packages/ai/tests/chat.test.ts b/packages/ai/tests/chat.test.ts index bd338666b..cc37b6eb1 100644 --- a/packages/ai/tests/chat.test.ts +++ b/packages/ai/tests/chat.test.ts @@ -97,7 +97,7 @@ describe('chat()', () => { expect(calls[0]!.systemPrompts).toEqual(['You are a helpful assistant']) }) - it('should pass temperature, topP, maxTokens to the adapter', async () => { + it('should pass sampling modelOptions (temperature, topP, maxTokens) to the adapter', async () => { const { adapter, calls } = createMockAdapter({ iterations: [[ev.runStarted(), ev.runFinished('stop')]], }) @@ -105,16 +105,20 @@ describe('chat()', () => { const stream = chat({ adapter, messages: [{ role: 'user', content: 'Hello' }], - temperature: 0.5, - topP: 0.9, - maxTokens: 100, + modelOptions: { + temperature: 0.5, + topP: 0.9, + maxTokens: 100, + }, }) await collectChunks(stream as AsyncIterable) - expect(calls[0]!.temperature).toBe(0.5) - expect(calls[0]!.topP).toBe(0.9) - expect(calls[0]!.maxTokens).toBe(100) + expect(calls[0]!.modelOptions).toMatchObject({ + temperature: 0.5, + 
topP: 0.9, + maxTokens: 100, + }) }) }) @@ -1273,11 +1277,11 @@ describe('chat()', () => { const options = createChatOptions({ adapter, messages: [{ role: 'user', content: 'Hello' }], - temperature: 0.7, + modelOptions: { temperature: 0.7 }, }) expect(options.adapter).toBe(adapter) - expect(options.temperature).toBe(0.7) + expect(options.modelOptions).toEqual({ temperature: 0.7 }) expect(options.messages).toEqual([{ role: 'user', content: 'Hello' }]) }) }) diff --git a/packages/ai/tests/middleware.test.ts b/packages/ai/tests/middleware.test.ts index 858c8320a..03b792f12 100644 --- a/packages/ai/tests/middleware.test.ts +++ b/packages/ai/tests/middleware.test.ts @@ -258,7 +258,7 @@ describe('chat() middleware', () => { const mw1: ChatMiddleware = { name: 'first', onConfig: () => ({ - maxTokens: 100, + modelOptions: { maxTokens: 100 }, }), } @@ -266,7 +266,10 @@ describe('chat() middleware', () => { name: 'second', onConfig: (_ctx, config) => ({ // Can see what first middleware set - maxTokens: (config.maxTokens ?? 0) + 50, + modelOptions: { + ...config.modelOptions, + maxTokens: ((config.modelOptions?.maxTokens as number) ?? 
0) + 50, + }, }), } @@ -278,7 +281,7 @@ describe('chat() middleware', () => { await collectChunks(stream as AsyncIterable) // Adapter should get maxTokens = 150 (100 + 50) - expect(calls[0]!.maxTokens).toBe(150) + expect((calls[0]!.modelOptions as any).maxTokens).toBe(150) }) }) @@ -1586,10 +1589,10 @@ describe('chat() middleware', () => { }) // ========================================================================== - // onConfig transforms temperature/topP/maxTokens + // onConfig transforms sampling options via modelOptions // ========================================================================== describe('onConfig parameter transforms', () => { - it('should allow middleware to transform temperature, topP, and maxTokens', async () => { + it('should allow middleware to transform sampling options via modelOptions', async () => { const { adapter, calls } = createMockAdapter({ iterations: [ [ev.runStarted(), ev.textContent('hi'), ev.runFinished('stop')], @@ -1599,23 +1602,27 @@ describe('chat() middleware', () => { const middleware: ChatMiddleware = { name: 'param-override', onConfig: () => ({ - temperature: 0.9, - topP: 0.8, - maxTokens: 500, + modelOptions: { + temperature: 0.9, + topP: 0.8, + maxTokens: 500, + }, }), } const stream = chat({ adapter, messages: [{ role: 'user', content: 'Hi' }], - temperature: 0.1, + modelOptions: { temperature: 0.1 }, middleware: [middleware], }) await collectChunks(stream as AsyncIterable) - expect(calls[0]!.temperature).toBe(0.9) - expect(calls[0]!.topP).toBe(0.8) - expect(calls[0]!.maxTokens).toBe(500) + expect(calls[0]!.modelOptions).toMatchObject({ + temperature: 0.9, + topP: 0.8, + maxTokens: 500, + }) }) }) @@ -1989,14 +1996,14 @@ describe('chat() middleware', () => { adapter, messages: [{ role: 'user', content: 'Hi' }], systemPrompts: ['Be helpful'], - temperature: 0.5, + modelOptions: { temperature: 0.5 }, middleware: [middleware], }) await collectChunks(stream as AsyncIterable) // Original config should reach the 
adapter untouched expect(calls[0]!.systemPrompts).toEqual(['Be helpful']) - expect(calls[0]!.temperature).toBe(0.5) + expect((calls[0]!.modelOptions as any).temperature).toBe(0.5) }) }) @@ -2525,21 +2532,19 @@ describe('chat() middleware', () => { if (ctx.phase === 'init') { return { systemPrompts: ['init-prompt'], - temperature: 0.1, + modelOptions: { temperature: 0.1 }, } } if (ctx.phase === 'beforeModel' && ctx.iteration === 0) { return { systemPrompts: ['iter-0-prompt'], - temperature: 0.5, - maxTokens: 100, + modelOptions: { temperature: 0.5, maxTokens: 100 }, } } if (ctx.phase === 'beforeModel' && ctx.iteration === 1) { return { systemPrompts: ['iter-1-prompt'], - temperature: 0.9, - maxTokens: 200, + modelOptions: { temperature: 0.9, maxTokens: 200 }, } } return undefined @@ -2556,13 +2561,17 @@ describe('chat() middleware', () => { // Iteration 0: adapter receives iter-0 config (overrides init) expect(calls[0]!.systemPrompts).toEqual(['iter-0-prompt']) - expect(calls[0]!.temperature).toBe(0.5) - expect(calls[0]!.maxTokens).toBe(100) + expect(calls[0]!.modelOptions).toMatchObject({ + temperature: 0.5, + maxTokens: 100, + }) // Iteration 1: adapter receives iter-1 config expect(calls[1]!.systemPrompts).toEqual(['iter-1-prompt']) - expect(calls[1]!.temperature).toBe(0.9) - expect(calls[1]!.maxTokens).toBe(200) + expect(calls[1]!.modelOptions).toMatchObject({ + temperature: 0.9, + maxTokens: 200, + }) }) it('should accumulate config changes across multiple middleware per iteration', async () => { @@ -2603,7 +2612,11 @@ describe('chat() middleware', () => { onConfig: (ctx, config) => { if (ctx.phase === 'beforeModel' && ctx.iteration === 1) { return { - maxTokens: (config.maxTokens ?? 100) * 2, + modelOptions: { + ...config.modelOptions, + maxTokens: + ((config.modelOptions?.maxTokens as number) ?? 
100) * 2, + }, } } return undefined @@ -2615,19 +2628,19 @@ describe('chat() middleware', () => { messages: [{ role: 'user', content: 'Hi' }], tools: [tool], systemPrompts: ['base'], - maxTokens: 100, + modelOptions: { maxTokens: 100 }, middleware: [mw1, mw2], }) await collectChunks(stream as AsyncIterable) // Iteration 0: mw1 adds prompt, mw2 does nothing expect(calls[0]!.systemPrompts).toEqual(['base', 'added-by-mw1-iter-0']) - expect(calls[0]!.maxTokens).toBe(100) + expect((calls[0]!.modelOptions as any).maxTokens).toBe(100) // Iteration 1: mw1 adds prompt, mw2 doubles maxTokens // Note: mw1's change from iter-0 persists since applyMiddlewareConfig updates the engine expect(calls[1]!.systemPrompts).toContain('added-by-mw1-iter-1') - expect(calls[1]!.maxTokens).toBe(200) + expect((calls[1]!.modelOptions as any).maxTokens).toBe(200) }) it('should let middleware observe config changes from the previous iteration', async () => { @@ -2660,14 +2673,18 @@ describe('chat() middleware', () => { configSnapshots.push({ phase: ctx.phase, iteration: ctx.iteration, - maxTokens: config.maxTokens, + maxTokens: config.modelOptions?.maxTokens as number | undefined, systemPrompts: [...config.systemPrompts], }) // On each beforeModel call, bump maxTokens by 50 if (ctx.phase === 'beforeModel') { return { - maxTokens: (config.maxTokens ?? 0) + 50, + modelOptions: { + ...config.modelOptions, + maxTokens: + ((config.modelOptions?.maxTokens as number) ?? 
0) + 50, + }, } } return undefined @@ -2678,7 +2695,7 @@ describe('chat() middleware', () => { adapter, messages: [{ role: 'user', content: 'Hi' }], tools: [tool], - maxTokens: 100, + modelOptions: { maxTokens: 100 }, middleware: [middleware], }) await collectChunks(stream as AsyncIterable) diff --git a/packages/ai/tests/middlewares/otel.test.ts b/packages/ai/tests/middlewares/otel.test.ts index 64c04f716..95bc3d6bf 100644 --- a/packages/ai/tests/middlewares/otel.test.ts +++ b/packages/ai/tests/middlewares/otel.test.ts @@ -70,9 +70,9 @@ describe('otelMiddleware — iteration span lifecycle', () => { await runToIterationStart(mw, ctx, { messages: [{ role: 'user', content: 'hi' }], - temperature: 0.7, - topP: 0.9, - maxTokens: 512, + // Provider-native spellings: OpenAI Responses uses snake_case `top_p` + // and `max_output_tokens`, not the camelCase `topP` / `maxTokens`. + modelOptions: { temperature: 0.7, top_p: 0.9, max_output_tokens: 512 }, }) const [rootSpan, iterSpan] = spans @@ -81,6 +81,12 @@ describe('otelMiddleware — iteration span lifecycle', () => { expect(iterSpan!.name).toBe('chat gpt-4o #0') expect(iterSpan!.kind).toBe(SpanKind.CLIENT) expect(iterSpan!.ended).toBe(false) + // Sampling options are sourced from provider-native modelOptions, whose + // key spellings vary per provider. The middleware reads a union of known + // spellings so the gen_ai semantic attributes populate regardless. 
+ expect(iterSpan!.attributes['gen_ai.request.temperature']).toBe(0.7) + expect(iterSpan!.attributes['gen_ai.request.top_p']).toBe(0.9) + expect(iterSpan!.attributes['gen_ai.request.max_tokens']).toBe(512) await mw.onChunk?.(ctx, { ...ev.runFinished('stop'), model: 'gpt-4o' }) // The iteration span stays open across RUN_FINISHED so tool spans can @@ -100,6 +106,45 @@ describe('otelMiddleware — iteration span lifecycle', () => { expect(rootSpan!.ended).toBe(true) }) + it('reads sampling attributes from Ollama-nested modelOptions.options', async () => { + const { tracer, spans } = createFakeTracer() + const mw = otelMiddleware({ tracer }) + const ctx = makeCtx() + ctx.phase = 'init' + + await runToIterationStart(mw, ctx, { + messages: [{ role: 'user', content: 'hi' }], + // Ollama nests sampling under `options` and caps output via `num_predict`. + modelOptions: { + options: { temperature: 0.2, top_p: 0.8, num_predict: 256 }, + }, + }) + + const iterSpan = spans[1] + expect(iterSpan!.attributes['gen_ai.request.temperature']).toBe(0.2) + expect(iterSpan!.attributes['gen_ai.request.top_p']).toBe(0.8) + expect(iterSpan!.attributes['gen_ai.request.max_tokens']).toBe(256) + }) + + it('reads camelCase sampling spellings (Gemini/OpenRouter)', async () => { + const { tracer, spans } = createFakeTracer() + const mw = otelMiddleware({ tracer }) + const ctx = makeCtx() + ctx.phase = 'init' + + await runToIterationStart(mw, ctx, { + messages: [{ role: 'user', content: 'hi' }], + // Gemini uses `topP` / `maxOutputTokens`; OpenRouter uses + // `maxCompletionTokens`. 
+ modelOptions: { temperature: 0.5, topP: 0.95, maxOutputTokens: 1024 }, + }) + + const iterSpan = spans[1] + expect(iterSpan!.attributes['gen_ai.request.temperature']).toBe(0.5) + expect(iterSpan!.attributes['gen_ai.request.top_p']).toBe(0.95) + expect(iterSpan!.attributes['gen_ai.request.max_tokens']).toBe(1024) + }) + it('opens a fresh iteration span for each onConfig(beforeModel) and closes the previous one', async () => { const { tracer, spans } = createFakeTracer() const mw = otelMiddleware({ tracer }) diff --git a/packages/ai/tests/summarize-max-length.test.ts b/packages/ai/tests/summarize-max-length.test.ts new file mode 100644 index 000000000..00be5cbd9 --- /dev/null +++ b/packages/ai/tests/summarize-max-length.test.ts @@ -0,0 +1,139 @@ +import { describe, it, expect } from 'vitest' +import { ChatStreamSummarizeAdapter } from '../src/activities/summarize/chat-stream-summarize' +import { resolveDebugOption } from '../src/logger/resolve' +import { ev } from './test-utils' +import type { ChatStreamCapable } from '../src/activities/summarize/chat-stream-summarize' +import type { StreamChunk, TextOptions } from '../src/types' + +const logger = resolveDebugOption(false) + +/** + * Fake text adapter that records the `modelOptions` it is handed and yields a + * trivial summary stream. `name` is irrelevant here — the summarize wrapper + * keys off its OWN `name` (set via the constructor) to resolve the provider's + * max-tokens spelling. 
+ */ +function createRecordingTextAdapter(): { + textAdapter: ChatStreamCapable + lastModelOptions: () => Record | undefined +} { + let recorded: Record | undefined + const textAdapter: ChatStreamCapable = { + chatStream(opts: TextOptions): AsyncIterable { + recorded = opts.modelOptions as Record | undefined + return (async function* () { + yield ev.textContent('summary') + yield ev.runFinished('stop') + })() + }, + } + return { textAdapter, lastModelOptions: () => recorded } +} + +describe('ChatStreamSummarizeAdapter — maxLength reaches the wrapped adapter under the native key', () => { + it('OpenAI-style adapter receives maxLength as max_output_tokens', async () => { + const { textAdapter, lastModelOptions } = createRecordingTextAdapter() + const adapter = new ChatStreamSummarizeAdapter( + textAdapter, + 'gpt-4o-mini', + 'openai', + ) + + await adapter.summarize({ + model: 'gpt-4o-mini', + text: 'hi', + maxLength: 128, + logger, + }) + + const opts = lastModelOptions() + expect(opts?.['max_output_tokens']).toBe(128) + // Generic / dead keys must NOT be set. + expect(opts?.['maxTokens']).toBeUndefined() + expect(opts?.['max_tokens']).toBeUndefined() + // Temperature default still applied. 
+ expect(opts?.['temperature']).toBe(0.3) + }) + + it('Anthropic adapter receives maxLength as max_tokens', async () => { + const { textAdapter, lastModelOptions } = createRecordingTextAdapter() + const adapter = new ChatStreamSummarizeAdapter( + textAdapter, + 'claude-sonnet-4-5', + 'anthropic', + ) + + await adapter.summarize({ + model: 'claude-sonnet-4-5', + text: 'hi', + maxLength: 200, + logger, + }) + + const opts = lastModelOptions() + expect(opts?.['max_tokens']).toBe(200) + expect(opts?.['maxTokens']).toBeUndefined() + }) + + it('Ollama adapter receives maxLength nested under options.num_predict', async () => { + const { textAdapter, lastModelOptions } = createRecordingTextAdapter() + const adapter = new ChatStreamSummarizeAdapter( + textAdapter, + 'mistral', + 'ollama', + ) + + await adapter.summarize({ + model: 'mistral', + text: 'hi', + maxLength: 64, + logger, + }) + + const opts = lastModelOptions() + const nested = opts?.['options'] as Record | undefined + expect(nested?.['num_predict']).toBe(64) + }) + + it('does not override a caller-supplied token limit', async () => { + const { textAdapter, lastModelOptions } = createRecordingTextAdapter() + const adapter = new ChatStreamSummarizeAdapter( + textAdapter, + 'gpt-4o-mini', + 'openai', + ) + + await adapter.summarize({ + model: 'gpt-4o-mini', + text: 'hi', + maxLength: 128, + // Caller explicitly sets a token cap — summarize must not clobber it. + modelOptions: { max_output_tokens: 999, temperature: 0.9 }, + logger, + }) + + const opts = lastModelOptions() + expect(opts?.['max_output_tokens']).toBe(999) + // Caller temperature also wins over the 0.3 default. + expect(opts?.['temperature']).toBe(0.9) + }) + + it('unknown adapter name sets no token key (falls back to prompt hint only)', async () => { + const { textAdapter, lastModelOptions } = createRecordingTextAdapter() + // Default name is 'chat-stream-summarize' — not a recognised provider. 
+ const adapter = new ChatStreamSummarizeAdapter(textAdapter, 'some-model') + + await adapter.summarize({ + model: 'some-model', + text: 'hi', + maxLength: 128, + logger, + }) + + const opts = lastModelOptions() + expect(opts?.['max_output_tokens']).toBeUndefined() + expect(opts?.['max_tokens']).toBeUndefined() + expect(opts?.['maxTokens']).toBeUndefined() + expect(opts?.['options']).toBeUndefined() + }) +}) diff --git a/packages/openai-base/src/adapters/chat-completions-text.ts b/packages/openai-base/src/adapters/chat-completions-text.ts index 064f92b9d..7665932db 100644 --- a/packages/openai-base/src/adapters/chat-completions-text.ts +++ b/packages/openai-base/src/adapters/chat-completions-text.ts @@ -1151,22 +1151,14 @@ export abstract class OpenAIBaseChatCompletionsTextAdapter< } : undefined - // Build the request so explicit top-level options win over modelOptions - // when set, but `undefined` top-level options do NOT clobber values the - // caller put in modelOptions. Keeping the merge nullish-aware fixes the - // silent regression where a `modelOptions: { temperature: 0.7 }` setting - // was overwritten with `temperature: undefined`. + // `modelOptions` is the sole sampling surface: callers set provider-native + // wire names (`temperature`, `top_p`, `max_tokens`/`max_completion_tokens`) + // there and they flow through the spread below. The root + // `temperature`/`topP`/`maxTokens` fields are intentionally NOT read here. return { ...modelOptions, model: options.model, messages, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.maxTokens !== undefined && { - max_tokens: options.maxTokens, - }), - ...(options.topP !== undefined && { top_p: options.topP }), // Conditional spread: `tools: undefined` would clobber any // modelOptions.tools the caller set above. 
...(tools && diff --git a/packages/openai-base/src/adapters/responses-text.ts b/packages/openai-base/src/adapters/responses-text.ts index 43b3d9138..10765c015 100644 --- a/packages/openai-base/src/adapters/responses-text.ts +++ b/packages/openai-base/src/adapters/responses-text.ts @@ -1635,23 +1635,15 @@ export abstract class OpenAIBaseResponsesTextAdapter< } : undefined - // Spread modelOptions first, then explicit top-level options when set. - // Mirrors the chat-completions base adapter's precedence so callers - // tuning either backend get identical behaviour. Leaving `modelOptions` - // last (its previous behavior) silently shadowed the canonical - // `options.temperature`/`maxTokens` fields, while spreading first - // without nullish-aware merge would clobber `modelOptions.temperature` - // with `undefined` whenever the caller didn't set the top-level option. + // `modelOptions` is the sole sampling surface: `temperature`, `top_p`, and + // `max_output_tokens` live there (typed via OpenAISamplingOptions) and are + // spread first. Engine-managed fields (`model`, `metadata`, `instructions`, + // `input`, `tools`, `textFormat`) are layered on top afterward so they + // always win over any same-named key a caller happened to put in + // `modelOptions`. 
return { ...modelOptions, model: options.model, - ...(options.temperature !== undefined && { - temperature: options.temperature, - }), - ...(options.maxTokens !== undefined && { - max_output_tokens: options.maxTokens, - }), - ...(options.topP !== undefined && { top_p: options.topP }), ...(options.metadata !== undefined && { metadata: options.metadata }), ...(() => { const prompts = normalizeSystemPrompts(options.systemPrompts) diff --git a/packages/openai-base/tests/responses-text.test.ts b/packages/openai-base/tests/responses-text.test.ts index 7430f6947..794413b24 100644 --- a/packages/openai-base/tests/responses-text.test.ts +++ b/packages/openai-base/tests/responses-text.test.ts @@ -1867,9 +1867,13 @@ describe('OpenAIBaseResponsesTextAdapter', () => { logger: testLogger, model: 'test-model', messages: [{ role: 'user', content: 'Hello' }], - temperature: 0.5, - topP: 0.9, - maxTokens: 1024, + // Sampling options now flow through modelOptions with provider-native + // wire names; the base no longer reads root temperature/topP/maxTokens. + modelOptions: { + temperature: 0.5, + top_p: 0.9, + max_output_tokens: 1024, + }, systemPrompts: ['Be helpful'], tools: [weatherTool], })) {