docs: add chatCollect(), API spec cross-reference, v2 scoping
Brainstorm:
- Added chatCollect() for non-streaming programmatic API
- Scoped out vision/multimodal, thinking/budget_tokens, tools/tool_choice
as v2 items with specific rationale
- Added reasoning_effort to v1 scope
- Referenced PRs #166 (agent plugin) and #200 (vector search)
- Updated references with query/vision/reasoning/function-calling docs
Plan:
- Cross-referenced Databricks Query API spec vs OpenAI conventions
- Documented type sourcing decision (hand-write for v1, sourced from
OpenAI API reference)
- Added SDK comparison table (OpenAI vs Anthropic vs AppKit)
- Fixed id: string | null in response types
- Noted served-model-name header for telemetry
- Documented extra_params vs top-level field convention
Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
1. **Type safety hardened** — removed unsafe index signature, added string-literal unions for roles, fully specified response types
- `reasoning_effort` added to v1 allowlist (GPT-5, Gemini 3.x, GPT OSS reasoning models — simple string enum, zero security risk)
- `databricks-` prefix check removed from endpoint name validation — Foundation Model API endpoints all use this prefix (e.g., `databricks-claude-sonnet-4-5`)
- Vision/multimodal, `thinking`/`budget_tokens` (Claude reasoning), and function calling explicitly documented as v2 considerations in Known Limitations
- Plan types sourced from OpenAI conventions, not the Databricks API spec (which only documents 7 chat params + `extra_params` catch-all)
- `id` can be `null` in Databricks responses (fixed in types)
- `served-model-name` response header available for telemetry
- `extra_params` is the Databricks-blessed pattern for extended params, but top-level fields also work via OpenAI compat layer
- AppKit has no upstream SSE parser — need to create one for proxy scenarios
- `SSEWriter.writeEvent()` doesn't handle backpressure (known gap, not blocking for v1)
- Resource model simplified: one required (chat) + one optional (embedding) — aligns with CLI `apps init` flow and Databricks Apps `valueFrom` pattern
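Since the notes above flag the missing SSE parser, here is a minimal sketch of the shape such a parser could take for the proxy scenario. This is not the plan's implementation; it assumes standard `data:` events terminated by a blank line and that partial trailing data is re-buffered by the caller.

```typescript
// Minimal SSE chunk parser sketch (hypothetical, not AppKit's actual code).
// Splits a buffered string into complete events plus the unparsed remainder.
export function parseSSEChunk(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const parts = buffer.split("\n\n");
  // The last segment may be an incomplete event; hand it back for re-buffering.
  const rest = parts.pop() ?? "";
  for (const block of parts) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data.length > 0) events.push(data);
  }
  return { events, rest };
}
```

The caller would prepend `rest` to the next network chunk before parsing again, which handles events split across TCP reads.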
For v1, include a single undici `Agent` with `connections: 100` (configurable via `IServingConfig.connectionPoolSize`). The default `fetch()` allows only ~10 connections per origin, which saturates with just 10 concurrent streaming users, since each streaming request holds a TCP connection for the full LLM response duration (30-120s). The 6-line `Agent` config prevents this at near-zero cost (undici idle connection overhead is ~1KB of memory), and 100 connections provide headroom for mixed streaming + non-streaming workloads (at 40 concurrent streaming users, 60 remain for embeddings). Consider separate pools for streaming (long-lived) vs. non-streaming (short-lived) requests to prevent head-of-line blocking; that is a v2 optimization, so add a `// TODO: separate pools for streaming vs non-streaming` comment for now.
### Type Sourcing & API Compatibility
The plugin's request/response types follow **OpenAI conventions**, not the Databricks API spec. The [official Databricks spec](https://docs.databricks.com/api/workspace/servingendpoints/query) is a generic endpoint that documents only 7 chat parameters; everything else passes through the OpenAI-compatible layer. Neither the `openai` nor `@anthropic-ai/sdk` packages are in AppKit's dependencies.
**Type sourcing options:**
| Option | Pros | Cons |
|--------|------|------|
| **A. `import type` from `openai`** | Maintained by OpenAI, comprehensive (tools, vision, reasoning, streaming chunks), great autocomplete, zero runtime cost (dev dep only) | Adds a dependency; `openai` is NOT currently in AppKit |
| **B. Hand-write types (current)** | No new dependency, can restrict to exactly what the allowlist permits | Must be manually maintained, sourced against actual API docs |
| **C. `@databricks/sdk-experimental`** | Already a dependency | Missing `ChatCompletionChunk`, streaming types, and OpenAI-compatible response shapes |
**Decision:** Hand-write types for v1 (Option B), explicitly sourced against the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat) and [Databricks API spec](https://docs.databricks.com/api/workspace/servingendpoints/query). Revisit Option A if type drift becomes a maintenance burden. The types define AppKit's security boundary (what the proxy accepts), not the full upstream API.
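A hand-written type set under Option B might look like the following sketch. Field names follow the OpenAI Chat Completions reference; the exact allowlist, naming, and which fields are optional are assumptions, not the plan's final types.

```typescript
// Hypothetical hand-written v1 types (Option B sketch). Note the
// string-literal role union (no index signature) and the nullable `id`,
// both called out elsewhere in this document.
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface ChatCompletionRequest {
  messages: ChatMessage[];
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop?: string | string[];
  stream?: boolean;
  reasoning_effort?: "low" | "medium" | "high"; // v1 allowlist addition
}

interface ChatCompletionResponse {
  id: string | null; // Databricks may return null here
  model: string;
  choices: {
    index: number;
    message: ChatMessage;
    finish_reason: string | null;
  }[];
}
```

Because these types define the proxy's security boundary, anything not listed here is rejected before it reaches the serving endpoint, which is the point of Option B's restrictiveness.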
**SDK comparison — how AppKit maps to alternative SDKs:**
**Key architectural distinction:** OpenAI/Anthropic SDKs are *clients* — they construct requests for one provider. AppKit is a *server-side proxy* — it receives frontend requests, forwards to Databricks (OpenAI-compatible), and streams back. The types define a security boundary, not a client API.
**`served-model-name` response header:** Databricks returns the actual model that served the request in a `served-model-name` response header. Capture this for telemetry spans (e.g., `serving.served_model_name` attribute) — useful when endpoints have traffic splitting across multiple models.
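Capturing that header could be as small as the helper below. `captureServedModel` and the `recordAttr` callback are hypothetical names standing in for whatever telemetry API AppKit uses; only the header name and the `serving.served_model_name` attribute come from the text above.

```typescript
// Hedged sketch: extract the served-model-name response header for telemetry.
export function captureServedModel(
  headers: Headers,
  recordAttr: (key: string, value: string) => void,
): void {
  const served = headers.get("served-model-name");
  if (served !== null) {
    // With traffic splitting, this records which model actually answered.
    recordAttr("serving.served_model_name", served);
  }
}
```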
### Request Validation
Minimal validation with security guardrails.
**Note on `extra_params` (from API spec cross-reference):** The [official Databricks API spec](https://docs.databricks.com/api/workspace/servingendpoints/query) only documents 7 chat parameters (`messages`, `max_tokens`, `n`, `stop`, `stream`, `temperature`, `input`) and provides `extra_params` as a catch-all `object` field for "completions, chat, and embeddings" endpoints. Parameters like `top_p`, `model`, `reasoning_effort`, etc. are NOT in the official spec — they pass through the OpenAI-compatible layer. We send them as top-level fields (matching how the OpenAI SDK sends them) rather than using `extra_params`, because the OpenAI compat layer accepts both approaches and top-level is the convention for OpenAI-compatible clients.
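The two request shapes the compat layer accepts can be illustrated side by side. The payloads below are assumed examples, not verbatim from the spec; the division between spec-documented and pass-through fields follows the note above.

```typescript
// Top-level fields: the OpenAI-client convention, which the plan sends.
const topLevel = {
  messages: [{ role: "user", content: "hi" }],
  max_tokens: 256,          // one of the 7 spec-documented chat params
  top_p: 0.9,               // NOT in the spec; passes through the compat layer
  reasoning_effort: "low",  // same
};

// extra_params: the spec-documented catch-all, equivalent in effect.
const viaExtraParams = {
  messages: [{ role: "user", content: "hi" }],
  max_tokens: 256,
  extra_params: { top_p: 0.9, reasoning_effort: "low" },
};
```

Both forms reach the endpoint; sending top-level fields simply matches what the OpenAI SDK itself emits, so existing client tooling works unchanged.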
- [Databricks Apps: Model Serving integration](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/model-serving)
- [Serving Endpoints Query API spec](https://docs.databricks.com/api/workspace/servingendpoints/query) — official spec (7 chat params + `extra_params` catch-all; plan types follow OpenAI format instead)
- [OpenAI Chat Completions API reference](https://platform.openai.com/docs/api-reference/chat) — the actual source for plan types (OpenAI-compatible format)