* feat(pdf-server): get_viewer_state interact action
New interact action that returns a JSON snapshot of the live viewer:
{currentPage, pageCount, zoom, displayMode, selectedAnnotationIds,
selection: {text, contextBefore, contextAfter, boundingRect} | null}.
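The snapshot shape above can be written down as a TypeScript type, which also gives a consumer a cheap runtime check before trusting parsed JSON. This is a sketch transcribed from the field list only: the field types, and especially the `{x, y, width, height}` layout of `boundingRect`, are assumptions, and `isViewerState` is a hypothetical helper.

```typescript
// Hypothetical TypeScript shape of the get_viewer_state snapshot.
// Field types (and the Rect layout) are assumptions, not taken from the source.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ViewerSelection {
  text: string;
  contextBefore: string;
  contextAfter: string;
  boundingRect: Rect; // PDF points, top-left origin, y-down
}

interface ViewerState {
  currentPage: number;
  pageCount: number;
  zoom: number;
  displayMode: string;
  selectedAnnotationIds: string[];
  selection: ViewerSelection | null; // null when nothing is selected
}

// Shallow structural check that narrows an unknown JSON value to ViewerState.
// `selection` must be present: either null or an object.
function isViewerState(v: unknown): v is ViewerState {
  if (typeof v !== "object" || v === null) return false;
  const s = v as Record<string, unknown>;
  return (
    typeof s.currentPage === "number" &&
    typeof s.pageCount === "number" &&
    typeof s.zoom === "number" &&
    typeof s.displayMode === "string" &&
    Array.isArray(s.selectedAnnotationIds) &&
    (s.selection === null || typeof s.selection === "object")
  );
}
```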
The viewer already pushes selection passively via setModelContext as
<pdf-selection> tags, but not all hosts surface model-context. This gives
the model an explicit pull.
selection.boundingRect is a single bbox in PDF points (top-left origin,
y-down) so it can be fed straight back into add_annotations. selection is
null when nothing is selected or the selection is outside the text-layer.
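One way to obtain such a single bbox is to union the per-line selection rectangles and then map viewport pixels to page-local PDF points. The sketch below assumes (none of this is stated in the source) that the page element's top-left corner is PDF point (0, 0), the page is unrotated, and one PDF point renders as `zoom` CSS pixels; `unionRects` and `toPdfPoints` are hypothetical names. Because the DOM and the chosen PDF convention are both top-left/y-down, no vertical flip is needed, unlike PDF's default bottom-left, y-up user space.

```typescript
// Sketch: collapse per-line selection rects into one bbox in PDF points.
// Assumptions: unrotated page, page top-left == PDF (0, 0), 1 pt == `zoom` CSS px.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest rectangle containing every input rect; null for an empty list.
function unionRects(rects: readonly Rect[]): Rect | null {
  if (rects.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const r of rects) {
    minX = Math.min(minX, r.x);
    minY = Math.min(minY, r.y);
    maxX = Math.max(maxX, r.x + r.width);
    maxY = Math.max(maxY, r.y + r.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Translate a viewport rect to page-local pixels, then scale to PDF points.
// Both spaces are y-down with a top-left origin, so only offset and scale apply.
function toPdfPoints(viewportRect: Rect, pageOrigin: { x: number; y: number }, zoom: number): Rect {
  return {
    x: (viewportRect.x - pageOrigin.x) / zoom,
    y: (viewportRect.y - pageOrigin.y) / zoom,
    width: viewportRect.width / zoom,
    height: viewportRect.height / zoom,
  };
}
```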
Wiring: new PdfCommand variant -> processCommands case ->
handleGetViewerState -> submit_viewer_state (new app-only tool, mirrors
submit_save_data) -> waitForViewerState -> text content block.
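The round trip in that wiring is a request/response correlation: the server hands the viewer a requestId, then waits until the viewer calls submit_viewer_state with the same id. A minimal sketch of that waiting side, assuming a per-request timeout; `PendingRequests`, `wait`, and `submit` are hypothetical names, and the real `waitForViewerState` may differ.

```typescript
// Sketch of correlating get_viewer_state requests with submit_viewer_state calls.
// Class and method names are hypothetical; only the flow follows the commit message.
interface Waiter {
  resolve: (state: string) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingRequests {
  private pending = new Map<string, Waiter>();

  // Called by the get_viewer_state handler after queueing the viewer command.
  wait(requestId: string, timeoutMs: number): Promise<string> {
    return new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`viewer did not answer ${requestId} within ${timeoutMs} ms`));
      }, timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
  }

  // Called by submit_viewer_state. Returns false for unknown or already-settled ids.
  submit(requestId: string, outcome: { state?: string; error?: string }): boolean {
    const waiter = this.pending.get(requestId);
    if (!waiter) return false;
    this.pending.delete(requestId);
    clearTimeout(waiter.timer); // don't keep the process alive once settled
    if (outcome.error !== undefined) waiter.reject(new Error(outcome.error));
    else if (outcome.state !== undefined) waiter.resolve(outcome.state);
    else waiter.reject(new Error("submission carried neither state nor error"));
    return true;
  }
}
```

The resolved string is what the handler would then wrap in a text content block; keeping it as raw JSON text avoids a parse/re-serialize round trip on the server.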
Also fills a gap in the display_pdf description: it listed interact
actions but was missing save_as; added that and get_viewer_state.
e2e: two tests covering selection:null and a programmatic text-layer
selection.
* test(pdf-server): fix get_viewer_state e2e race + assertions
readLastToolResult clicked .last() before the interact result panel
existed (callInteract doesn't block), so it expanded the display_pdf
panel instead. Wait for the expected panel count first.
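The fix amounts to polling until the panel count reaches the expected value before selecting `.last()`. Inside a Playwright test, the auto-retrying `await expect(locator).toHaveCount(n)` assertion does this directly; the framework-free sketch below shows the same idea, with `countPanels` a hypothetical stand-in for whatever query counts result panels.

```typescript
// Sketch: wait until at least `expected` result panels exist, or time out.
// `countPanels` is a hypothetical callback standing in for a DOM/Playwright query.
async function waitForCount(
  countPanels: () => number | Promise<number>,
  expected: number,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const n = await countPanels();
    if (n >= expected) return; // the interact result panel now exists
    if (Date.now() >= deadline) {
      throw new Error(`panel count ${n} did not reach ${expected} within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```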
Also: basic-host renders the full CallToolResult JSON, with the state
double-escaped inside content[0].text. Parse instead of regex-matching.
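Because the state is JSON embedded as a string inside another JSON document, two `JSON.parse` calls recover it exactly, whereas a regex has to cope with the `\"` escaping. A sketch, assuming a minimal CallToolResult-like type that covers only the fields used here; `parseViewerState` is a hypothetical helper name.

```typescript
// Sketch: unwrap the viewer state from a rendered CallToolResult.
// The minimal result types below are assumptions covering only what is read.
interface TextContent {
  type: "text";
  text: string;
}

interface CallToolResultLike {
  content: Array<TextContent | { type: string }>;
  isError?: boolean;
}

function parseViewerState(renderedResult: string): unknown {
  const result = JSON.parse(renderedResult) as CallToolResultLike; // outer layer
  const first = result.content[0];
  if (!first || first.type !== "text") {
    throw new Error("expected a text content block");
  }
  return JSON.parse((first as TextContent).text); // inner, escaped layer
}
```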
playwright.config.ts: honor PW_CHANNEL env to use system Chrome locally
when the bundled chromium_headless_shell is broken.
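Playwright's `use.channel` option selects an installed browser channel (for example `chrome`) instead of the bundled build. A sketch of reading it from the environment, with `channelFromEnv` a hypothetical helper; in `playwright.config.ts` its result would be spread into `use`, e.g. `use: { ...channelFromEnv(process.env) }`.

```typescript
// Sketch: map PW_CHANNEL (e.g. "chrome") to Playwright's `channel` use-option.
// The helper name is hypothetical; env is passed in so the function stays pure.
type Env = Record<string, string | undefined>;

function channelFromEnv(env: Env): { channel?: string } {
  const channel = env.PW_CHANNEL?.trim();
  // Omit the key when unset so Playwright keeps its default bundled browser.
  return channel ? { channel } : {};
}
```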
@@ -2295,6 +2344,7 @@ Example — add a signature image and a stamp, then screenshot to verify:
 **TEXT/SCREENSHOTS**:
 • get_text: extract text from pages. Optional \`page\` for single page, or \`intervals\` for ranges [{start?,end?}]. Max 20 pages.
 • get_screenshot: capture a single page as PNG image. Requires \`page\`.
+• get_viewer_state: snapshot of the live viewer — JSON {currentPage, pageCount, zoom, displayMode, selectedAnnotationIds, selection:{text,contextBefore,contextAfter,boundingRect}|null}. Use this to read what the user has selected or which page they're on.

 **FORMS** — fill_form: fill fields with \`fields\` array of {name, value}.

@@ -2320,6 +2370,7 @@ Example — add a signature image and a stamp, then screenshot to verify:
     "fill_form",
     "get_text",
     "get_screenshot",
+    "get_viewer_state",
     "save_as",
   ])
   .optional()
@@ -2603,6 +2654,48 @@ Example — add a signature image and a stamp, then screenshot to verify:
   },
 );

+// Tool: submit_viewer_state (app-only) - Viewer reports its live state
+registerAppTool(
+  server,
+  "submit_viewer_state",
+  {
+    title: "Submit Viewer State",
+    description:
+      "Submit a viewer-state snapshot for a get_viewer_state request (used by viewer). The model should NOT call this tool directly.",
+    inputSchema: {
+      requestId: z
+        .string()
+        .describe("The request ID from the get_viewer_state command"),
+      state: z
+        .string()
+        .optional()
+        .describe("JSON-encoded viewer state snapshot"),
+      error: z
+        .string()
+        .optional()
+        .describe("Error message if the viewer failed to read state"),
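Given that input schema (required `requestId`, optional `state` and `error`), the state eventually has to become the text content block that get_viewer_state returns. A hypothetical sketch of that last step, validating the JSON before the model sees it; `toInteractResult` and both interfaces are illustrative names, and the `isError` flag follows the MCP CallToolResult convention rather than anything shown in this diff.

```typescript
// Sketch: turn a submit_viewer_state payload into a text tool result.
// Names are hypothetical; the args mirror the inputSchema in the diff.
interface SubmitViewerStateArgs {
  requestId: string;
  state?: string; // JSON-encoded snapshot
  error?: string; // set when the viewer failed to read its state
}

interface TextResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

function toInteractResult(args: SubmitViewerStateArgs): TextResult {
  const fail = (msg: string): TextResult => ({
    content: [{ type: "text", text: `get_viewer_state failed: ${msg}` }],
    isError: true,
  });
  if (args.error !== undefined) return fail(args.error);
  if (args.state === undefined) return fail("viewer sent neither state nor error");
  try {
    JSON.parse(args.state); // reject malformed snapshots early
  } catch {
    return fail("viewer state is not valid JSON");
  }
  // Pass the JSON text through unchanged as a single text content block.
  return { content: [{ type: "text", text: args.state }] };
}
```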