Skip to content

Commit 20e529f

Browse files
mcanouilvezwork
andauthored
fix: protect code cell options from formatting (#655)
* fix(format): strip cell option directives before formatting embedded code Quarto code cells carry metadata on leading comment lines (`#| label: foo`, `#| echo: false`, `//| label: bar`, ...) that must survive verbatim so Quarto can parse them on the next render. Previously, `formatBlock` handed the entire cell to the language formatter (Black, autopep8, styler, ...), which treats these lines as ordinary comments and may reflow, reorder, or rewrite them. Hide the option lines from the formatter entirely by slicing them off `blockLines` before calling `virtualDocForCode`, and shift the returned edits back into the real code range via `block.range.start.line + 1 + optionLines`. Blocks that are entirely option directives short-circuit to `undefined` (no formatter invocation, no out-of-range toast). This replaces the filter-after approach that dropped any edit whose start line landed in the option region, which quietly swallowed valid multi-line edits and block-wide edits starting at virtual-doc line 0. * test(format): cover cell option protection with deterministic fixtures Add a `Code Block Formatting` suite that registers a fake Python formatter from a pure `string -> string` function, so each case can prove exactly what did and did not get rewritten. The fake formatters mangle `=` and `+` into spaced forms, rewrite `#| ` to `# PIPE`, and normalise comment spacing. Cover six scenarios: a single directive followed by code, multiple directives preserved in original order, a block with no directives, a directives-only block (no-op), multiline code with badly-formatted standalone and inline comments, and a hostile formatter run against a mixed block to prove the directives never reach the formatter at all. * docs: note cell option formatting fix in CHANGELOG * fix(format): protect option directives via canonical regex and per-block formatting Close three gaps discovered in adversarial review of the initial fix: 1. The `languageOptionComment` helper used a private map keyed by short language ids (e.g. `python`, `r`, `js`). It didn't know about `typescript`, so `//| ...` directives in `ts` cells were never stripped and still reached the formatter. Use `language.comment` from editor-core instead, which is the canonical, language-wide mapping. 2. The option-line detection used `startsWith("<comment>| ")`, which only matched the canonical form. Quarto's own cell-option parser (`cell/options.ts`) accepts `^<comment>\s*\| ?`, so variants like `#|label`, `# | label`, and `#| label` are legal. Switch to the same canonical regex so every variant is protected. 3. `embeddedDocumentFormattingProvider` bypassed `formatBlock` entirely for languages with `canFormatDocument !== false` (R, Julia, TS, ...), handing the whole virtualised document to the formatter. Option lines inside every block leaked through. Route every target block through `formatBlock` instead, so the same strip-before-format path protects both cell-format and document-format invocations. * test(format): cover whitespace variants, typescript, and document formatting Add three fixtures and three regression tests for the gaps fixed in the previous commit: - `format-python-option-variants.qmd` exercises every whitespace variant the Quarto cell-option parser accepts (`#|label`, `# | label`, `#| label`). The accompanying test runs an aggressive formatter that rewrites any `#\s*\|` line to `# MANGLED`; with the canonical-regex strip in place, no directive ever reaches the formatter, so `# MANGLED` must not appear. - `format-typescript.qmd` covers the `//|` directive form. Before switching to `language.comment` the `ts` language had no entry in the private comment map and the directive was unprotected; this test would have silently allowed the hostile rewrite. - `format-r-multiple-blocks.qmd` drives the document-formatting path (formerly bypassed `formatBlock` for languages with `canFormatDocument !== false`). The test invokes `embeddedDocumentFormattingProvider` directly since the LSP middleware isn't wired up in the vscode-test host, and asserts that every block's directives survive while the code itself is reformatted. The two new helpers `hostileRFormatter` and `hostileTypeScriptFormatter` use the same canonical regex pattern to decide what to mangle, so they are realistic stand-ins for any formatter that treats `#|`/`//|` lines as ordinary comments. * refactor(format): reuse canonical option pattern and harden doc-format path Address adversarial review findings on the option-protection patch: - Reuse `optionCommentPattern` from `cell/options.ts` instead of re-encoding the regex in `format.ts`. The protection path can no longer drift from Quarto's own cell-option parser, and the local `escapeRegExp` helper is gone. - Make document formatting all-or-nothing: aggregate per-block bails into a single message and abort the whole operation rather than apply a partial format. Per-block toasts are silenced via a new `silentOutOfRange` flag on `formatBlock` so the doc path no longer spams a message per failing block. - Thread `defaultOptions` through `embeddedDocumentRangeFormattingProvider` and `FormatCellCommand`, both of which previously fell back to a hardcoded `tabSize: 4 / insertSpaces: true` regardless of the user's editor settings. - Mark `languageOptionComment` as private in `option.ts` since it has no remaining external consumers. * test(format): exercise real edit-apply path and multi-edit aggregation - Replace the direct provider call in the document-formatting test with a real `vscode.executeFormatDocumentProvider` invocation. The test now registers `embeddedDocumentFormattingProvider` against the `quarto` language and lets VS Code's command machinery validate ordering and overlap. Pairwise non-overlap is asserted explicitly so a future regression in offset arithmetic surfaces here. - Add a test that exercises a formatter returning one `TextEdit` per non-empty line, covering the multi-edit aggregation path that all previous hostile formatters skipped. - Wrap both new tests in try/finally so a failed assertion can never leak a registered formatter into a later suite. - Drop the misleading comment about inject lines from `mangleHashPipeLines` and remove an unnecessary `wait(450)` from the document-formatting test. * fix(format): harden edge cases from adversarial review - Treat empty formatter results as no-op (not out-of-range bail) in both formatBlock and FormatCellCommand, preventing spurious error toasts. - Use trim() in the options-only guard so whitespace-only tails after option directives are correctly treated as empty. - Wrap testFormatter disposal in try/finally to prevent leaked formatter registrations on test failure. - Document why block-comment languages are not affected by the option stripping logic. --------- Co-authored-by: Elliot <key.draw@gmail.com>
1 parent e3d9958 commit 20e529f

12 files changed

Lines changed: 782 additions & 36 deletions

apps/vscode/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
- Added support for Positron's statement execution feature that reports the approximate line number of the parse error (<https://github.com/quarto-dev/quarto/pull/919>).
66
- Fixed a bug where `Quarto: Format Cell` would notify you that no formatter was available for code cells that were already formatted (<https://github.com/quarto-dev/quarto/pull/933>).
77
- No longer claim `.typ` files. Typst syntax highlighting in Quarto documents is unaffected, but standalone Typst files are now left to dedicated extensions like Tinymist (<https://github.com/quarto-dev/quarto/pull/943>).
8+
- Preserve Quarto code cell option directives (e.g. `#| label: foo`) when formatting embedded code. The directives are now stripped from the virtual document before being handed to the language formatter, so formatters such as Black, autopep8, and styler can no longer reflow or rewrite them (<https://github.com/quarto-dev/quarto/pull/655>).
89
- Fixed a bug where closing the Quarto Preview terminal via the trash icon did not clean up intermediate `.quarto_ipynb` files (<https://github.com/quarto-dev/quarto/pull/947>).
910

1011
## 1.130.0 (Release on 2026-02-18)

apps/vscode/src/providers/cell/options.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -91,7 +91,7 @@ function langCommentChars(lang: string): string[] {
9191
return chars;
9292
}
9393
}
94-
function optionCommentPattern(comment: string) {
94+
export function optionCommentPattern(comment: string) {
9595
return new RegExp("^" + escapeRegExp(comment) + "\\s*\\| ?");
9696
}
9797

apps/vscode/src/providers/format.ts

Lines changed: 109 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -35,14 +35,15 @@ import { TokenCodeBlock, TokenMath, codeForExecutableLanguageBlock, languageBloc
3535
import { Command } from "../core/command";
3636
import { isQuartoDoc } from "../core/doc";
3737
import { MarkdownEngine } from "../markdown/engine";
38+
import { optionCommentPattern } from "./cell/options";
3839
import { EmbeddedLanguage, languageCanFormatDocument } from "../vdoc/languages";
3940
import {
41+
isBlockOfLanguage,
4042
languageFromBlock,
4143
mainLanguage,
4244
unadjustedRange,
4345
VirtualDoc,
4446
virtualDocForCode,
45-
virtualDocForLanguage,
4647
withVirtualDocUri,
4748
} from "../vdoc/vdoc";
4849

@@ -90,22 +91,42 @@ export function embeddedDocumentFormattingProvider(engine: MarkdownEngine) {
9091
return [];
9192
}
9293

93-
if (languageCanFormatDocument(language)) {
94-
// Full document formatting support
95-
const vdoc = virtualDocForLanguage(document, tokens, language);
96-
return executeFormatDocumentProvider(
97-
vdoc,
98-
document,
99-
formattingOptions(document.uri, vdoc.language, options)
94+
// If the selected language supports whole-document formatting, format
95+
// every block of it; otherwise, format only the cell containing the
96+
// cursor. Either way, each block is routed through `formatBlock` so
97+
// that Quarto option directives are protected by the same
98+
// strip-before-format path and can't leak to the language formatter.
99+
const targetBlocks: (TokenMath | TokenCodeBlock)[] = languageCanFormatDocument(language)
100+
? (tokens.filter(isBlockOfLanguage(language)) as (TokenMath | TokenCodeBlock)[])
101+
: block
102+
? [block]
103+
: [];
104+
105+
// Document formatting is all-or-nothing: if any block fails the
106+
// out-of-range guard, abort the whole operation rather than apply a
107+
// partial format. We pass `silentOutOfRange: true` so per-block
108+
// failures don't toast individually; a single aggregated message is
109+
// shown below.
110+
const allEdits: TextEdit[] = [];
111+
let outOfRangeBlockFailures = 0;
112+
for (const target of targetBlocks) {
113+
const edits = await formatBlock(document, target, options, true);
114+
if (edits === undefined) {
115+
continue;
116+
}
117+
if (edits.length === 0) {
118+
outOfRangeBlockFailures++;
119+
continue;
120+
}
121+
allEdits.push(...edits);
122+
}
123+
if (outOfRangeBlockFailures > 0) {
124+
window.showInformationMessage(
125+
`Formatting edits were out of range in ${outOfRangeBlockFailures} code cell${outOfRangeBlockFailures === 1 ? "" : "s"}; document was not modified.`
100126
);
101-
} else if (block) {
102-
// Just format the selected block if there is one
103-
const edits = await formatBlock(document, block);
104-
return edits ? edits : [];
105-
} else {
106-
// Nothing we can format
107127
return [];
108128
}
129+
return allEdits;
109130
};
110131
}
111132

@@ -145,7 +166,7 @@ export function embeddedDocumentRangeFormattingProvider(
145166
return [];
146167
}
147168

148-
const edits = await formatBlock(document, block);
169+
const edits = await formatBlock(document, block, options);
149170
if (!edits) {
150171
return [];
151172
}
@@ -180,9 +201,15 @@ class FormatCellCommand implements Command {
180201
return;
181202
}
182203

183-
const edits = await formatBlock(document, block);
184-
if (!edits) {
185-
// Nothing to do! Already formatted, or no formatter picked us up, or this language doesn't support formatting.
204+
const editorOptions: FormattingOptions = {
205+
tabSize: typeof editor.options.tabSize === "number" ? editor.options.tabSize : 4,
206+
insertSpaces: typeof editor.options.insertSpaces === "boolean" ? editor.options.insertSpaces : true,
207+
};
208+
const edits = await formatBlock(document, block, editorOptions);
209+
if (!edits || edits.length === 0) {
210+
// Nothing to do! Already formatted, no formatter picked us up, this
211+
// language doesn't support formatting, or the edits were out of range
212+
// (the user already saw a toast from formatBlock in that case).
186213
return;
187214
}
188215

@@ -235,7 +262,12 @@ async function executeFormatDocumentProvider(
235262
}
236263
}
237264

238-
async function formatBlock(doc: TextDocument, block: TokenMath | TokenCodeBlock): Promise<TextEdit[] | undefined> {
265+
async function formatBlock(
266+
doc: TextDocument,
267+
block: TokenMath | TokenCodeBlock,
268+
defaultOptions?: FormattingOptions,
269+
silentOutOfRange: boolean = false
270+
): Promise<TextEdit[] | undefined> {
239271
// Extract language
240272
const language = languageFromBlock(block);
241273
if (!language) {
@@ -247,44 +279,86 @@ async function formatBlock(doc: TextDocument, block: TokenMath | TokenCodeBlock)
247279
return undefined;
248280
}
249281

250-
// Create virtual document containing the block
251282
const blockLines = lines(codeForExecutableLanguageBlock(block, false));
252-
const vdoc = virtualDocForCode(blockLines, language);
283+
284+
// Count leading Quarto option directives (e.g. `#| label: foo`) so we can
285+
// hide them from the formatter entirely. Feeding these lines to formatters
286+
// like Black or styler risks reflowing or rewriting them, which would
287+
// silently break the cell's behaviour on the next render. We reuse
288+
// `optionCommentPattern` from `cell/options.ts` so this code path can
289+
// never drift from Quarto's own cell-option parser: any variant the
290+
// executor recognises (`#| label`, `#|label`, `# | label`, `#| label`,
291+
// ...) is automatically protected here. `language.comment` is the
292+
// canonical comment string from `editor-core` and covers every formatter
293+
// language (including TypeScript, which was missing from the ad-hoc map
294+
// the previous implementation used). Note: block-comment languages (C,
295+
// CSS, SAS) use a tuple comment-char in `cell/options.ts` with a suffix
296+
// check; those languages do not have `canFormat: true` in
297+
// `vdoc/languages.ts`, so they never reach this code path.
298+
const optionPattern = language.comment
299+
? optionCommentPattern(language.comment)
300+
: undefined;
301+
let optionLines = 0;
302+
if (optionPattern) {
303+
for (const line of blockLines) {
304+
if (optionPattern.test(line)) {
305+
optionLines++;
306+
} else {
307+
break;
308+
}
309+
}
310+
}
311+
312+
// Create virtual document containing only the code portion of the block
313+
// so the formatter never sees the option directives.
314+
const codeLines = blockLines.slice(optionLines);
315+
316+
// Nothing to format if the block is entirely option directives (or only
317+
// trailing whitespace after them, which `lines()` may produce from a
318+
// final newline in `token.data`).
319+
if (codeLines.every(l => l.trim() === "")) {
320+
return undefined;
321+
}
322+
const vdoc = virtualDocForCode(codeLines, language);
253323

254324
const edits = await executeFormatDocumentProvider(
255325
vdoc,
256326
doc,
257-
formattingOptions(doc.uri, vdoc.language)
327+
formattingOptions(doc.uri, vdoc.language, defaultOptions)
258328
);
259329

260-
if (!edits) {
330+
if (!edits || edits.length === 0) {
261331
// Either no formatter picked us up, or there were no edits required.
262332
// We can't determine the difference though!
263333
return undefined;
264334
}
265335

266336
// Because we format with the block code copied in an empty virtual
267337
// document, we need to adjust the ranges to match the edits to the block
268-
// cell in the original file.
338+
// cell in the original file. The `+ 1` skips the opening fence line and
339+
// `+ optionLines` skips the leading option directives we hid from the
340+
// formatter.
341+
const lineOffset = block.range.start.line + 1 + optionLines;
269342
const blockRange = new Range(
270343
new Position(block.range.start.line, block.range.start.character),
271344
new Position(block.range.end.line, block.range.end.character)
272345
);
273-
const adjustedEdits = edits
274-
.map(edit => {
275-
const range = new Range(
276-
new Position(edit.range.start.line + block.range.start.line + 1, edit.range.start.character),
277-
new Position(edit.range.end.line + block.range.start.line + 1, edit.range.end.character)
278-
);
279-
return new TextEdit(range, edit.newText);
280-
});
346+
const adjustedEdits = edits.map(edit => {
347+
const range = new Range(
348+
new Position(edit.range.start.line + lineOffset, edit.range.start.character),
349+
new Position(edit.range.end.line + lineOffset, edit.range.end.character)
350+
);
351+
return new TextEdit(range, edit.newText);
352+
});
281353

282354
// Bail if any edit is out of range. We used to filter these edits out but
283355
// this could bork the cell. Return `[]` to indicate that we tried.
284356
if (adjustedEdits.some(edit => !blockRange.contains(edit.range))) {
285-
window.showInformationMessage(
286-
"Formatting edits were out of range and could not be applied to the code cell."
287-
);
357+
if (!silentOutOfRange) {
358+
window.showInformationMessage(
359+
"Formatting edits were out of range and could not be applied to the code cell."
360+
);
361+
}
288362
return [];
289363
}
290364

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
---
2+
title: Formatting Multiline Python Code Cells with Comments
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
#| label: multiline
9+
#| echo: false
10+
#standalone comment
11+
x=1 #inline comment
12+
y=2
13+
z=x+y
14+
```
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
---
2+
title: Formatting Python Code Cells with Multiple Options
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
#| label: multi
9+
#| echo: false
10+
#| warning: false
11+
x=1;y=2;z=x+y
12+
```
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
---
2+
title: Formatting Python Code Cells without Options
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
x=1;y=2;z=x+y
9+
```
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
---
2+
title: Formatting Python Code Cells with Only Options
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
#| label: only-options
9+
```
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
---
2+
title: Formatting Python Code Cells with Option Directive Variants
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
#|label: no-space
9+
# | space-before-pipe
10+
#| extra-space-after
11+
x=1;y=2;z=x+y
12+
```
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
---
2+
title: Formatting Python Code Cells
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{python}
8+
#| label: my-code
9+
x=1;y=2;z=x+y
10+
```
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
---
2+
title: Formatting an R Document with Multiple Blocks
3+
subtitle: https://github.com/quarto-dev/quarto/pull/655
4+
format: html
5+
---
6+
7+
```{r}
8+
#| label: first
9+
x<-1
10+
```
11+
12+
Some prose between the cells.
13+
14+
```{r}
15+
#| label: second
16+
#| echo: false
17+
y<-2
18+
```

0 commit comments

Comments
 (0)