[python] Fix enum member names corrupted by YAML 1.1/1.2 boundary #11162
Open
msyyc wants to merge 5 commits into
Date-like TypeSpec labels (e.g. 2020-01-01) produce snake-cased enum member names such as 2020_01_01. js-yaml (YAML 1.2) dumps these unquoted, but the Python generator parses with PyYAML (YAML 1.1), which reads 2020_01_01 back as the integer 20200101, corrupting the name. Fix the root cause in the emitter by force-quoting string scalars when serializing the code model so names round-trip as strings, and keep pygen robust by coercing enum value names to str. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
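To make the failure mode concrete, here is a minimal pure-Python sketch of how a YAML 1.1 loader resolves a plain scalar that looks like a decimal integer. It is not PyYAML's code: the pattern covers only the decimal branch of the YAML 1.1 integer type, and the function names are hypothetical.

```python
import re

# Decimal branch of the YAML 1.1 integer pattern: optional sign, then digits
# that may contain "_" separators. The binary, octal, hex, and sexagesimal
# branches are omitted from this sketch.
YAML11_DECIMAL_INT = re.compile(r"^[-+]?(?:0|[1-9][0-9_]*)$")


def resolve_plain_scalar(text: str):
    """Resolve an unquoted scalar: decimal-int lookalikes become int."""
    if YAML11_DECIMAL_INT.match(text):
        # Underscores are separators only, so they are dropped before parsing.
        return int(text.replace("_", ""))
    return text


def resolve_scalar(token: str):
    """Hypothetical helper: a double-quoted token always stays a string."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # escape handling omitted in this sketch
    return resolve_plain_scalar(token)


print(repr(resolve_scalar("2020_01_01")))    # unquoted: resolved as an integer
print(repr(resolve_scalar('"2020_01_01"')))  # quoted: kept as a string
```

The quoted form is the only one that preserves the name, which is why the fix forces quotes on the emitter side rather than repairing the value after loading.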
All changed packages have been documented.
With the emitter force-quoting string scalars, enum member names always round-trip as strings, so the defensive name coercion (and its tests) are no longer needed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The emitter force-quoting fix makes the non-string description/enum-name handling from microsoft#11143 unnecessary, so revert those pygen changes and their tests, leaving a focused emitter-only fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
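The force-quoting idea from the commits above can be sketched independently of js-yaml. The stand-in below is an assumption-laden illustration, not the emitter's code: `dump_flat_mapping` is a hypothetical name, it handles only a flat mapping with plain keys and str, bool, or int values, and it relies on JSON string syntax being acceptable as a YAML double-quoted scalar.

```python
import json


def dump_flat_mapping(mapping: dict) -> str:
    """Hypothetical stand-in for a force-quoting dump of a flat mapping."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, str):
            # Always double-quote strings, even ones that look like numbers.
            rendered = json.dumps(value)
        elif isinstance(value, bool):
            # bool is checked before the generic branch because it needs
            # lowercase YAML spelling rather than Python's True/False.
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


print(dump_flat_mapping({"name": "2020_01_01", "value": 3}), end="")
```

With every string quoted, the type of a scalar is fixed by the writer instead of being guessed by the reader's YAML version.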
 * @param codemodel Codemodel to serialize
 * @return the YAML representation of the codemodel.
 */
export function dumpCodeModelToYaml(codemodel: unknown): string {
Member
Just wondering: do you actually need YAML here? Couldn't you serialize as JSON instead? It would be faster and need fewer dependencies.
Member
I feel like YAML is not a great format for data serialization; it is better suited for config.
fix #11138
Problem
For a date-like TypeSpec label such as 2020-01-01, the emitter's enumName() correctly computes the enum member name as the string 2020_01_01. However, the code model is serialized with js-yaml (YAML 1.2), which dumps this scalar unquoted (name: 2020_01_01). The Python generator then parses the YAML with PyYAML (YAML 1.1), where 2020_01_01 is a valid integer written with underscore separators, so the name is read back as the integer 20200101, corrupting the enum member name.
This is the underlying root cause behind #11143. That PR worked around the crash in pygen by reconstructing the name from the enum value via regex, rather than fixing the corruption at the source.
Fix
Emitter (root cause): force-quote string scalars when serializing the code model so every string round-trips faithfully through PyYAML:
- dumpCodeModelToYaml() helper (forceQuotes: true, quotingType: '"') in external-process.ts, used by both the node and pyodide serialization paths.
pygen (defensive): simplified enum value name handling to a plain str() coercion so the generator stays robust to any scalar, letting the existing digit → ENUM_ prefixing produce ENUM_2020_01_01.
Tests
- Emitter test (emitter/test/external-process.test.ts) asserting ambiguous scalars are quoted and round-trip as strings.
- pygen tests covering the string name (2021_01_01 → ENUM_2021_01_01) and the defensive fallback (int 20210101 → ENUM_20210101).
Verified end-to-end: emitter dump → PyYAML load now yields the string '2020_01_01' (previously int 20200101). Emitter vitest (4 passed) and pygen pytest (16 passed) are green.
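The pygen-side behaviour described under Fix (which later commits in this PR drop as no longer needed) amounts to a str() coercion followed by the existing digit prefixing. A minimal sketch, assuming a hypothetical helper name and ignoring any other normalization pygen applies:

```python
def enum_member_name(raw_name) -> str:
    """Hypothetical helper: coerce to str, then prefix digit-leading names."""
    name = str(raw_name)  # tolerate scalars that a YAML loader turned into int
    if name and name[0].isdigit():
        # A Python identifier cannot start with a digit, so add a prefix.
        return f"ENUM_{name}"
    return name


print(enum_member_name("2021_01_01"))  # string name, as produced after the fix
print(enum_member_name(20210101))      # defensive fallback for an int
```

Note that the fallback keeps the generator from crashing but cannot restore the underscores; only the emitter-side quoting preserves the original name.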