[python] Fix enum member names corrupted by YAML 1.1/1.2 boundary #11162

Open
msyyc wants to merge 5 commits into microsoft:main from msyyc:fix-enum-yaml-scalar-roundtrip

@msyyc msyyc commented Jul 3, 2026

fix #11138

Problem

For a date-like TypeSpec label such as:

union ApiVersion {
  string;
  `2020-01-01`: "2020-01-01";
}

the emitter's enumName() correctly computes the enum member name as the string 2020_01_01. However, the code model is serialized with js-yaml (YAML 1.2), which dumps this scalar unquoted (name: 2020_01_01). The Python generator then parses the YAML with PyYAML (YAML 1.1), where underscores are valid digit separators inside an integer, so the name is read back as the integer 20200101 and the enum member name is corrupted.
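The misread can be illustrated without either YAML library. The sketch below is an approximation, not PyYAML's actual resolver: it assumes a YAML 1.1 style decimal-integer pattern in which underscores are digit separators, and the names `YAML11_DECIMAL_INT` and `resolve_plain_scalar` are hypothetical.

```python
import re

# Decimal-integer pattern in the style of YAML 1.1: underscores are allowed
# as digit separators. Other YAML 1.1 integer forms are omitted here.
YAML11_DECIMAL_INT = re.compile(r"^[-+]?(?:0|[1-9][0-9_]*)$")


def resolve_plain_scalar(text: str):
    """Mimic how a YAML 1.1 loader types an unquoted scalar (ints only)."""
    if YAML11_DECIMAL_INT.match(text):
        # Separators are dropped before conversion, so the name is lost.
        return int(text.replace("_", ""))
    return text


print(repr(resolve_plain_scalar("2020_01_01")))   # 20200101
print(repr(resolve_plain_scalar("v2020_01_01")))  # 'v2020_01_01'
```

Only names made up entirely of digits and underscores are affected, which is why date-like labels trigger the bug while ordinary identifiers do not.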

This is the root cause behind #11143. That PR worked around the crash in pygen by reconstructing the name from the enum value via regex, rather than fixing the corruption at the source.

Fix

Emitter (root cause) — force-quote string scalars when serializing the code model so every string round-trips faithfully through PyYAML:

  • Added a shared dumpCodeModelToYaml() helper (forceQuotes: true, quotingType: '"') in external-process.ts, used by both the node and pyodide serialization paths.
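The real helper relies on js-yaml's forceQuotes and quotingType options. As a rough illustration of the effect only, the Python sketch below double-quotes every string value in a flat mapping; `dump_flat_mapping` is a hypothetical name, and using `json.dumps` to produce the quoted scalar is an assumption that holds for simple strings.

```python
import json


def dump_flat_mapping(mapping: dict) -> str:
    """Emit a flat YAML mapping, double-quoting every string value."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, str):
            # A JSON string literal is also a double-quoted YAML scalar,
            # so a loader keeps it as a string regardless of its content.
            rendered = json.dumps(value)
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines)


print(dump_flat_mapping({"name": "2020_01_01", "count": 3}))
```

With the quotes in place, the scalar no longer goes through implicit type resolution on the consuming side, so the YAML 1.1 versus 1.2 difference stops mattering for strings.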

pygen (defensive) — simplified enum value name handling to a plain str() coercion so the generator stays robust to any scalar, letting the existing digit → ENUM_ prefixing produce ENUM_2020_01_01.
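A minimal sketch of the defensive path described above, assuming the prefixing rule is simply "names starting with a digit get ENUM_ prepended". The function name `enum_member_name` is hypothetical and is not taken from pygen.

```python
def enum_member_name(raw_name) -> str:
    """Coerce any scalar to str, then prefix names that start with a digit."""
    name = str(raw_name)
    if name[:1].isdigit():
        # Python identifiers cannot start with a digit.
        name = f"ENUM_{name}"
    return name


print(enum_member_name("2020_01_01"))  # ENUM_2020_01_01
print(enum_member_name(20200101))      # ENUM_20200101
```

The second call shows why coercion alone is only a fallback: it avoids a crash, but the underscores are already gone by the time the integer reaches the generator.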

Tests

  • New emitter unit test (emitter/test/external-process.test.ts) asserting ambiguous scalars are quoted and round-trip as strings.
  • Updated pygen unit tests covering both the happy path (string 2021_01_01 → ENUM_2021_01_01) and the defensive fallback (int 20210101 → ENUM_20210101).

Verified end-to-end: emitter dump → PyYAML load now yields the string '2020_01_01' (previously int 20200101). Emitter vitest (4 passed) and pygen pytest (16 passed) green.

Date-like TypeSpec labels (e.g. 2020-01-01) produce snake-cased enum
member names such as 2020_01_01. js-yaml (YAML 1.2) dumps these unquoted,
but the Python generator parses with PyYAML (YAML 1.1), which reads
2020_01_01 back as the integer 20200101, corrupting the name.

Fix the root cause in the emitter by force-quoting string scalars when
serializing the code model so names round-trip as strings, and keep pygen
robust by coercing enum value names to str.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@microsoft-github-policy-service microsoft-github-policy-service Bot added the emitter:client:python Issue for the Python client emitter: @typespec/http-client-python label Jul 3, 2026

pkg-pr-new Bot commented Jul 3, 2026

npm i https://pkg.pr.new/@typespec/http-client-python@11162

commit: 6665a32


github-actions Bot commented Jul 3, 2026

All changed packages have been documented.

  • @typespec/http-client-python

@typespec/http-client-python - fix ✏️

Fix enum member names derived from date-like TypeSpec labels (e.g. `2020-01-01`) being corrupted by the js-yaml (YAML 1.2) to PyYAML (YAML 1.1) boundary. String scalars are now force-quoted when serializing the code model so names such as 2020_01_01 round-trip as strings instead of being read back as integers.

With the emitter force-quoting string scalars, enum member names always
round-trip as strings, so the defensive name coercion (and its tests) are
no longer needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azure-sdk-automation Bot commented Jul 3, 2026

You can try these changes here

🛝 Playground 🌐 Website 🛝 VSCode Extension

msyyc and others added 3 commits July 3, 2026 16:45
The emitter force-quoting fix makes the non-string description/enum-name
handling from microsoft#11143 unnecessary, so revert those pygen changes and their
tests, leaving a focused emitter-only fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@msyyc msyyc enabled auto-merge July 3, 2026 09:13
* @param codemodel Codemodel to serialize
* @return the YAML representation of the codemodel.
*/
export function dumpCodeModelToYaml(codemodel: unknown): string {

just wondering here do you actually even need yaml can't you serialize as json, would be faster and less deps

i feel like yaml is not a great format for data serialization better suited for config
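To illustrate the suggestion in this thread (this is not something the PR does): JSON has a single, always-quoted string syntax, so a string cannot be reinterpreted as a number on load. A minimal sketch using Python's standard json module, with a made-up two-field code model and a hypothetical `round_trip_json` helper:

```python
import json


def round_trip_json(code_model: dict) -> dict:
    """Serialize to JSON text and parse it back."""
    return json.loads(json.dumps(code_model))


restored = round_trip_json({"name": "2020_01_01", "value": "2020-01-01"})
print(type(restored["name"]).__name__, restored["name"])  # str 2020_01_01
```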

Development

Successfully merging this pull request may close these issues.

[http-client-python] AttributeError: 'int' object has no attribute 'rstrip' in preprocess update_description
