Skip to content

Fix Langflow Store Message content_class mislabeling on raw ingests#27

Merged
ethanj merged 1 commit into
mainfrom
sync/monorepo-b06c8c1
Jun 11, 2026
Merged

Fix Langflow Store Message content_class mislabeling on raw ingests#27
ethanj merged 1 commit into
mainfrom
sync/monorepo-b06c8c1

Conversation

@ethanj

@ethanj ethanj commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix the Langflow plugin incorrectly stamping content_class="summary" on every Store Message ingest. The plugin was mislabeling raw transcripts as summaries, causing stores to fail with 422 raw_content_rejected on cores running RAW_CONTENT_POLICY=reject. The bridge now omits content_class by default and lets callers classify content explicitly.

Changes

  • _sdk.py: ingest_messages accepts an optional content_class parameter. When unset, the field is omitted from the ingest body entirely, so a RAW_CONTENT_POLICY=reject core redacts the raw transcript from the audit episode while extraction still runs and searchable memories are created.
  • store_message.py: Adds a Content Class dropdown (raw default; summary/redacted opt-in) to the Store Message component. Callers stamp summary or redacted only when the content is genuinely distilled or has had sensitive spans removed.
  • tests/test_sdk_bridge.py: Adds coverage asserting that content_class is absent from the ingest body when unset, and that an explicitly supplied value is forwarded correctly.
  • tests/fakes.py: Updates FakeBridge.ingest_messages to accept and record content_class.
  • pyproject.toml: Tightens the atomicmemory lower-bound to >=1.1.0, which is required for the SDK to forward content_class on all ingest modes.
  • packages/core/package.json: Version bump to 1.1.0 — aligns the manifest with the feature payload already on main (entities API, capabilities/OpenAPI routes, retrieval receipts) ahead of its registry release.
  • uv.lock: Adds the uv lockfile for reproducible Langflow plugin dependency resolution.

Why

mode="messages" ingests run core-side LLM extraction and persist the raw transcript in the audit episode — not a derived summary. Blanket-stamping content_class="summary" told the core the content was safe for audit retention when it was not, defeating RAW_CONTENT_POLICY=reject. Omitting the field by default lets the policy do its job: the raw transcript is redacted from the audit episode while extracted memories remain searchable. Callers that genuinely produce summaries or redacted content can now opt into the correct label via the new dropdown.

Validation

  • New unit tests cover the "omit when unset" and "forward when explicit" paths in the SDK bridge.
  • Existing Store Message and bridge tests updated to match the new ingest_messages signature.
  • Bump atomicmemory>=1.1.0 ensures the SDK dependency that forwards content_class on all ingest modes is present.

## Summary

Fix the Langflow plugin incorrectly stamping `content_class="summary"` on every Store Message ingest. The plugin was mislabeling raw transcripts as summaries, causing stores to fail with `422 raw_content_rejected` on cores running `RAW_CONTENT_POLICY=reject`. The bridge now omits `content_class` by default and lets callers classify content explicitly.

## Changes

- **`_sdk.py`**: `ingest_messages` accepts an optional `content_class` parameter. When unset, the field is omitted from the ingest body entirely, so a `RAW_CONTENT_POLICY=reject` core redacts the raw transcript from the audit episode while extraction still runs and searchable memories are created.
- **`store_message.py`**: Adds a `Content Class` dropdown (`raw` default; `summary`/`redacted` opt-in) to the Store Message component. Callers stamp `summary` or `redacted` only when the content is genuinely distilled or has had sensitive spans removed.
- **`tests/test_sdk_bridge.py`**: Adds coverage asserting that `content_class` is absent from the ingest body when unset, and that an explicitly supplied value is forwarded correctly.
- **`tests/fakes.py`**: Updates `FakeBridge.ingest_messages` to accept and record `content_class`.
- **`pyproject.toml`**: Tightens the `atomicmemory` lower-bound to `>=1.1.0`, which is required for the SDK to forward `content_class` on all ingest modes.
- **`packages/core/package.json`**: Version bump to 1.1.0 — aligns the manifest with the feature payload already on main (entities API, capabilities/OpenAPI routes, retrieval receipts) ahead of its registry release.
- **`uv.lock`**: Adds the uv lockfile for reproducible Langflow plugin dependency resolution.

## Why

`mode="messages"` ingests run core-side LLM extraction and persist the *raw* transcript in the audit episode — not a derived summary. Blanket-stamping `content_class="summary"` told the core the content was safe for audit retention when it was not, defeating `RAW_CONTENT_POLICY=reject`. Omitting the field by default lets the policy do its job: the raw transcript is redacted from the audit episode while extracted memories remain searchable. Callers that genuinely produce summaries or redacted content can now opt into the correct label via the new dropdown.

## Validation

- New unit tests cover the "omit when unset" and "forward when explicit" paths in the SDK bridge.
- Existing Store Message and bridge tests updated to match the new `ingest_messages` signature.
- Bump `atomicmemory>=1.1.0` ensures the SDK dependency that forwards `content_class` on all ingest modes is present.
@ethanj ethanj merged commit 28e07a9 into main Jun 11, 2026
11 checks passed
@ethanj ethanj deleted the sync/monorepo-b06c8c1 branch June 11, 2026 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant