Skip to content

fix tag null segsegv.#848

Merged
jt2594838 merged 5 commits into
developfrom
fix_null_tag_filter
Jun 25, 2026
Merged

fix tag null segsegv.#848
jt2594838 merged 5 commits into
developfrom
fix_null_tag_filter

Conversation

@ColinLeeo

@ColinLeeo ColinLeeo commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

The Python SDK segfaulted when reading a table-model TsFile in which a device
has a NULL tag value (e.g. 14_boundary_table_nullable_tag_values_two_fields.tsfile),
and even with the crash worked around, such devices read back as all-NaN or got
silently dropped. This PR makes null-tagged devices read correctly end to end
(C++ core → C wrapper → Python), and adds first-class IS NULL support.

Root cause

A null tag is stored as a nullptr segment in the device id.

  1. Crash (SIGSEGV). Every C++ tag filter only checked the column-index bound,
    not for nullptr, then did *segments[i] — dereferencing a null pointer.
    (lldb pinned it to storage::TagEq::satisfyRow.)
  2. Silent wrong data. The C wrapper exposed a null segment to Python as the
    literal string "null", while the query path kept the real nullptr. So the
    auto-generated TagEq("null") could never match the real-null device, and the
    metadata map (keyed by the device-name string, which renders null as "null")
    collided a real null tag with the literal string "null" and dropped a device.
  3. Lossy logical paths. The dataset layer flattened a device to
    table.tag.field and dropped null tags, losing their position, so devices
    with nulls in different positions (a.<null>.b.c vs a.b.<null>.c) collided
    and became unaddressable.

API / behavior changes

  • list_timeseries() now returns List[SeriesPath] (still str, backward
    compatible). Null tags keep their position via \N, e.g.
    eval_nullable_tags.\N.sensor_A.temperature.

  • loc[time, series] accepts a path string (\N = null) or a SeriesPath,
    which can be built three equivalent ways:

    series = df.list_timeseries()
    df.loc[:, series]                                          # round-trips losslessly
    df.loc[:, "eval_nullable_tags.\\N.sensor_A.temperature"]  # string + \N
    df.loc[:, SeriesPath(["eval_nullable_tags", None, "sensor_A", "temperature"])]
    #         flat [table, *tags, field];  None = null tag
    
  • Over-specified paths (too many tags) now fail with Series not found instead
    of being silently truncated into a wrong match.

Tests

  • C++ gtest: null-segment safety for every comparison filter, plus
    TagIsNull / TagIsNotNull (including absent trailing segments).
  • Python: null-position preservation, real-null vs the string "null",
    SeriesPath construction forms and escaping round-trip, and rejection of
    over-specified paths. Full Python suite: 149 passing.

@codecov-commenter

codecov-commenter commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 61.36364% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.60%. Comparing base (9e929f0) to head (cb4ab89).

Files with missing lines Patch % Lines
cpp/src/reader/filter/tag_filter.cc 70.27% 0 Missing and 11 partials ⚠️
cpp/src/cwrapper/tsfile_cwrapper.cc 14.28% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #848   +/-   ##
========================================
  Coverage    61.60%   61.60%           
========================================
  Files          734      734           
  Lines        45949    45968   +19     
  Branches      6895     6897    +2     
========================================
+ Hits         28306    28318   +12     
- Misses       16631    16637    +6     
- Partials      1012     1013    +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…aths

The dataset layer flattened a table device into a "table.tag.field" string and
dropped null tags entirely, which lost the tag position and made two devices
with nulls in different positions collide (e.g. a.<null>.b.c vs a.b.<null>.c),
so they became unaddressable. Reading them by the auto-generated name also could
not distinguish a real null tag from the literal string "null".

Introduce a SeriesPath (str subclass): its string value is the escaped path
form with a collision-proof \N marker for null tags, and it also exposes the
structured table / tags / field components (a None entry in tags = null). A real
value can never escape to \N because escaping doubles backslashes, so \N is
unambiguous against the literal string "null".

- metadata.py: add SeriesPath; build/split/resolve now keep every tag position
  and mark null with \N / None. Every device maps to a unique path, so the
  sparse/compressed lookup and the "ambiguous path" fallback are removed in
  favor of a single direct lookup.
- dataframe.py: list_timeseries() returns SeriesPath; loc[...] and resolution
  accept a SeriesPath or a path string (with \N); drop the sparse-tag index.
- tsfile_py_cpp.pyx: key the timeseries-metadata map by the device segment tuple
  instead of the device-name string. The name renders null as "null", so keying
  by it collided a real null tag with the literal "null" and silently dropped
  one device.
- Export SeriesPath from the package; update/extend tests (null position
  preservation, real-null vs string "null", SeriesPath round-trip and escaping).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes table-model nullable-tag handling end-to-end (C++ core → C wrapper → Python) by preventing null-segment dereferences, representing null segments losslessly across language boundaries, and updating the Python dataset layer to preserve null tag positions via a \N marker and a SeriesPath str subclass.

Changes:

  • Make all C++ tag comparison filters null-safe and add IS NULL / IS NOT NULL tag filter operators (including “omitted trailing segment == NULL” semantics).
  • Expose null tag segments through the C wrapper as NULL pointers and key Python metadata maps by the full segment tuple to avoid collisions with the literal string "null".
  • Add SeriesPath + \N-based logical paths and update dataset resolution/filtering logic + tests to round-trip nullable tags without ambiguity.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
python/tsfile/tsfile_reader.pyx Updates get_timeseries_metadata() contract to return a dict keyed by device segment tuples (nullable-safe).
python/tsfile/tsfile_py_cpp.pyx Converts C metadata maps/devices to Python using segment tuples (with None for NULL segments).
python/tsfile/tsfile_cpp.pxd Extends the Cython-exposed TagFilterOp enum with IS_NULL / IS_NOT_NULL.
python/tsfile/tag_filter.py Adds Python-level tag_is_null / tag_is_not_null helpers and repr/value handling for no-value operators.
python/tsfile/dataset/reader.py Builds exact-match tag filters using IS NULL when tag values are missing/None (incl. trailing omissions).
python/tsfile/dataset/metadata.py Introduces SeriesPath, \N null marker, position-preserving path split/join, and direct device-key resolution.
python/tsfile/dataset/dataframe.py Switches list_timeseries() to return SeriesPath and resolves series names via direct key lookup (nullable-safe).
python/tsfile/dataset/init.py Re-exports SeriesPath from the dataset package.
python/tsfile/init.py Re-exports new tag filter helpers and SeriesPath at the top-level package.
python/tests/test_tsfile_dataset.py Adds coverage for nullable tag isolation, SeriesPath round-trips/escaping, and “wrong tag count” rejection.
python/tests/test_reader_metadata.py Updates expectations to index metadata by segment tuple keys instead of device path strings.
cpp/test/reader/filter/tag_filter_test.cc Adds tests for null-segment safety and new IS NULL / IS NOT NULL semantics.
cpp/src/reader/filter/tag_filter.h Declares TagIsNull / TagIsNotNull and builder methods.
cpp/src/reader/filter/tag_filter.cc Implements null checks in all comparison filters and adds IS NULL / IS NOT NULL implementations.
cpp/src/cwrapper/tsfile_cwrapper.h Extends C API TagFilterOp and documents value being ignored for null-check ops.
cpp/src/cwrapper/tsfile_cwrapper.cc Propagates null segments as NULL pointers and wires new tag filter ops into the C API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 233 to +247
for char in series_path:
if escaping:
current.append(char)
if char == _NULL_MARKER: # \N -> the whole component is a null tag
is_null = True
else: # \\ -> \, \. -> ., any other \x -> x
current.append(char)
escaping = False
continue
if char == _PATH_ESCAPE:
elif char == _PATH_ESCAPE:
escaping = True
continue
if char == _PATH_SEPARATOR:
parts.append("".join(current))
elif char == _PATH_SEPARATOR:
parts.append(None if is_null else "".join(current))
current = []
continue
current.append(char)
is_null = False
else:
current.append(char)
]


def build_series_path(catalog: MetadataCatalog, device_id: int, field_idx: int) -> str:

@hongzhi-gao hongzhi-gao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the Python API docs: get_timeseries_metadata() now keys by segments tuple (with None for null tags)

Comment thread cpp/src/reader/filter/tag_filter.cc Outdated
}

Filter* TagFilterBuilder::is_null(const std::string& columnName) {
auto idx = get_id_column_index(columnName);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id -> tag

@jt2594838 jt2594838 merged commit 4f26e92 into develop Jun 25, 2026
29 checks passed
@jt2594838 jt2594838 deleted the fix_null_tag_filter branch June 25, 2026 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants