fix tag null segsegv.#848
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #848 +/- ##
========================================
Coverage 61.60% 61.60%
========================================
Files 734 734
Lines 45949 45968 +19
Branches 6895 6897 +2
========================================
+ Hits 28306 28318 +12
- Misses 16631 16637 +6
- Partials 1012 1013 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…aths The dataset layer flattened a table device into a "table.tag.field" string and dropped null tags entirely, which lost the tag position and made two devices with nulls in different positions collide (e.g. a.<null>.b.c vs a.b.<null>.c), so they became unaddressable. Reading them by the auto-generated name also could not distinguish a real null tag from the literal string "null". Introduce a SeriesPath (str subclass): its string value is the escaped path form with a collision-proof \N marker for null tags, and it also exposes the structured table / tags / field components (a None entry in tags = null). A real value can never escape to \N because escaping doubles backslashes, so \N is unambiguous against the literal string "null". - metadata.py: add SeriesPath; build/split/resolve now keep every tag position and mark null with \N / None. Every device maps to a unique path, so the sparse/compressed lookup and the "ambiguous path" fallback are removed in favor of a single direct lookup. - dataframe.py: list_timeseries() returns SeriesPath; loc[...] and resolution accept a SeriesPath or a path string (with \N); drop the sparse-tag index. - tsfile_py_cpp.pyx: key the timeseries-metadata map by the device segment tuple instead of the device-name string. The name renders null as "null", so keying by it collided a real null tag with the literal "null" and silently dropped one device. - Export SeriesPath from the package; update/extend tests (null position preservation, real-null vs string "null", SeriesPath round-trip and escaping).
There was a problem hiding this comment.
Pull request overview
This PR fixes table-model nullable-tag handling end-to-end (C++ core → C wrapper → Python) by preventing null-segment dereferences, representing null segments losslessly across language boundaries, and updating the Python dataset layer to preserve null tag positions via a \N marker and a SeriesPath str subclass.
Changes:
- Make all C++ tag comparison filters null-safe and add
IS NULL/IS NOT NULLtag filter operators (including “omitted trailing segment == NULL” semantics). - Expose null tag segments through the C wrapper as
NULLpointers and key Python metadata maps by the full segment tuple to avoid collisions with the literal string"null". - Add
SeriesPath+\N-based logical paths and update dataset resolution/filtering logic + tests to round-trip nullable tags without ambiguity.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/tsfile/tsfile_reader.pyx | Updates get_timeseries_metadata() contract to return a dict keyed by device segment tuples (nullable-safe). |
| python/tsfile/tsfile_py_cpp.pyx | Converts C metadata maps/devices to Python using segment tuples (with None for NULL segments). |
| python/tsfile/tsfile_cpp.pxd | Extends the Cython-exposed TagFilterOp enum with IS_NULL / IS_NOT_NULL. |
| python/tsfile/tag_filter.py | Adds Python-level tag_is_null / tag_is_not_null helpers and repr/value handling for no-value operators. |
| python/tsfile/dataset/reader.py | Builds exact-match tag filters using IS NULL when tag values are missing/None (incl. trailing omissions). |
| python/tsfile/dataset/metadata.py | Introduces SeriesPath, \N null marker, position-preserving path split/join, and direct device-key resolution. |
| python/tsfile/dataset/dataframe.py | Switches list_timeseries() to return SeriesPath and resolves series names via direct key lookup (nullable-safe). |
| python/tsfile/dataset/init.py | Re-exports SeriesPath from the dataset package. |
| python/tsfile/init.py | Re-exports new tag filter helpers and SeriesPath at the top-level package. |
| python/tests/test_tsfile_dataset.py | Adds coverage for nullable tag isolation, SeriesPath round-trips/escaping, and “wrong tag count” rejection. |
| python/tests/test_reader_metadata.py | Updates expectations to index metadata by segment tuple keys instead of device path strings. |
| cpp/test/reader/filter/tag_filter_test.cc | Adds tests for null-segment safety and new IS NULL / IS NOT NULL semantics. |
| cpp/src/reader/filter/tag_filter.h | Declares TagIsNull / TagIsNotNull and builder methods. |
| cpp/src/reader/filter/tag_filter.cc | Implements null checks in all comparison filters and adds IS NULL / IS NOT NULL implementations. |
| cpp/src/cwrapper/tsfile_cwrapper.h | Extends C API TagFilterOp and documents value being ignored for null-check ops. |
| cpp/src/cwrapper/tsfile_cwrapper.cc | Propagates null segments as NULL pointers and wires new tag filter ops into the C API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for char in series_path: | ||
| if escaping: | ||
| current.append(char) | ||
| if char == _NULL_MARKER: # \N -> the whole component is a null tag | ||
| is_null = True | ||
| else: # \\ -> \, \. -> ., any other \x -> x | ||
| current.append(char) | ||
| escaping = False | ||
| continue | ||
| if char == _PATH_ESCAPE: | ||
| elif char == _PATH_ESCAPE: | ||
| escaping = True | ||
| continue | ||
| if char == _PATH_SEPARATOR: | ||
| parts.append("".join(current)) | ||
| elif char == _PATH_SEPARATOR: | ||
| parts.append(None if is_null else "".join(current)) | ||
| current = [] | ||
| continue | ||
| current.append(char) | ||
| is_null = False | ||
| else: | ||
| current.append(char) |
| ] | ||
|
|
||
|
|
||
| def build_series_path(catalog: MetadataCatalog, device_id: int, field_idx: int) -> str: |
hongzhi-gao
left a comment
There was a problem hiding this comment.
Please update the Python API docs: get_timeseries_metadata() now keys by segments tuple (with None for null tags)
| } | ||
|
|
||
| Filter* TagFilterBuilder::is_null(const std::string& columnName) { | ||
| auto idx = get_id_column_index(columnName); |
Summary
The Python SDK segfaulted when reading a table-model TsFile in which a device
has a
NULLtag value (e.g.14_boundary_table_nullable_tag_values_two_fields.tsfile),and even with the crash worked around, such devices read back as all-
NaNor gotsilently dropped. This PR makes null-tagged devices read correctly end to end
(C++ core → C wrapper → Python), and adds first-class
IS NULLsupport.Root cause
A null tag is stored as a
nullptrsegment in the device id.not for
nullptr, then did*segments[i]— dereferencing a null pointer.(
lldbpinned it tostorage::TagEq::satisfyRow.)literal string
"null", while the query path kept the realnullptr. So theauto-generated
TagEq("null")could never match the real-null device, and themetadata map (keyed by the device-name string, which renders null as
"null")collided a real null tag with the literal string
"null"and dropped a device.table.tag.fieldand dropped null tags, losing their position, so deviceswith nulls in different positions (
a.<null>.b.cvsa.b.<null>.c) collidedand became unaddressable.
API / behavior changes
list_timeseries()now returnsList[SeriesPath](stillstr, backwardcompatible). Null tags keep their position via
\N, e.g.eval_nullable_tags.\N.sensor_A.temperature.loc[time, series]accepts a path string (\N= null) or aSeriesPath,which can be built three equivalent ways:
Over-specified paths (too many tags) now fail with
Series not foundinsteadof being silently truncated into a wrong match.
Tests
TagIsNull/TagIsNotNull(including absent trailing segments)."null",SeriesPathconstruction forms and escaping round-trip, and rejection ofover-specified paths. Full Python suite: 149 passing.