Skip to content

Add support for structured dtypes to zarr3 driver, open structs as void#271

Open
BrianMichell wants to merge 140 commits into
google:masterfrom
BrianMichell:v3_structs_and_void
Open

Add support for structured dtypes to zarr3 driver, open structs as void#271
BrianMichell wants to merge 140 commits into
google:masterfrom
BrianMichell:v3_structs_and_void

Conversation

@BrianMichell
Copy link
Copy Markdown
Contributor

Supersedes #264

Resolves comments 1 and 2

Implement shim for `open_as_void` driver level flag
* Begin removing void field shim

* Fully removed void string shim

* Cleanup debug prints

* Remove shimmed validation

* Remove unnecessary comment

* Prefer false over zero for ternary clarity
* Implement a more general and portable example set

* Fix driver cache bug

* Update example for template

* Cleanup example

* Remove testing examples from source
* Use the appropriate fill value for open_as_void structured data

* Cleanup
@laramiel
Copy link
Copy Markdown
Collaborator

I'll try to get to this in about a week, before I look this one over, please double check that the prior PR works for you. Also look over this one and see if any of the suggestions from the other one applies.

Matches the pattern from zarr v2 driver (PR google#272). When both "field"
and "open_as_void" are specified in the spec, return an error since
these options are mutually exclusive - field selects a specific field
from a structured array, while open_as_void provides raw byte access
to the entire structure.
The zarr3 URL syntax cannot represent field selection or void access
mode. Following the pattern from zarr v2 driver (PR google#272), ToUrl() now
returns an error when either of these options is specified instead of
silently ignoring them.
…trip

Following the pattern from zarr v2 driver (PR google#272), override
GetBoundSpecData in ZarrDataCache to set spec.open_as_void from
ChunkCacheImpl::open_as_void_. This ensures that when you open a
store with open_as_void=true and then call spec(), the resulting
spec correctly has open_as_void=true set.

Without this fix, opening a store with open_as_void=true and then
getting its spec would lose the open_as_void flag, causing incorrect
behavior if the spec is used to re-open the store.
@BrianMichell
Copy link
Copy Markdown
Contributor Author

Looks like development has picked back up on Tensorstore. I think I've addressed any outstanding issues and made sure everything is following the defined spec. Please let me know if there's anything else I can do to help move this forward any more!

Comment thread tensorstore/driver/zarr3/codec/transpose.cc Outdated
Comment thread tensorstore/driver/zarr3/codec/codec_spec.h
@laramiel
Copy link
Copy Markdown
Collaborator

laramiel commented May 21, 2026

@jbms and I talked briefly about this and he has some comments; briefly there are a few concerns regarding endian encoding/decoding when used with the zarr3 dtype and with field selection.

edit: ... making the "bytes" codec understand the new ZarrDType might be a somewhat invasive change to how the codecs work.

Comment thread tensorstore/driver/zarr3/dtype.cc
Comment thread tensorstore/driver/zarr3/dtype.cc
Comment thread tensorstore/driver/zarr3/dtype.h
Comment thread tensorstore/driver/zarr3/chunk_cache.cc Outdated
const auto& field = zarr_dtype_.fields[field_i];
const auto& component_shape = grid().components[field_i].shape();
auto result_array =
AllocateArray(component_shape, c_order, default_init, field.dtype);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take preferred inner order into account when allocating here. Also should try to avoid copying at all, and just reference the original array with the correct stride, if it is already suitable aligned, as for zarr v2.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Suitably aligned and suitable endian)

Comment thread tensorstore/driver/zarr3/chunk_cache.h Outdated
Comment thread tensorstore/driver/zarr3/chunk_cache.h Outdated
Comment thread tensorstore/driver/zarr3/metadata.cc
decoded.fill_value = metadata->fill_value[0];
}
} else {
// Zero-filled scalar byte fill_value (broadcast to chunk shape at use).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fill value handling here seems problematic --- the zero-filled fill_value surely isn't going to work. Plus in some cases fill_value seems to be unset.

{0, 0}, {2, 2}))
.result());
EXPECT_EQ(b_data, b_read);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fill values for structured fields aren't handled correctly in the case of sharding --- please add tests for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants