Skip to content

fix: array creation in tests#169

Open
flying-sheep wants to merge 4 commits into
mainfrom
fix-array-create
Open

fix: array creation in tests#169
flying-sheep wants to merge 4 commits into
mainfrom
fix-array-create

Conversation

@flying-sheep
Copy link
Copy Markdown
Collaborator

@flying-sheep flying-sheep commented May 19, 2026

so what I thought happens is that they split up codecs into

  1. shards: if present, this contains the parameters for an implicitly created ArrayToArrayCodec that wraps all other given codecs and applies them for each shard individually
  2. filters: list of ArrayToArrayCodecs
  3. serializer: a single ArrayToBytesCodec
  4. compressors: a list of BytesToBytesCodecs

this assumption seems to hold for e.g. tests/test_codecs.py::test_order, but other tests fail, so apparently I didn’t understand this.

Comment thread tests/test_sharding.py Outdated
Comment thread tests/test_sharding.py Outdated
@d-v-b
Copy link
Copy Markdown

d-v-b commented May 26, 2026

so what I thought happens is that they split up codecs into

shards: if present, this contains the parameters for an implicitly created ArrayToArrayCodec that wraps all other given codecs and applies them for each shard individually
filters: list of ArrayToArrayCodecs
serializer: a single ArrayToBytesCodec
compressors: a list of BytesToBytesCodecs

That's correct, and we should spell this out more clearly in the docs. The basic problem we had to solve was making sharding look like a top-level array field, instead of a codec that contains other codecs. But this simplification is not free

@flying-sheep
Copy link
Copy Markdown
Collaborator Author

flying-sheep commented May 27, 2026

OK, if I understand correctly, one has to change from

Array.create(
    chunk_shape=shard_shape,  # !
    codecs=[ShardingCodec(chunk_shape=chunk_shape)],
)

to

create_array(
    chunk_shape=chunk_shape,
    shards=shard_shape,
)

right?

If I do that, all tests pass except for this one.

IDK why the number of stored bytes goes down, any ideas?

$ open …/pytest-165/test_delete_empty_shards_local0/delete_empty_shards/c/0/0 | into binary
Length: 53 (0x35) bytes
00000000:   28 b5 2f fd  20 80 45 00  00 10 00 00  01 00 3b 05   (×/× ×E00•00•0;•
00000010:   58 00 00 00  00 00 00 00  00 11 00 00  00 00 00 00   X00000000•000000
00000020:   00 ff ff ff  ff ff ff ff  ff ff ff ff  ff ff ff ff   0×××××××××××××××
00000030:   ff a5 26 2a  18                                      ××&*•

async def test_delete_empty_shards(store: Store) -> None:
if not store.supports_deletes:
pytest.skip("store does not support deletes")
path = "delete_empty_shards"
spath = StorePath(store, path)
a = await create_async_array(
spath,
shape=(16, 16),
chunks=(8, 8),
shards=(8, 16),
dtype="uint16",
fill_value=1,
)
await _AsyncArrayProxy(a)[:, :].set(np.zeros((16, 16)))
await _AsyncArrayProxy(a)[8:, :].set(np.ones((8, 16)))
await _AsyncArrayProxy(a)[:, 8:].set(np.ones((16, 8)))
# chunk (0, 0) is full
# chunks (0, 1), (1, 0), (1, 1) are empty
# shard (0, 0) is half-full
# shard (1, 0) is empty
data = np.ones((16, 16), dtype="uint16")
data[:8, :8] = 0
assert np.array_equal(data, await _AsyncArrayProxy(a)[:, :].get())
assert (
await store.get(f"{path}/c/1/0", prototype=default_buffer_prototype()) is None
)
chunk_bytes = await store.get(f"{path}/c/0/0", prototype=default_buffer_prototype())
assert chunk_bytes is not None
assert len(chunk_bytes) == 16 * 2 + 8 * 8 * 2 + 4 == 164

@d-v-b
Copy link
Copy Markdown

d-v-b commented May 27, 2026

IDK why the number of stored bytes goes down, any ideas?

Offhand I have no idea! The first thing I would check is the array metadata document -- maybe the codecs field of the sharding codec ends up with a different bytes -> bytes codec in one of the two invocations?

@flying-sheep
Copy link
Copy Markdown
Collaborator Author

flying-sheep commented May 27, 2026

Ah, of course, the default compressor is zstd, so before, by overriding codecs, that got overridden, now it doesn’t.

I think that means all is well now. I still don’t understand everything enough to confidently say that my assumption in #169 (comment) is true, but I’m relatively confident.

@flying-sheep flying-sheep marked this pull request as ready for review May 27, 2026 17:50
@flying-sheep flying-sheep requested a review from ilan-gold May 27, 2026 17:51
Comment thread tests/test_codecs.py
chunk_shape=(16, 16),


def test_invalid_metadata_chunk_shape(store: Store) -> None:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and the next test (test_invalid_metadata_inner_chunk_shape) are the most likely candidates for me messing up:

Since both the parameter semantics and the error message changes between the two APIs, it’s possible that I messed up and these no longer test what they should.

But if my assumption about the semantics is accurate, this should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants