Sort key libicu NIF #6050

Open
nickva wants to merge 1 commit into main from sortkey-collation
nickva (Contributor) commented Jun 24, 2026

Add a sort key libicu NIF function. A sort key is an opaque binary representation generated by libicu from a key. It can then be compared directly against other sort keys to produce the same collation order as calling the pair-wise libicu comparison function.
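To illustrate the equivalence described above, here is a minimal Python sketch. It does not call libicu: `toy_compare` and `toy_sort_key` are hypothetical stand-ins for the pair-wise comparison and the proposed sort key function, and the toy collation (letters first ignoring case, then lowercase before uppercase) is an assumption chosen only to keep the example short.

```python
from functools import cmp_to_key


def toy_compare(a: str, b: str) -> int:
    """Hypothetical pair-wise comparison (stand-in for the libicu compare call).

    Compares the lowercased strings first, then breaks ties by putting
    lowercase letters before uppercase ones.
    """
    la, lb = a.lower(), b.lower()
    if la != lb:
        return -1 if la < lb else 1
    ta = [c.isupper() for c in a]
    tb = [c.isupper() for c in b]
    return (ta > tb) - (ta < tb)


def toy_sort_key(s: str) -> bytes:
    """Hypothetical sort key (stand-in for the proposed NIF).

    Returns an opaque binary whose plain byte order matches toy_compare.
    Assumes the input contains no NUL characters.
    """
    primary = s.lower().encode("utf-8")
    tertiary = bytes(2 if c.isupper() else 1 for c in s)
    return primary + b"\x00" + tertiary


words = ["b", "A", "a", "B", "ab", "Ab", "aB"]
by_compare = sorted(words, key=cmp_to_key(toy_compare))
by_sort_key = sorted(words, key=toy_sort_key)
```

Both orderings agree, but the second one calls the key function once per element and then relies only on ordinary binary comparison.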

The idea is to use sort keys in the fabric view row "merge head" structure, where we merge streaming rows from multiple workers. When we do that, we either keep a sorted list (for map-only views), do an insertion sort step and take the minimum, or we keep the rows in a key/value structure (for reduce views) and find the minimum key and its grouped values. In either case we can reduce the number of libicu compare(a,b) calls from O(K^2) to just O(K) sort key generating calls, and since libicu calls are not cheap, it is worth adding an extra NIF call just for that.
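The call-count argument above can be sketched in Python as follows. This is not the fabric code: the function names, the `(key, worker)` row shape, and the use of `casefold()` in place of libicu collation are all assumptions made for illustration. The sketch inserts the first row from each of K hypothetical workers into a sorted merge head, once with a pair-wise comparator and once with a sort key computed per row.

```python
import bisect

# Counters for the two "expensive" calls (stand-ins for NIF calls into libicu).
calls = {"compare": 0, "sort_key": 0}


def expensive_compare(a: str, b: str) -> int:
    """Hypothetical stand-in for the pair-wise libicu compare(a, b) call."""
    calls["compare"] += 1
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def expensive_sort_key(s: str) -> bytes:
    """Hypothetical stand-in for the proposed sort key NIF."""
    calls["sort_key"] += 1
    return s.casefold().encode("utf-8")


def insert_with_compare(head: list, row: tuple) -> None:
    """Insertion sort step: linear scan using the pair-wise comparator."""
    key = row[0]
    i = 0
    while i < len(head) and expensive_compare(head[i][0], key) <= 0:
        i += 1
    head.insert(i, row)


def insert_with_sort_key(head: list, row: tuple) -> None:
    """One sort key call per row, then plain bytes comparisons only."""
    key, worker = row
    bisect.insort(head, (expensive_sort_key(key), key, worker))


# First row key from each of K = 8 hypothetical workers.
first_keys = ["pear", "Apple", "fig", "banana", "Cherry", "date", "elder", "Grape"]

head_a, head_b = [], []
for worker, key in enumerate(first_keys):
    insert_with_compare(head_a, (key, worker))
    insert_with_sort_key(head_b, (key, worker))

order_a = [key for key, _ in head_a]
order_b = [key for _, key, _ in head_b]
```

Both heads end up in the same order, and the minimum row is simply the first element. The sort key variant makes exactly one expensive call per row, while the comparator variant makes a number of calls that grows with the size of the head on every insertion.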

As a side note: we actually implemented this once during the now-abandoned CouchDB 4.0 attempt with a FoundationDB backend. There we stored sort keys in the database, which the libicu developers do not recommend doing. Here we're planning on using them in memory only, on the coordinator.

https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison
