Investigate string decode performance improvements

In 665c428e6431aa3dd2d16b9a2d02739ce37409c4, I've added a benchmark that focuses on the performance of decoding strings. Here's the performance of decoding 4KB of json, most of which is just 30-character strings:

```
benchmarked json/url/100/decode
time                 8.053 μs   (7.736 μs .. 8.411 μs)
                     0.996 R²   (0.989 R² .. 0.999 R²)
mean                 8.089 μs   (8.051 μs .. 8.139 μs)
std dev              74.41 ns   (48.48 ns .. 107.8 ns)

benchmarked aeson/url/100/decode
time                 22.66 μs   (21.50 μs .. 24.51 μs)
                     0.994 R²   (0.989 R² .. 0.999 R²)
mean                 22.72 μs   (22.49 μs .. 22.84 μs)
std dev              276.1 ns   (159.0 ns .. 414.7 ns)
```

Not bad. We're ahead of aeson by a factor of three, but can this number be improved further? String decode currently walks the string, byte-by-byte, until it finds the end of it. As it walks the string, it keeps up with information about whether or not anything will need to be unescaped. I think that it should be possible to instead walk the string w64-by-w64. This could be done by adapting the [approach in bytestring-encodings](https://github.com/chessai/bytestring-encodings/blob/master/Data/ByteString/Internal/Ascii.hs) to work with `ByteArray` instead of `Ptr` and adding some additional [bit twiddling hacks](https://graphics.stanford.edu/~seander/bithacks.html). The general idea would be:

* Fail if you encounter a backslash
* Fail if you encounter a byte less than 0x20
* Fail if you encounter a byte greater than 0x7E
* Fail if your read would give you a w64 that straddled the end of the string (simplifies things a little)
* Succeed if you encounter a `"`

Failure just means to fall back to the existing string decode logic, and succeed means that we may perform a memcpy (as we do now). This whole thing is a little bit tricky because it's possible to encounter both a `"` and a failing byte in them same w64, and then the order that they showed up in matters. But I think that the conservative action of always failing even if the quote showed up first is probably the best course. It simplifies it, and the bytes after a string ends are probably ascii characters anyway, so this shouldn't cost a ton of performance.

This needs to be implemented and benchmarked, but I think it could this benchmark at least 2x faster.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate string decode performance improvements #4

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Investigate string decode performance improvements #4

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions