Skip to content

UTF-16 surrogate pairs decoding #3

@vshabanov

Description

@vshabanov

Hi, UTF-16 surrogate pairs are incorrectly decoded:

λ> Json.decode $ Bytes.fromByteString "\"\\ud800\\udc00\""
Right (String "\65533\65533")
-- should be
λ> Aeson.decode "\"\\ud800\\udc00\"" :: Maybe Aeson.Value
Just (String "\65536")

And invalid sequences are emitted in some cases:

λ> J.decode $ Bytes.fromByteString "\"\\ud800\\udfff\""
Right (String "*** Exception: recoverDecode: invalid argument (invalid byte sequence)
-- should be
λ> Aeson.decode "\"\\ud800\\udfff\"" :: Maybe Aeson.Value
Just (String "\66559")

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions