gh-150638: Improve performance of json.loads and json.load for numeric data by eendebakpt · Pull Request #150639 · python/cpython

eendebakpt · 2026-05-30T21:19:15Z

_match_number_unicode(), the number-matching routine in the C accelerator behind json.loads, previously allocated a PyBytes object for every number, copied the digits into it, and then called the generic PyLong_FromString / PyFloat_FromString parsers.
This PR parses the common cases directly from the already-scanned text instead.

| Benchmark | main | this PR | speedup |
|---|---|---|---|
| `json.loads`, number-heavy document (script below) | 3.05 ms | 2.38 ms | 1.28× |
| `json.load`, same document via file object | 3.17 ms | 2.48 ms | 1.28× |
| pyperformance `bm_json_loads` | 25.2 µs | 23.9 µs | 1.05× |

The standard bm_json_loads document is string/dict-dominated, so it gains less.
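For reference, the speedup column is the `main` time divided by the PR time. The snippet below recomputes it from the reported timings; the helper names (`speedup`, `time_saved_percent`) are only illustrative and not part of the PR.

```python
# Timings copied from the benchmark results: (main, this PR).
# Both values in a row use the same unit, so the ratio is unitless.
timings = {
    "json.loads, number-heavy document": (3.05, 2.38),         # ms
    "json.load, same document via file object": (3.17, 2.48),  # ms
    "pyperformance bm_json_loads": (25.2, 23.9),               # µs
}


def speedup(main_time, pr_time):
    """Ratio of the old time to the new time (greater than 1 means faster)."""
    return main_time / pr_time


def time_saved_percent(main_time, pr_time):
    """Share of the original run time that the PR removes, in percent."""
    return 100.0 * (1.0 - pr_time / main_time)


for name, (main_time, pr_time) in timings.items():
    print(
        f"{name}: {speedup(main_time, pr_time):.2f}x, "
        f"{time_saved_percent(main_time, pr_time):.1f}% less time"
    )
```

A 1.28× speedup corresponds to roughly 22% less time per call, while 1.05× is about 5%.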

Benchmark script

"""Benchmark json.loads() and json.load() on a number-heavy document.

The document is generated deterministically at import time (no external
files) and resembles a typical telemetry/API payload: a list of records
mixing integers, 19-digit timestamps, negative integers, floats, short
strings, booleans and small integer arrays.

json.load(fp) is json.loads(fp.read()); here fp is an in-memory io.StringIO
(rewound each call) so the same document is parsed without disk noise.

Inline data size: ~304 KiB (2000 records).
"""
import io
import json
import pyperf


def build_document(n=2000):
    return [
        {
            "id": i,
            "timestamp": 1_700_000_000_000_000_000 + i * 1_000,  # 19-digit int
            "value": i * 1.5 - 1000.0,                           # float
            "delta": -i,                                         # negative int
            "label": "item-%d" % i,                              # short string
            "ok": i % 2 == 0,                                    # bool
            "samples": [i, -i, i * 2, i * 3, i * 5],             # int array
        }
        for i in range(n)
    ]


JSON_DATA = json.dumps(build_document())
STREAM = io.StringIO(JSON_DATA)


def load_from_stream():
    STREAM.seek(0)
    return json.load(STREAM)


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.metadata["description"] = "json.loads()/json.load() on a number-heavy document"
    runner.bench_func("json_loads", json.loads, JSON_DATA)
    runner.bench_func("json_load", load_from_stream)
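To see how number-heavy this document is, the sketch below rebuilds the same records and counts the leaf values by type after a round trip through `json.dumps` and `json.loads`. The `count_leaf_types` helper is hypothetical and exists only for this illustration; `build_document` is copied from the script above.

```python
import json
from collections import Counter


def build_document(n=2000):
    # Same record layout as the benchmark script.
    return [
        {
            "id": i,
            "timestamp": 1_700_000_000_000_000_000 + i * 1_000,
            "value": i * 1.5 - 1000.0,
            "delta": -i,
            "label": "item-%d" % i,
            "ok": i % 2 == 0,
            "samples": [i, -i, i * 2, i * 3, i * 5],
        }
        for i in range(n)
    ]


def count_leaf_types(obj, counts=None):
    """Recursively count non-container values by exact type name."""
    if counts is None:
        counts = Counter()
    if isinstance(obj, dict):
        for value in obj.values():
            count_leaf_types(value, counts)
    elif isinstance(obj, list):
        for value in obj:
            count_leaf_types(value, counts)
    else:
        # type() rather than isinstance() keeps bool separate from int.
        counts[type(obj).__name__] += 1
    return counts


parsed = json.loads(json.dumps(build_document()))
counts = count_leaf_types(parsed)
numeric_share = (counts["int"] + counts["float"]) / sum(counts.values())
print(counts, f"numeric share: {numeric_share:.0%}")
```

Each record carries eight integers and one float against one string and one boolean, so about nine in eleven leaf values go through the number-parsing path.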

Issue: Improve performance of json.loads and json.load for numeric data #150638

Add a fast path to _match_number_unicode for integers that fit in a 64-bit integer (at most 19 decimal digits): accumulate the value directly into an unsigned long long instead of allocating a PyBytes and calling the generic PyLong_FromString. Positive values use PyLong_FromUnsignedLongLong; negatives within long long range use PyLong_FromLongLong; larger integers fall back to the previous path.

For floats and big integers, copy the (always-ASCII) number text into a stack buffer for the common short case to avoid the PyBytes allocation, and call PyOS_string_to_double directly for floats.

Benchmarks (optimized free-threaded build):

* pyperformance json_loads: 1.06x faster overall
* microbench: small int arrays ~2x, 20-int doc 1.48x, mixed dict 1.16x

All test_json tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
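The integer fast path described in the commit message can be modeled in Python as follows. This is a minimal sketch of the decision logic only, not the C code from the PR: `parse_int_token`, the `"fast"`/`"fallback"` labels, and the constants are names invented for the illustration, and Python integers never overflow, so the 64-bit limits are checked explicitly.

```python
import json

LLONG_MIN = -(2**63)       # smallest value a C long long can hold
MAX_FAST_DIGITS = 19       # any 19-digit decimal is < 10**19 < 2**64


def parse_int_token(text):
    """Parse a JSON integer token and report which path handled it.

    Returns (value, "fast") when the token would fit the 64-bit path
    described in the commit message, otherwise (value, "fallback").
    """
    negative = text.startswith("-")
    digits = text[1:] if negative else text

    if digits.isascii() and digits.isdigit() and len(digits) <= MAX_FAST_DIGITS:
        # Accumulate digit by digit, as a C loop over the scanned text would.
        acc = 0
        for ch in digits:
            acc = acc * 10 + (ord(ch) - ord("0"))
        # With at most 19 digits, acc always fits in an unsigned 64-bit value.
        if not negative:
            return acc, "fast"
        if -acc >= LLONG_MIN:
            return -acc, "fast"
        # A negative value below LLONG_MIN needs the generic parser.

    # Stand-in for the previous generic parsing path.
    return int(text), "fallback"


# Boundary values around the 64-bit limits, compared against json.loads.
samples = [
    "0",
    "1700000000000000000",    # 19-digit timestamp, as in the benchmark
    "9999999999999999999",    # largest 19-digit value
    "-9223372036854775808",   # exactly LLONG_MIN
    "-9223372036854775809",   # one below LLONG_MIN
    "18446744073709551615",   # 2**64 - 1, which has 20 digits
]
results = {s: parse_int_token(s) for s in samples}
for s, (value, path) in results.items():
    assert value == json.loads(s)
    print(f"{s:>22} -> {path}")
```

The 19-digit cutoff matters because it makes the overflow check unnecessary inside the loop: no 19-digit decimal can exceed the unsigned 64-bit range, so only the sign needs a range test afterwards.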

bedevere-app Bot added the awaiting review label May 30, 2026

blurb-it Bot and others added 4 commits May 30, 2026 21:22

📜🤖 Added by blurb_it.

b001f96

gh-XXXXX: Add tests for json.loads number parsing edge cases

17d1971

Merge remote-tracking branch 'origin/main' into json-loads-opt

43f14d2

Merge remote-tracking branch 'pte/json-loads-opt' into json-loads-opt

b0c486f

eendebakpt wants to merge 5 commits into python:main from eendebakpt:json-loads-opt
