Commit 984672f

savannahostrowski authored and pablogsal committed
First pass at edits
1 parent 23ea4a5 commit 984672f

1 file changed

Lines changed: 48 additions & 36 deletions

peps/pep-0830.rst

@@ -2,7 +2,7 @@ PEP: 830
 Title: Frame Pointers Everywhere: Enabling System-Level Observability for Python
 Author: Pablo Galindo Salgado <pablogsal@python.org>,
         Ken Jin <kenjin@python.org>,
-        Savannah Ostrowski <savannahostrowski@gmail.com>,
+        Savannah Ostrowski <savannah@python.org>,
         Diego Russo <diego.russo@arm.com>
 Discussions-To:
 Status: Draft
@@ -15,7 +15,7 @@ Post-History:
 Abstract
 ========
 
-This PEP does two things:
+This PEP proposes two things:
 
 1. **Build CPython with frame pointers by default on platforms that support
    them.** The default build configuration is changed to compile the
@@ -51,9 +51,9 @@ Motivation
 
 Python's observability story (profiling, debugging, and system-level tracing)
 is fundamentally limited by the absence of frame pointers. The core motivation
-of this PEP is to make Python observable by default: profilers faster and more
-accurate, debuggers more reliable, and eBPF-based tools functional without
-workarounds.
+of this PEP is to make Python observable by default, so that profilers are faster
+and more accurate, debuggers are more reliable, and eBPF-based tools are functional
+without workarounds.
 
 Today, users who want to profile CPython with system tools must rebuild the
 interpreter with special compiler flags, a step that most users cannot or will
@@ -201,10 +201,10 @@ processes. The Linux kernel has no DWARF unwinder and, per Linus Torvalds,
 will not gain one [#torvalds_fp]_; the kernel developed its own ORC format for
 internal use instead.
 
-The impact extends beyond CPU profiling. Off-CPU flame graphs (used to
+The impact extends beyond CPU profiling. Off-CPU flamegraphs (used to
 diagnose latency caused by I/O waits, lock contention, and scheduling delays)
 rely on the same ``bpf_get_stackid()`` helper to capture the stack at the point
-where a thread blocks. As Brendan Gregg notes, off-CPU flame graphs "can be
+where a thread blocks. As Brendan Gregg notes, off-CPU flamegraphs "can be
 dominated by libc read/write and mutex functions, so without frame pointers end
 up mostly broken" [#gregg2024]_. For Python services where latency matters
 more than raw CPU throughput, off-CPU profiling is often the most valuable
@@ -405,30 +405,24 @@ The JIT Compiler Needs Frame Pointers to Be Debuggable
 ------------------------------------------------------
 
 CPython's copy-and-patch JIT (:pep:`744`) generates native machine code at
-runtime. Without frame pointers in that generated code, stack unwinding
-through JIT frames is broken for virtually every tool in the ecosystem: GDB,
-LLDB, libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``,
-and all eBPF-based profilers.
-
-The investigation in issue `#126910`_ found that compiling the JIT stencils
-with ``-fno-omit-frame-pointer`` and ``-mno-omit-leaf-frame-pointer`` is a
-two-line change that would make most existing debuggers and profilers work with
-JIT-compiled code immediately. The measured overhead is approximately 2% on
-x86-64 and even lower on AArch64 (which has a dedicated link register). This
-is a remarkably good outcome: other JIT compilers (V8, LuaJIT, .NET CoreCLR,
-Julia, LLVM's ORC JIT) typically require hundreds to thousands of lines of code
-to implement custom DWARF ``.eh_frame`` generation, GDB JIT interface support
+runtime. Without frame pointers in the interpreter, stack unwinding through
+JIT frames is broken for virtually every tool in the ecosystem: GDB, LLDB,
+libunwind, libdw (elfutils), py-spy, Austin, pystack, memray, ``perf``, and
+all eBPF-based profilers. Ensuring full-stack observability for JIT-compiled
+code is a prerequisite for the JIT to be considered production-ready.
+
+Individual JIT stencils do not need frame-pointer prologues; the entire JIT
+region can be treated as a single frameless region for unwinding purposes.
+What matters is that the interpreter itself is built with frame pointers, so
+that the frame-pointer register (``%rbp`` on x86-64, ``x29`` on AArch64) is
+reserved and not clobbered by stencil code. With frame pointers in the
+interpreter, unwinders can walk through JIT regions without needing to inspect
+individual stencils. This is a remarkably good outcome compared to other
+JIT compilers (V8, LuaJIT, .NET CoreCLR, Julia, LLVM's ORC JIT), which
+typically require hundreds to thousands of lines of code to implement custom
+DWARF ``.eh_frame`` generation, GDB JIT interface support
 (``__jit_debug_register_code``), and per-unwinder registration APIs
-(``_U_dyn_register``, ``__register_frame``). CPython's JIT may get most of the
-benefit from frame pointers alone if that follow-up change is adopted.
-
-Critically, for JIT frame pointers to produce useful results, the interpreter
-itself must also have frame pointers. A JIT-compiled function calls back into
-the interpreter for many operations; if the interpreter frames lack frame
-pointers, the unwinder hits a gap and the stack trace is truncated. This PEP
-addresses that interpreter-side gap. JIT stencil flags (issue `#126910`_) are
-a complementary follow-up needed for complete stack unwinding in the presence
-of the JIT.
+(``_U_dyn_register``, ``__register_frame``).
 
 The Ecosystem Has Already Adopted Frame Pointers
 ------------------------------------------------
@@ -836,8 +830,21 @@ incorrectly.
 Performance
 -----------
 
-.. TODO: Insert full pyperformance results here once data collection
-   is complete.
+Full pyperformance results comparing the frame-pointer build against an
+identical build without frame pointers (geometric mean and per-benchmark
+range, 108 benchmarks):
+
+===================================== =======================
+Machine                               Geometric mean overhead
+===================================== =======================
+Apple M2 Mac Mini (arm64, macOS)      1.01x slower
+Intel Xeon Platinum 8480 (x86-64)     1.01x slower
+AMD EPYC 9654 (x86-64)                1.01x slower
+AWS Graviton c7g.16xlarge (aarch64)   1.02x slower
+Ampere Altra Max (aarch64)            1.01x slower
+Raspberry Pi (aarch64)                +X.X%
+macOS M3 (arm64)                      +X.X%
+===================================== =======================
 
 This overhead applies to both the interpreter and to C extensions that inherit
 the flags via ``sysconfig``. Detailed microarchitectural analysis shows the
@@ -892,10 +899,15 @@ information not already available through CPython's existing interfaces.
 How to Teach This
 =================
 
-No teaching is required. This change is invisible to Python users: no APIs
-change, no behaviour changes, and no user action is needed. The only observable
-effect is that profilers, debuggers, and system-level tracing tools produce
-more complete and more reliable results out of the box.
+For Python users and application developers, this change is invisible: no APIs
+change, no behaviour changes, and no user action is needed. The only
+observable effect is that profilers, debuggers, and system-level tracing tools
+produce more complete and more reliable results out of the box.
+
+Though extensions should see negligible overhead, extension authors who observe a
+measurable regression in a specific module can opt out as described in
+`Extension Build Impact`_. The ``--without-frame-pointers`` configure flag is
+documented in `Opt-Out Configure Flag`_.
 
 
 Reference Implementation

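The Performance text in this diff says that C extensions inherit the frame-pointer flags via ``sysconfig``. As a rough illustration only (it is not part of the commit), the sketch below checks whether the build flags recorded for the running interpreter mention ``-fno-omit-frame-pointer``. The helper name and the choice of config variables (``CFLAGS``, ``PY_CFLAGS``, ``PY_CORE_CFLAGS``) are assumptions made for this example; the PEP text shown here does not say which variables carry the flag.

```python
import sysconfig

# Flag named in the PEP text; which sysconfig variables carry it is an
# assumption made for this sketch.
FRAME_POINTER_FLAG = "-fno-omit-frame-pointer"


def flags_with_frame_pointers(config_vars=("CFLAGS", "PY_CFLAGS", "PY_CORE_CFLAGS")):
    """Return {variable: value} for config variables that mention the flag.

    Hypothetical helper: it only inspects the flags recorded at build time
    and does not examine the interpreter binary itself.
    """
    found = {}
    for name in config_vars:
        # get_config_var() returns None for variables that are not defined.
        value = sysconfig.get_config_var(name) or ""
        if FRAME_POINTER_FLAG in str(value).split():
            found[name] = str(value)
    return found


if __name__ == "__main__":
    hits = flags_with_frame_pointers()
    if hits:
        print("Frame-pointer flag recorded in:", ", ".join(sorted(hits)))
    else:
        print("Frame-pointer flag not found in the inspected variables")
```

On platforms where these variables are unset (Windows builds, for example), the function returns an empty dictionary, which says nothing either way about how the interpreter was compiled.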