Diagnose log injection smoke test flakiness instead of masking it#11075
Open
Diagnose log injection smoke test flakiness instead of masking it#11075
Conversation
The `check raw file injection` test has been flaking across 11+ logging backend variants for months. CI Visibility data shows 90% of failures are `traceCount=0` at `waitForTraceCount(2)` after exactly 30s — the JVM + agent bytecode instrumentation simply takes >30s on overloaded CI machines. Changes: - Add `startupPoll` with 120s timeout for the initial `waitForTraceCount(2)` that covers JVM startup + agent init, giving 4x headroom over the current 30s `defaultPoll` - Add `waitForTraceCountAlive` that checks process liveness on each poll iteration, turning silent 30-120s timeouts into instant, actionable errors when the process crashes - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive - Assert `waitFor` return value for a clear error if the process hangs tag: no release note Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057349
Total [baseline] (11.106 s) : 0, 11105561
Agent [candidate] (1.056 s) : 0, 1056492
Total [candidate] (11.15 s) : 0, 11149694
section appsec
Agent [baseline] (1.25 s) : 0, 1250355
Total [baseline] (11.15 s) : 0, 11150242
Agent [candidate] (1.252 s) : 0, 1252160
Total [candidate] (11.132 s) : 0, 11131514
section iast
Agent [baseline] (1.225 s) : 0, 1225317
Total [baseline] (11.338 s) : 0, 11338197
Agent [candidate] (1.237 s) : 0, 1237167
Total [candidate] (11.411 s) : 0, 11410820
section profiling
Agent [baseline] (1.186 s) : 0, 1186337
Total [baseline] (11.114 s) : 0, 11114498
Agent [candidate] (1.185 s) : 0, 1184524
Total [candidate] (11.09 s) : 0, 11090136
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.237 ms) : 0, 1237
crashtracking [candidate] (1.247 ms) : 0, 1247
BytebuddyAgent [baseline] (632.102 ms) : 0, 632102
BytebuddyAgent [candidate] (632.192 ms) : 0, 632192
AgentMeter [baseline] (29.372 ms) : 0, 29372
AgentMeter [candidate] (29.366 ms) : 0, 29366
GlobalTracer [baseline] (248.545 ms) : 0, 248545
GlobalTracer [candidate] (248.853 ms) : 0, 248853
AppSec [baseline] (32.15 ms) : 0, 32150
AppSec [candidate] (32.451 ms) : 0, 32451
Debugger [baseline] (59.947 ms) : 0, 59947
Debugger [candidate] (60.045 ms) : 0, 60045
Remote Config [baseline] (597.056 µs) : 0, 597
Remote Config [candidate] (591.707 µs) : 0, 592
Telemetry [baseline] (8.055 ms) : 0, 8055
Telemetry [candidate] (8.045 ms) : 0, 8045
Flare Poller [baseline] (9.107 ms) : 0, 9107
Flare Poller [candidate] (7.447 ms) : 0, 7447
section appsec
crashtracking [baseline] (1.254 ms) : 0, 1254
crashtracking [candidate] (1.253 ms) : 0, 1253
BytebuddyAgent [baseline] (662.488 ms) : 0, 662488
BytebuddyAgent [candidate] (664.197 ms) : 0, 664197
AgentMeter [baseline] (12.095 ms) : 0, 12095
AgentMeter [candidate] (12.038 ms) : 0, 12038
GlobalTracer [baseline] (249.34 ms) : 0, 249340
GlobalTracer [candidate] (249.056 ms) : 0, 249056
AppSec [baseline] (185.544 ms) : 0, 185544
AppSec [candidate] (185.663 ms) : 0, 185663
Debugger [baseline] (65.672 ms) : 0, 65672
Debugger [candidate] (66.197 ms) : 0, 66197
Remote Config [baseline] (606.627 µs) : 0, 607
Remote Config [candidate] (609.855 µs) : 0, 610
Telemetry [baseline] (8.562 ms) : 0, 8562
Telemetry [candidate] (8.371 ms) : 0, 8371
Flare Poller [baseline] (3.567 ms) : 0, 3567
Flare Poller [candidate] (3.577 ms) : 0, 3577
IAST [baseline] (24.621 ms) : 0, 24621
IAST [candidate] (24.63 ms) : 0, 24630
section iast
crashtracking [baseline] (1.238 ms) : 0, 1238
crashtracking [candidate] (1.246 ms) : 0, 1246
BytebuddyAgent [baseline] (802.574 ms) : 0, 802574
BytebuddyAgent [candidate] (809.93 ms) : 0, 809930
AgentMeter [baseline] (11.403 ms) : 0, 11403
AgentMeter [candidate] (11.663 ms) : 0, 11663
GlobalTracer [baseline] (239.087 ms) : 0, 239087
GlobalTracer [candidate] (240.949 ms) : 0, 240949
AppSec [baseline] (32.619 ms) : 0, 32619
AppSec [candidate] (32.228 ms) : 0, 32228
Debugger [baseline] (60.631 ms) : 0, 60631
Debugger [candidate] (60.624 ms) : 0, 60624
Remote Config [baseline] (538.275 µs) : 0, 538
Remote Config [candidate] (558.985 µs) : 0, 559
Telemetry [baseline] (11.725 ms) : 0, 11725
Telemetry [candidate] (13.618 ms) : 0, 13618
Flare Poller [baseline] (3.461 ms) : 0, 3461
Flare Poller [candidate] (3.777 ms) : 0, 3777
IAST [baseline] (25.742 ms) : 0, 25742
IAST [candidate] (26.043 ms) : 0, 26043
section profiling
ProfilingAgent [baseline] (94.078 ms) : 0, 94078
ProfilingAgent [candidate] (93.869 ms) : 0, 93869
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.177 ms) : 0, 1177
BytebuddyAgent [baseline] (692.232 ms) : 0, 692232
BytebuddyAgent [candidate] (691.684 ms) : 0, 691684
AgentMeter [baseline] (9.119 ms) : 0, 9119
AgentMeter [candidate] (9.117 ms) : 0, 9117
GlobalTracer [baseline] (207.613 ms) : 0, 207613
GlobalTracer [candidate] (207.145 ms) : 0, 207145
AppSec [baseline] (32.882 ms) : 0, 32882
AppSec [candidate] (32.84 ms) : 0, 32840
Debugger [baseline] (65.882 ms) : 0, 65882
Debugger [candidate] (65.498 ms) : 0, 65498
Remote Config [baseline] (582.872 µs) : 0, 583
Remote Config [candidate] (571.224 µs) : 0, 571
Telemetry [baseline] (7.807 ms) : 0, 7807
Telemetry [candidate] (7.741 ms) : 0, 7741
Flare Poller [baseline] (3.584 ms) : 0, 3584
Flare Poller [candidate] (3.541 ms) : 0, 3541
Profiling [baseline] (94.646 ms) : 0, 94646
Profiling [candidate] (94.435 ms) : 0, 94435
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.059 s) : 0, 1058669
Total [baseline] (8.912 s) : 0, 8912376
Agent [candidate] (1.061 s) : 0, 1060736
Total [candidate] (8.873 s) : 0, 8872752
section iast
Agent [baseline] (1.231 s) : 0, 1230772
Total [baseline] (9.573 s) : 0, 9573228
Agent [candidate] (1.239 s) : 0, 1239493
Total [candidate] (9.584 s) : 0, 9584489
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.259 ms) : 0, 1259
crashtracking [candidate] (1.235 ms) : 0, 1235
BytebuddyAgent [baseline] (634.722 ms) : 0, 634722
BytebuddyAgent [candidate] (633.303 ms) : 0, 633303
AgentMeter [baseline] (29.526 ms) : 0, 29526
AgentMeter [candidate] (29.354 ms) : 0, 29354
GlobalTracer [baseline] (249.295 ms) : 0, 249295
GlobalTracer [candidate] (249.171 ms) : 0, 249171
AppSec [baseline] (32.325 ms) : 0, 32325
AppSec [candidate] (32.387 ms) : 0, 32387
Debugger [baseline] (59.238 ms) : 0, 59238
Debugger [candidate] (59.117 ms) : 0, 59117
Remote Config [baseline] (594.486 µs) : 0, 594
Remote Config [candidate] (600.631 µs) : 0, 601
Telemetry [baseline] (8.068 ms) : 0, 8068
Telemetry [candidate] (8.0 ms) : 0, 8000
Flare Poller [baseline] (7.378 ms) : 0, 7378
Flare Poller [candidate] (11.211 ms) : 0, 11211
section iast
crashtracking [baseline] (1.249 ms) : 0, 1249
crashtracking [candidate] (1.246 ms) : 0, 1246
BytebuddyAgent [baseline] (808.113 ms) : 0, 808113
BytebuddyAgent [candidate] (813.449 ms) : 0, 813449
AgentMeter [baseline] (11.646 ms) : 0, 11646
AgentMeter [candidate] (11.803 ms) : 0, 11803
GlobalTracer [baseline] (239.187 ms) : 0, 239187
GlobalTracer [candidate] (240.897 ms) : 0, 240897
AppSec [baseline] (31.964 ms) : 0, 31964
AppSec [candidate] (32.21 ms) : 0, 32210
Debugger [baseline] (60.241 ms) : 0, 60241
Debugger [candidate] (61.995 ms) : 0, 61995
Remote Config [baseline] (539.727 µs) : 0, 540
Remote Config [candidate] (544.723 µs) : 0, 545
Telemetry [baseline] (12.238 ms) : 0, 12238
Telemetry [candidate] (11.243 ms) : 0, 11243
Flare Poller [baseline] (3.384 ms) : 0, 3384
Flare Poller [candidate] (3.462 ms) : 0, 3462
IAST [baseline] (25.823 ms) : 0, 25823
IAST [candidate] (25.974 ms) : 0, 25974
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 4 performance regressions! Performance is the same for 15 metrics, 17 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section baseline
no_agent (18.553 ms) : 18361, 18744
. : milestone, 18553,
appsec (19.974 ms) : 19768, 20181
. : milestone, 19974,
code_origins (17.956 ms) : 17780, 18132
. : milestone, 17956,
iast (17.723 ms) : 17552, 17894
. : milestone, 17723,
profiling (18.261 ms) : 18081, 18442
. : milestone, 18261,
tracing (17.616 ms) : 17442, 17790
. : milestone, 17616,
section candidate
no_agent (18.748 ms) : 18556, 18939
. : milestone, 18748,
appsec (19.303 ms) : 19108, 19497
. : milestone, 19303,
code_origins (19.188 ms) : 18996, 19381
. : milestone, 19188,
iast (18.379 ms) : 18195, 18564
. : milestone, 18379,
profiling (18.426 ms) : 18241, 18610
. : milestone, 18426,
tracing (17.987 ms) : 17811, 18164
. : milestone, 17987,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section baseline
no_agent (1.242 ms) : 1230, 1254
. : milestone, 1242,
iast (3.357 ms) : 3309, 3406
. : milestone, 3357,
iast_FULL (6.141 ms) : 6077, 6205
. : milestone, 6141,
iast_GLOBAL (3.659 ms) : 3598, 3719
. : milestone, 3659,
profiling (2.184 ms) : 2163, 2205
. : milestone, 2184,
tracing (1.904 ms) : 1888, 1920
. : milestone, 1904,
section candidate
no_agent (1.235 ms) : 1224, 1246
. : milestone, 1235,
iast (3.273 ms) : 3226, 3321
. : milestone, 3273,
iast_FULL (5.995 ms) : 5933, 6056
. : milestone, 5995,
iast_GLOBAL (3.61 ms) : 3556, 3665
. : milestone, 3610,
profiling (2.155 ms) : 2132, 2178
. : milestone, 2155,
tracing (1.881 ms) : 1865, 1897
. : milestone, 1881,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section baseline
no_agent (1.483 ms) : 1472, 1495
. : milestone, 1483,
appsec (3.782 ms) : 3565, 3999
. : milestone, 3782,
iast (2.266 ms) : 2196, 2335
. : milestone, 2266,
iast_GLOBAL (2.278 ms) : 2208, 2347
. : milestone, 2278,
profiling (2.103 ms) : 2047, 2158
. : milestone, 2103,
tracing (2.077 ms) : 2023, 2130
. : milestone, 2077,
section candidate
no_agent (1.485 ms) : 1473, 1497
. : milestone, 1485,
appsec (3.824 ms) : 3603, 4045
. : milestone, 3824,
iast (2.262 ms) : 2193, 2331
. : milestone, 2262,
iast_GLOBAL (2.306 ms) : 2236, 2376
. : milestone, 2306,
profiling (2.096 ms) : 2041, 2151
. : milestone, 2096,
tracing (2.084 ms) : 2030, 2139
. : milestone, 2084,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~31ae072eeb, baseline=1.62.0-SNAPSHOT~caae0f79ce
dateFormat X
axisFormat %s
section baseline
no_agent (14.939 s) : 14939000, 14939000
. : milestone, 14939000,
appsec (14.51 s) : 14510000, 14510000
. : milestone, 14510000,
iast (18.062 s) : 18062000, 18062000
. : milestone, 18062000,
iast_GLOBAL (18.05 s) : 18050000, 18050000
. : milestone, 18050000,
profiling (14.778 s) : 14778000, 14778000
. : milestone, 14778000,
tracing (14.906 s) : 14906000, 14906000
. : milestone, 14906000,
section candidate
no_agent (15.668 s) : 15668000, 15668000
. : milestone, 15668000,
appsec (14.722 s) : 14722000, 14722000
. : milestone, 14722000,
iast (18.433 s) : 18433000, 18433000
. : milestone, 18433000,
iast_GLOBAL (17.983 s) : 17983000, 17983000
. : milestone, 17983000,
profiling (15.262 s) : 15262000, 15262000
. : milestone, 15262000,
tracing (14.924 s) : 14924000, 14924000
. : milestone, 14924000,
|
The `check raw file injection` test flakes across 11+ logging backend variants. CI Visibility data shows the failure is bimodal — successful runs complete in 3-9s, but failures sit at exactly 30s (the PollingConditions timeout) with traceCount=0. Nothing in between. This means the process either works or is totally broken — no amount of timeout increase will help. The current test is blind during the 30s wait — it just polls traceCount with no diagnostics when the process crashes or hangs. Changes: - Add `waitForTraceCountAlive` that checks process liveness on every poll iteration. If the process dies, it fails immediately with the exit code, RC poll count, and last 20 lines of process output. - On timeout, enrich the error with diagnostic state (process alive?, traceCount, RC polls received, last 30 lines of output) so the next CI failure tells us whether it's a crash, a hang, or a connectivity issue. - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive. - Assert `waitFor` return value for a clear error if the process hangs. tag: no release notes Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The liveness check fired before the trace count check, so a normal process exit after delivering all traces was treated as a failure. Check traceCount >= count first and return early if satisfied. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
PollingConditions.eventually only retries AssertionError. The liveness check was throwing AssertionError, so a dead process still waited the full 30s timeout. Switch to RuntimeException so it propagates immediately. Also narrow the catch from Throwable to AssertionError. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mhlidd
approved these changes
Apr 15, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Does This Do
Adds diagnostic instrumentation to the
check raw file injectionsmoke test so the next CI failure tells us the root cause instead of a bare "Condition not satisfied after 30s" withtraceCount=0.Changes to
LogInjectionSmokeTest:waitForTraceCountAlive— checks process liveness on every poll iteration; if the process dies, fails immediately with exit code + last 20 lines of process outputwaitForTraceCount(4)beforewaitFor+ assertwaitForreturn valueMotivation
CI Visibility data for the last 30 days on master shows 10 failures of
check raw file injection:traceCount=0atwaitForTraceCount(2)logLines.size()=3atassertRawLogLinesWithInjectionThe failure distribution is bimodal — successful runs complete in 3.5-8.7s (80 data points, zero above 9s), while failures sit at exactly 30.3s. There is nothing in between. This means the process either works or is totally broken — a timeout increase would just delay the same failure.
The current test is blind during the wait — it just polls
traceCountin a loop. We don't know if the process crashed, hung during agent init, failed to connect to the test server, or something else entirely. This PR makes the next failure self-diagnosing.Example output when process crashes:
Example output on timeout (process alive but not sending traces):
Additional Notes
LogInjectionSmokeTest.groovyis changeddefaultPollis kept as-isrcClientMessages.size()tells us whether the agent connected to the test server at all (RC polls hit/v0.7/configevery 200ms)Contributor Checklist
type:and (comp:orinst:) labelsclose,fix, or any linking keywords when referencing an issuetag: no release notes
tag: ai generated
🤖 Generated with Claude Code