Skip to content

minor: add projectionTrace query context for diagnosing projection match failures#19613

Open
cecemei wants to merge 3 commits into
apache:masterfrom
cecemei:trace
Open

minor: add projectionTrace query context for diagnosing projection match failures#19613
cecemei wants to merge 3 commits into
apache:masterfrom
cecemei:trace

Conversation

@cecemei

@cecemei cecemei commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Description

Debugging why matchAggregateProjection returns null is difficult today because there are 14 distinct rejection sites spread across the matching pipeline, and there's no visibility into which one fired without a debugger or global DEBUG logging.

This PR adds a projectionTrace query context flag. When set to true, rejection reasons are logged at INFO level so they're visible without changing log4j config. When unset, the same messages are emitted at DEBUG level.

Usage:

  {                                                                                                                                                     
    "context": { "projectionTrace": true }                                                                                                              
  }                                                                                                                                                   

Rejection messages include the projection name and the specific reason (e.g., incompatible ordering, missing column, granularity mismatch), making it straightforward to identify which projection was rejected and why.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cecemei cecemei marked this pull request as ready for review June 22, 2026 18:26
@cecemei cecemei requested a review from clintropolis June 22, 2026 18:26
public static final String NO_PROJECTIONS = "noProjections";
public static final String FORCE_PROJECTION = "forceProjections";
public static final String USE_PROJECTION = "useProjection";
public static final String PROJECTION_TRACE = "projectionTrace";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if instead of adding a new flag if this should just check ENABLE_DEBUG context flag

@cecemei cecemei Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've thought about that as well, but decided not because debug=true already triggers a lot of things at once, e.x SQL planner rule, CalciteRulesManager in particular is very noisy when it logs planner rules. It'd be very noisy if someone is just trying to understand why a projection isn't matching.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i considered the noise briefly too, but then said to myself it was ok though because the noisy SQL logs are all on the broker, while these logs are only going to be on the historicals and realtime servers, per segment.

If you really want to add a new flag, i still think we should also do this logging if debug is set, so that way someone wanting "all the logs" doesn't have to set a bunch of different flags, so the logTrace method could check either projectionTrace or debug

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i added the logging if debug is set

Comment thread processing/src/main/java/org/apache/druid/segment/projections/Projections.java Outdated
{
private static final Logger log = new Logger(Projections.class);

private static void logTrace(QueryContext context, String format, Object... args)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just pass the CursorBuildSpec into this method? it looks like all of the callers are getting it from there anyway

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a narrower input param would be more declarative? yes all callers are getting it from CursorBuildSpec.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess it was just jarring to see logTrace(queryCursorBuildSpec.getQueryContext()... like 20 times in the same file with all the stuff sourced from a CursorBuildSpec.

// match virtual columns first, which will populate the 'remapColumns' of the match builder
matchBuilder = matchQueryVirtualColumns(projection, queryCursorBuildSpec, physicalColumnChecker, matchBuilder);
if (matchBuilder == null) {
logTrace(queryCursorBuildSpec.getQueryContext(), "matchAggregateProjection: projection [%s] rejected — virtual column mismatch", projection.getName());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and some of these other top level logs seem kind of redundant since the methods they call also log when returning nulls, but i guess we sort of need these here though because otherwise i'm not sure its obvious which caller of which match method was the failing one.

This is probably ok for now since its just logs, but in the future we might consider collecting these failure reasons instead of logging them when they happen so that we can produce more concise messaging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider collecting failure reasons when forceProjection or useProjection is set in the future, and maybe add it as a system flag.

even though many messages are added here, only 1-2 lines would be logged so it should not be too verbose.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i think its fine for now, though 'only 1-2 lines' is per projection, per segment of the query, so its going to be pretty verbose anywhere that is touching more than a few segments if there are any projections at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it would be suboptimal if logging for a lot of segments, maybe could be helpful in the case when only several segments out of all are missing proper projections, it gives some amount of visibility and operatibility to projection at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants