Skip to content

[FLINK-39799][table] Preserve user-typed query text for materialized table and view definitions#28277

Open
raminqaf wants to merge 6 commits into
apache:masterfrom
raminqaf:FLINK-39799
Open

[FLINK-39799][table] Preserve user-typed query text for materialized table and view definitions#28277
raminqaf wants to merge 6 commits into
apache:masterfrom
raminqaf:FLINK-39799

Conversation

@raminqaf
Copy link
Copy Markdown
Contributor

@raminqaf raminqaf commented May 29, 2026

What is the purpose of the change

CREATE/ALTER MATERIALIZED TABLE … AS … and CREATE VIEW … AS … currently persist originalQuery by re-rendering the parsed SqlNode via SqlNode#toSqlString(). This normalizes identifier casing, expands type aliases (intINTEGER), reflows whitespace, and rewrites quoting, so SHOW CREATE MATERIALIZED TABLE / SHOW CREATE VIEW no longer reflect what the user typed.

This PR makes originalQuery the verbatim substring of the user's input statement, sliced via SqlParserPos of the AS-query subtree. expandedQuery continues to hold the planner-normalized form.

Example

Input:

CREATE MATERIALIZED TABLE mt AS
  select a, b, cast(c as int) as int_c from t1 where c > 200

Before:

SELECT `a`, `b`, CAST(`c` AS INTEGER) AS `int_c`
FROM `t1`
WHERE `c` > 200

after:

select a, b, cast(c as int) as int_c from t1 where c > 200

Brief Changelog

  • Thread original SQL statement text from ParserImpl through SqlNodeToOperationConversion.convert(...) into SqlNodeConvertContext.
  • Add ConvertContext#toRawSqlString(SqlNode) returning the substring of the original statement matching SqlParserPos. Counterpart to existing toQuotedSqlString(SqlNode). Falls back to toQuotedSqlString if raw statement text is unavailable.
  • Switch AbstractCreateMaterializedTableConverter#getDerivedOriginalQuery, SqlAlterMaterializedTableAsQueryConverter, and SqlNodeConvertUtils#toCatalogView (covers CREATE VIEW and ALTER VIEW … AS) to the new method.
  • Keep existing SqlNodeToOperationConversion.convert(...) signature as an overload that delegates with null raw text for backward compatibility.

Verifying this Change

  • Unit tests in SqlNodeConvertContextTest for toRawSqlString covering nested AS-query nodes, comments inside the query, and statements spanning multiple lines.
  • Planner tests in MaterializedTableTest and ViewTest asserting originalQuery equals the input substring across mixed casing, type aliases (int/varchar), whitespace, and quoting styles. Existing expandedQuery assertions retained unchanged.
  • Updated planner tests that previously asserted the normalized form of originalQuery.
  • SQL gateway round-trip test: create a materialized table / view with mixed-case query, run SHOW CREATE and assert the output preserves user wording.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no (signature change on SqlNodeToOperationConversion is @Internal; backward-compatible overload kept)
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no
  • Note: catalog entries persisted before this change still hold the normalized originalQuery. No migration performed.

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?

  • Yes - Claude Opus 4.7 to go through the implementation once and check for inconsistencies

@flinkbot
Copy link
Copy Markdown
Collaborator

flinkbot commented May 29, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

TableChange.add(Column.physical("f", DataTypes.VARCHAR(Integer.MAX_VALUE))),
TableChange.modifyDefinitionQuery(
"SELECT `a`, `b`, `c`, `d`, `d` AS `e`, CAST('123' AS STRING) AS `f`\nFROM `t1`",
"SELECT a, b, c, d, d as e, cast('123' as string) as f FROM t1",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have tests for queries containing different comments?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I have adde a parasitized test with for comments at different locations of the asQuery


public String getDefinitionQuery() {
return definitionQuery;
public String getExpandedQuery() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a @PublicEvolving interface, such change usually requires a FLIP

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should tolerate these type of changes after a minor release. For now reverted it get things unblocked.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a doc telling about this
in case we revert it, then should revert everywhere the change definitionQuery -> expandedQuery
otherwise in some places it is expanded, in other it is definition

}

@Override
public Optional<String> toRawSqlString(SqlNode node) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the implementation consistent?
what happen if

      final String outerStatement = "CREATE TABLE x AS SELECT a FROM t1";
      final FlinkPlannerImpl planner = getPlannerBySqlDialect(SqlDialect.DEFAULT);
      final CalciteParser parser = getParserBySqlDialect(SqlDialect.DEFAULT);
      final SqlNode innerAsQuery =
              ((SqlCreateTableAs) parser.parse(outerStatement)).getAsQuery();

new SqlNodeConvertContext(planner, catalogManager, outerStatement)
                      .toRawSqlString(innerAsQuery));

// AND
new SqlNodeConvertContext(planner, catalogManager, "SELECT a FROM t1")
                      .toRawSqlString(innerAsQuery));

will they have same result?

Copy link
Copy Markdown
Contributor Author

@raminqaf raminqaf May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and that's expected. I ran your exact snippet:

  outer statement "CREATE TABLE x AS SELECT a FROM t1"                                                                                             
  innerAsQuery.getParserPosition() -> line 1, column 19                                                                                            
                                                                                                                                                   
  new SqlNodeConvertContext(planner, catalogManager, outerStatement).toRawSqlString(innerAsQuery)                                                  
    -> Optional[SELECT a FROM t1]                                                                                                                  
                                                                                                                                                   
  new SqlNodeConvertContext(planner, catalogManager, "SELECT a FROM t1").toRawSqlString(innerAsQuery)                                              
    -> Optional.empty

the method has a precondition: the statement passed to the context must be the same text the node was parsed from. When that holds, the slice is correct. It fails safe. It never returns a wrong substring and never throws.

I will update the javaDocs with better explanation of this case.


return (PlannerQueryOperation)
SqlNodeToOperationConversion.convert(flinkPlanner, catalogManager, newSelect)
SqlNodeToOperationConversion.convert(flinkPlanner, catalogManager, newSelect, null)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it null here?
could it be potentially in the future same issue as with CTAS/RTAS?

Copy link
Copy Markdown
Contributor Author

@raminqaf raminqaf May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion path here never calls toRawString here. So it is safe. Even if it gets fired the newSelect is synthesized by rewriteCall, so the slicing would produce wrong values here. So the correct behavior would be to pass null to get an optional#empty

SqlNode asQuerySqlNode = sqlCreateTableAs.getAsQuery();
SqlNode validatedAsQuery = flinkPlanner.validate(asQuerySqlNode);

final String rawSqlString =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to have one var final while others are not?
can we make all final or all not final?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made everything final in this method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants