[CALCITE-7618] Add filter pushdown support to the file adapter's CSV table implementation by Diveyam-Mishra · Pull Request #5048 · apache/calcite

Diveyam-Mishra · 2026-06-24T21:23:03Z

Jira Link

Changes Proposed

This PR implements filter pushdown support for the file adapter's CSV table using a planner-rule-based approach instead of a FilterableTable interface. This allows Calcite to make more intelligent planning decisions, estimate cost reductions, and display pushed-down predicates in EXPLAIN plans.

Implementation Details:

Rule-Based Pushdown:
- Introduced CsvFilterTableScanRule which matches LogicalFilter on a CsvTableScan and pushes simple equality predicates (col = literal) into the scan.
- Introduced CsvProjectFilterTableScanRule which matches LogicalProject → LogicalFilter → CsvTableScan and pushes down the filter first, preventing the planner from prematurely collapsing projects and filters into a generic EnumerableCalc and bypassing pushdown.
Scan State & Costing:
- Updated CsvTableScan to store and propagate @Nullable String[] filterValues.
- Updated CsvTableScan#computeSelfCost to reduce planning cost proportionally to the number of pushed-down filters.
- Extended CsvTableScan#explainTerms to format filters as filters=[[colIndex=value]] in EXPLAIN outputs.
Execution Support:
- Added CsvTranslatableTable#scan(DataContext, int[], String[]) which is dynamically invoked by the generated code when filters are present.
- Made CsvEnumerator#converter package-private so it can be reused inside CsvTranslatableTable to resolve correct row converters (ensuring single-column projections return raw objects rather than Object[] arrays to prevent class cast errors).
Testing:
- Added targeted unit tests in FileAdapterTest.java verifying pushdown, combination with projection, result correctness, and that non-pushable residual filters remain in the plan.
- Updated existing plans in testPushDownProjectAggregateWithFilter to reflect the newly optimized scan plans.

To verify the change, run:

.\sqlline.bat -u "jdbc:calcite:model=file/src/test/resources/smart.json" -n admin -p admin -e "!set maxwidth 10000" -e "explain plan for select name, empno from EMPS where deptno = 20"

Before this change, the plan was:

PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])

After this change, the filter and projection are pushed down into CsvTableScan, resulting in:

CsvTableScan(table=[[SALES, EMPS]], fields=[[1, 0]], filters=[[2=20]])

This demonstrates that the scan now reads only the required columns (name, empno) and applies the deptno = 20 filter during the table scan itself.
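For readers unfamiliar with the plan notation, the following is a minimal sketch of what fields=[[1, 0]] and filters=[[2=20]] mean at scan time. It is not the PR's code: the class name `PushdownSketch`, the `scan` helper, and the sample rows are all made up for illustration, and every row is assumed to contain all columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: mimics fields=[[1, 0]], filters=[[2=20]] over in-memory rows. */
public class PushdownSketch {
  /**
   * Returns the projected rows whose columns match every non-null entry of
   * filterValues. Assumes each row has at least filterValues.length columns.
   */
  public static List<String[]> scan(List<String[]> rows, int[] fields,
      String[] filterValues) {
    List<String[]> out = new ArrayList<>();
    outer:
    for (String[] row : rows) {
      for (int i = 0; i < filterValues.length; i++) {
        // A null entry means "no filter on column i".
        if (filterValues[i] != null && !filterValues[i].equals(row[i])) {
          continue outer; // row rejected before any projection work is done
        }
      }
      // Keep only the requested columns, in the requested order.
      String[] projected = new String[fields.length];
      for (int j = 0; j < fields.length; j++) {
        projected[j] = row[fields[j]];
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up rows in (empno, name, deptno) order.
    List<String[]> emps = Arrays.asList(
        new String[] {"100", "Fred", "10"},
        new String[] {"110", "Eric", "20"},
        new String[] {"120", "Wilma", "20"});
    for (String[] r : scan(emps, new int[] {1, 0}, new String[] {null, null, "20"})) {
      System.out.println(String.join(",", r));
    }
  }
}
```

The point of the sketch is only that rejected rows never reach the projection step, which is the saving the plan above expresses.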

mihaibudiu · 2026-06-24T22:04:29Z

+
+  protected CsvTableScan(RelOptCluster cluster, RelOptTable table,
+      CsvTranslatableTable csvTable, int[] fields,
+      @Nullable String @Nullable [] filterValues) {


I think CsvEnumerator is actually broken, since it does string comparisons.
This means for example that 0.0 != 0 in a filter.
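The concern can be reproduced with plain JDK calls. This is a small sketch, not code from CsvEnumerator; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class StringVsNumericEquality {
  /** Compares two CSV cell values as raw text. */
  public static boolean equalAsText(String a, String b) {
    return a.equals(b);
  }

  /** Compares two CSV cell values as numbers, ignoring scale. */
  public static boolean equalAsNumber(String a, String b) {
    return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
  }

  public static void main(String[] args) {
    System.out.println(equalAsText("0.0", "0"));   // false
    System.out.println(equalAsNumber("0.0", "0")); // true
  }
}
```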

sonarqubecloud · 2026-06-27T18:59:06Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.3% Coverage on New Code
41.5% Duplication on New Code

See analysis details on SonarQube Cloud

Diveyam-Mishra · 2026-06-27T19:27:10Z

I might have complicated a few things: I kept getting style errors locally and tried to fix them, but I may have done something wrong. I tried stopping the daemon thread and rebuilding, yet something still went haywire. If needed, I can open a new PR with a single clean commit.

mihaibudiu · 2026-06-27T22:08:39Z

Please use fresh commits until we finish the review, to make it easier to see what changed in response to reviewers.

mihaibudiu · 2026-06-27T22:33:18Z

+    if (o1 == null || o2 == null) {
+      return false;
+    }
+    if (o1 instanceof BigDecimal && o2 instanceof BigDecimal) {


Why is this case needed? Doesn't BigDecimal have equals?
If it does, can this become Objects.equals()?

The core problem is that BigDecimal violates the intuitive expectation that "same number = equal object":
new BigDecimal("2.0").equals(new BigDecimal("2.00")) // false
new BigDecimal("2.0").compareTo(new BigDecimal("2.00")) == 0 // true

How about using compareTo for everything and using Comparable for o1 and o2?
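One way the compareTo suggestion could look is sketched below. This is an assumption about the shape of such a helper, not the PR's `objectsEqual`; the name `comparableEquals` is hypothetical, and nulls are treated as never equal, mirroring the null check in the excerpt above.

```java
import java.math.BigDecimal;

public class ComparableEquality {
  /**
   * Hypothetical helper: equality via compareTo when both values are
   * Comparable instances of the same class, falling back to equals otherwise.
   */
  @SuppressWarnings({"unchecked", "rawtypes"})
  public static boolean comparableEquals(Object o1, Object o2) {
    if (o1 == null || o2 == null) {
      return false; // a null never matches, as in the excerpt above
    }
    if (o1 instanceof Comparable && o1.getClass() == o2.getClass()) {
      // Same class, so compareTo cannot throw ClassCastException here.
      return ((Comparable) o1).compareTo(o2) == 0;
    }
    return o1.equals(o2);
  }

  public static void main(String[] args) {
    System.out.println(
        comparableEquals(new BigDecimal("2.0"), new BigDecimal("2.00"))); // true
    System.out.println(comparableEquals("a", "a"));                        // true
    System.out.println(comparableEquals(null, "a"));                       // false
  }
}
```

The same-class guard is a design choice in this sketch: comparing an Integer with a Long through raw compareTo would throw, so mixed types fall back to equals.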

mihaibudiu · 2026-06-27T22:36:14Z

+ * {@link CsvTableScan}.
+ *
+ * <p>Only equality conditions of the form {@code column = literal} can be
+ * pushed down, because {@link CsvEnumerator} only supports per-column


Could this situation be improved? Is this a fundamental limitation of CsvEnumerator?
Maybe we need a more powerful enumerator.
In principle I think any predicate of the current row value should work.

My current plan is to introduce a CsvFilter abstraction to represent the subset of filters that can be pushed down (initially AND, OR, = and <>, including null comparisons). Rather than encoding pushdown state as column-value arrays, the planner will build a CsvFilter tree, serialize it, and pass the serialized representation through CsvTableScan/CsvTranslatableTable to CsvEnumerator, where it will be deserialized and evaluated against each row.

The CsvFilter classes are intended to be a lightweight data model representing pushdownable predicates, while evaluation, serialization/deserialization, and pretty-printing remain separate concerns. This keeps the representation extensible for additional pushdown operators in the future without requiring further changes to the transport mechanism between planning and execution.
There is one more option: do exactly what Spark does and compile the filter all the way down to bytecode, but in my opinion that's a bit overkill.
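A rough sketch of what such a CsvFilter data model might look like, with evaluation kept in a separate function as described. Everything here is hypothetical: the class names `Eq`, `Ne`, `And`, `Or` and the `eval` function are illustrative, serialization is omitted, and null handling uses plain Java equality rather than SQL three-valued logic.

```java
import java.util.Objects;

public class CsvFilterSketch {
  /** Marker type for pushdownable predicates (hypothetical data model). */
  public interface CsvFilter {}

  /** column = value; a null value compares against a missing/null cell. */
  public static final class Eq implements CsvFilter {
    final int column;
    final String value;
    public Eq(int column, String value) { this.column = column; this.value = value; }
  }

  /** column <> value. */
  public static final class Ne implements CsvFilter {
    final int column;
    final String value;
    public Ne(int column, String value) { this.column = column; this.value = value; }
  }

  public static final class And implements CsvFilter {
    final CsvFilter left;
    final CsvFilter right;
    public And(CsvFilter left, CsvFilter right) { this.left = left; this.right = right; }
  }

  public static final class Or implements CsvFilter {
    final CsvFilter left;
    final CsvFilter right;
    public Or(CsvFilter left, CsvFilter right) { this.left = left; this.right = right; }
  }

  /** Evaluation lives outside the data model, so new operators only add a case here. */
  public static boolean eval(CsvFilter f, String[] row) {
    if (f instanceof Eq) {
      Eq e = (Eq) f;
      return Objects.equals(row[e.column], e.value);
    } else if (f instanceof Ne) {
      Ne n = (Ne) f;
      return !Objects.equals(row[n.column], n.value);
    } else if (f instanceof And) {
      And a = (And) f;
      return eval(a.left, row) && eval(a.right, row);
    } else if (f instanceof Or) {
      Or o = (Or) f;
      return eval(o.left, row) || eval(o.right, row);
    }
    throw new IllegalArgumentException("unknown filter: " + f);
  }

  public static void main(String[] args) {
    // deptno = 20 AND name <> 'Fred', over a made-up (empno, name, deptno) row.
    CsvFilter filter = new And(new Eq(2, "20"), new Ne(1, "Fred"));
    System.out.println(eval(filter, new String[] {"110", "Eric", "20"}));
  }
}
```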

Calcite already includes a compiler which generates the enumerable code, why can't the same compiler generate the filter implementation as a compiled Java function? Then you can support arbitrary functions.

mihaibudiu · 2026-06-27T22:38:34Z

    sql("model-with-custom-table", sql).ok();
  }

+  /** Test case for


It would be nice to have higher coverage in terms of SQL types for columns.

Copilot

Pull request overview

This PR adds planner-rule-based filter pushdown for the file adapter’s CSV tables by carrying a pushed-down filter condition inside CsvTableScan and compiling it into a runtime predicate during enumerable implementation. It also strengthens CSV row conversion and expands tests to validate predicate behavior (including null-handling and short rows).

Changes:

- Added CsvFilterTableScanRule and CsvProjectFilterTableScanRule, and registered them via FileRules / CsvTableScan#register.
- Extended CsvTableScan to carry a @Nullable RexNode condition, emit it in EXPLAIN, adjust costing, and apply it via a compiled Predicate1 in implement.
- Updated CsvEnumerator conversion and added tests around missing fields and equality/null semantics, plus additional plan/result coverage in adapter/example tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| file/src/test/java/org/apache/calcite/adapter/file/FileAdapterTest.java | Adds/updates tests for pushdown behavior, plans, and null/equality semantics. |
| file/src/test/java/org/apache/calcite/adapter/file/CsvEnumeratorTest.java | Adds a test covering conversion when CSV rows are shorter than the projected schema. |
| file/src/main/java/org/apache/calcite/adapter/file/FileRules.java | Registers new planner rules and documents their intent. |
| file/src/main/java/org/apache/calcite/adapter/file/CsvTableScan.java | Stores pushed-down filter condition and applies it during enumerable implementation. |
| file/src/main/java/org/apache/calcite/adapter/file/CsvProjectTableScanRule.java | Adjusts projection pushdown mapping through existing `scan.fields` and adds a condition guard. |
| file/src/main/java/org/apache/calcite/adapter/file/CsvProjectFilterTableScanRule.java | New rule to push filter into scan and remap input refs for project/filter when combined. |
| file/src/main/java/org/apache/calcite/adapter/file/CsvFilterTableScanRule.java | New rule to push `LogicalFilter` condition into `CsvTableScan`. |
| file/src/main/java/org/apache/calcite/adapter/file/CsvEnumerator.java | Makes converter reusable, adds safer field access, adds `objectsEqual`, and modifies filter evaluation loop. |
| example/csv/src/test/java/org/apache/calcite/test/CsvTest.java | Adds example tests validating equality semantics with nulls under filterable model. |

Comments suppressed due to low confidence (1)

file/src/main/java/org/apache/calcite/adapter/file/CsvEnumerator.java:318

Filtering uses strings[i] while iterating up to filterValues.size(). If a CSV row has fewer columns than the schema (which this PR now explicitly supports via field(strings, idx)), this will throw ArrayIndexOutOfBoundsException during filtering. Use the safe field(...) accessor (and treat missing fields as non-matching when a filter value is required).

        if (filterValues != null) {
          for (int i = 0; i < filterValues.size(); i++) {
            String filterValue = filterValues.get(i);
            if (filterValue != null) {
              if (!filterValue.equals(strings[i])) {
                continue outer;
              }
            }
          }
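A minimal sketch of the suggested fix, assuming a `field` accessor that returns null for a missing column. The body of `field` in the PR is not shown in this conversation, so the version below is a guess, and `matches` is a hypothetical stand-in for the loop above.

```java
import java.util.Arrays;
import java.util.List;

public class SafeFieldAccess {
  /** Returns the value at idx, or null when the row is too short (assumed behavior). */
  public static String field(String[] strings, int idx) {
    return idx < strings.length ? strings[idx] : null;
  }

  /** True if every non-null filter value equals the corresponding field. */
  public static boolean matches(String[] strings, List<String> filterValues) {
    for (int i = 0; i < filterValues.size(); i++) {
      String filterValue = filterValues.get(i);
      // A missing field yields null, so a required filter value never matches it.
      if (filterValue != null && !filterValue.equals(field(strings, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> filters = Arrays.asList(null, null, "20");
    // Short row: no ArrayIndexOutOfBoundsException, simply a non-match.
    System.out.println(matches(new String[] {"100"}, filters));
    System.out.println(matches(new String[] {"110", "Eric", "20"}, filters));
  }
}
```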

+          RexToLixTranslator.translateCondition(
+              program,
+              implementor.getTypeFactory(),
+              builder,
+              inputGetter,
+              null,
+              implementor.getConformance());


+  /** Rule that matches a {@link org.apache.calcite.rel.core.Filter} on
+   * a {@link CsvTableScan} and pushes arbitrary predicates into the scan.
+   * Any {@link org.apache.calcite.rex.RexNode} condition is compiled at plan
+   * time via {@link org.apache.calcite.adapter.enumerable.RexToLixTranslator}
+   * into a {@link org.apache.calcite.linq4j.function.Predicate1}. */
+  public static final CsvFilterTableScanRule FILTER_SCAN =
+      CsvFilterTableScanRule.Config.DEFAULT.toRule();
+
+  /** Rule that matches a {@link org.apache.calcite.rel.core.Project} on
+   * a {@link org.apache.calcite.rel.core.Filter} on a {@link CsvTableScan}
+   * and pushes down simple equality predicates. */
+  public static final CsvProjectFilterTableScanRule PROJECT_FILTER_SCAN =
+      CsvProjectFilterTableScanRule.Config.DEFAULT.toRule();


+  @Test void testNonPushableFilterRemains() {
+    // empno > 110 is a range filter; under the compiler-based filter pushdown
+    // it is pushed down into the scan, leaving only the projection on top.
+    final String sql = "select name from EMPS where empno > 110";


Diveyam-Mishra force-pushed the CALCITE-7618 branch 2 times, most recently from 66fb2ac to e0535c1 Compare June 24, 2026 21:37

mihaibudiu reviewed Jun 24, 2026

Diveyam-Mishra marked this pull request as draft June 24, 2026 22:30

Diveyam-Mishra force-pushed the CALCITE-7618 branch 3 times, most recently from f60ce88 to d5d601a Compare June 27, 2026 18:29

[CALCITE-7618] Add filter pushdown support to the file adapter's CSV …

3fd66a7

…table implementation

Diveyam-Mishra force-pushed the CALCITE-7618 branch from d5d601a to 3fd66a7 Compare June 27, 2026 18:43

Diveyam-Mishra marked this pull request as ready for review June 27, 2026 19:24

Diveyam-Mishra requested a review from mihaibudiu June 27, 2026 19:25

mihaibudiu reviewed Jun 27, 2026

Copilot AI review requested due to automatic review settings July 2, 2026 22:11

Copilot started reviewing on behalf of Diveyam-Mishra July 2, 2026 22:12

Copilot AI reviewed Jul 2, 2026

Diveyam-Mishra force-pushed the CALCITE-7618 branch from 7b22d7a to 49b2744 Compare July 2, 2026 22:26

Extending Test Coverage and Support arbitrary filter predicates

6a8ab33

Diveyam-Mishra force-pushed the CALCITE-7618 branch from 49b2744 to 6a8ab33 Compare July 2, 2026 22:44

Diveyam-Mishra wants to merge 2 commits into apache:main from Diveyam-Mishra:CALCITE-7618
