[docs] Add table-level active incremental warm-up guide #3829

Merged: bobhan1 merged 4 commits into apache:master from bobhan1:codex/table-level-warmup-zh-doc on Jun 15, 2026

bobhan1 (Contributor) commented May 28, 2026

Summary

apache/doris#63832

  • Update the read/write separation File Cache warm-up guide in both English and Chinese with table-level event-driven warm-up usage.
  • Document ON TABLES syntax, INCLUDE/EXCLUDE matching rules, examples, refresh behavior, SHOW WARM UP JOB fields, detailed SyncStats JSON, BE Bvar metrics, and FE Prometheus metrics.
  • Clarify that compute-group-level load-event warm-up and table-level ON TABLES load-event warm-up should not be configured together for the same source and destination compute groups.

Validation

  • git diff --check
  • Front matter JSON parsing and Markdown code-fence/admonition pairing check

Note: Docusaurus/docs-governance checks were not run because this checkout does not have node_modules; the docs governance scripts fail on missing gray-matter.

bobhan1 changed the title from "[codex] Add table-level warmup zh guide" to "[docs](zh) Add table-level active incremental warm-up guide" on May 28, 2026
bobhan1 changed the title from "[docs](zh) Add table-level active incremental warm-up guide" to "[docs] Add table-level active incremental warm-up guide" on May 28, 2026
bobhan1 marked this pull request as ready for review on May 28, 2026 11:10
liaoxin01 pushed a commit to apache/doris that referenced this pull request Jun 15, 2026
### What problem does this PR solve?

Issue Number: None

Problem Summary:

This PR adds table-level event-driven cloud warm-up support and improves
active incremental warm-up progress observability.

Before this change, event-driven warm-up was only controlled at
compute-group granularity. Once a load-event warm-up job was enabled for
a source and target compute group pair, all source-side table writes
could trigger warm-up to the target compute group. That is inefficient
for workloads where only selected core tables, high-frequency query
tables, or selected async materialized views need to stay warm.

This PR lets users define the warm-up scope with `ON TABLES` when
creating an event-driven load warm-up job. FE persists the normalized
table filter in the warm-up job, resolves matched table ids dynamically,
sends the table ids to BE, and lets BE filter warm-up rowsets by table
id.

User-visible behavior:

- `WARM UP ... ON TABLES` supports table-level event-driven warm-up.
- Table filters support `INCLUDE` and `EXCLUDE` rules.
- Rules support `*` and `?` wildcards, for example `db.table`, `db.*`,
`*.orders_*`, and `log_db.log_?`.
- `INCLUDE` defines the candidate warm-up scope, and `EXCLUDE` removes
tables from that included scope.
- Rules are canonicalized before duplicate checks, so semantically
equivalent filters do not create duplicate jobs just because rule order
differs.
- Matching covers both regular OLAP tables and async materialized views.
- Matched table ids are refreshed as tables or async materialized views
are created, dropped, or renamed.
- The same source compute group can create independent table-level
warm-up jobs to different target compute groups with different table
filters.
- `SHOW WARM UP JOB` exposes the table-level job type, table filter,
matched tables, and SyncStats.
- `SHOW WARM UP JOB` list output keeps compact SyncStats, while
single-job lookup keeps detailed windowed SyncStats.
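
The `INCLUDE`/`EXCLUDE` semantics above can be sketched in a few lines of Python. This is an illustration only, not the FE implementation: it assumes each rule is split into a database pattern and a table pattern at the first dot, that `*` matches any run of characters and `?` matches exactly one, and that matching is case-sensitive. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a rule fragment: `*` -> any run of chars, `?` -> exactly one char."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts))


def rule_matches(rule: str, db: str, table: str) -> bool:
    # Assumption: a rule has the form `db_pattern.table_pattern`.
    db_pat, _, tbl_pat = rule.partition(".")
    return (
        wildcard_to_regex(db_pat).fullmatch(db) is not None
        and wildcard_to_regex(tbl_pat).fullmatch(table) is not None
    )


def is_in_scope(db: str, table: str, includes: list, excludes: list) -> bool:
    """A table is warmed up if some INCLUDE rule matches and no EXCLUDE rule does."""
    included = any(rule_matches(r, db, table) for r in includes)
    excluded = any(rule_matches(r, db, table) for r in excludes)
    return included and not excluded


includes = ["core_db.config", "report_db.monthly_*", "*.sales_*"]
excludes = ["*.*_archive"]
print(is_in_scope("shop_db", "sales_2025", includes, excludes))          # True
print(is_in_scope("shop_db", "sales_2025_archive", includes, excludes))  # False
```

Under these assumptions an `EXCLUDE` rule always wins over an `INCLUDE` rule, which matches the description of `EXCLUDE` removing tables from the included scope.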

Example:

```sql
WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP write_cg
ON TABLES (
    INCLUDE 'core_db.config',
    INCLUDE 'report_db.monthly_*',
    INCLUDE '*.sales_*',
    EXCLUDE '*.*_archive'
)
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);
```
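
To illustrate why rule order does not create duplicate jobs, here is a minimal Python sketch of one possible canonical form for the filter in the example. Sorting and de-duplicating the rules is an assumption made for illustration; the PR only states that filters are canonicalized before duplicate checks, and `canonicalize` is a hypothetical name.

```python
def canonicalize(includes, excludes):
    """Return an order-independent key for a table filter (hypothetical canonical form)."""
    return (tuple(sorted(set(includes))), tuple(sorted(set(excludes))))


# The filter from the example above, written in two different orders.
first = canonicalize(
    ["core_db.config", "report_db.monthly_*", "*.sales_*"],
    ["*.*_archive"],
)
second = canonicalize(
    ["*.sales_*", "core_db.config", "report_db.monthly_*"],
    ["*.*_archive"],
)
print(first == second)  # True
```

Comparing the canonical keys rather than the raw rule lists is what lets a duplicate check treat both spellings as the same job.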

Conflict and virtual compute group behavior:

- Table-level load-event warm-up and cluster-level (compute-group-level)
load-event warm-up are mutually exclusive for the same source and target
compute group pair.
- If a conflicting job already exists, creation returns an error that
includes the conflicting job id; table-level conflicts also include the
table filter.
- Duplicate checks within the same job type still follow the existing
duplicate-check logic.
- Creating a cluster-level load-event warm-up job managed by a virtual
compute group (VCG) does not fail on conflict. Because VCG jobs are
created through the MS HTTP API path, FE first cancels existing
table-level load-event warm-up jobs with the same source and target,
then recreates the VCG-managed cluster-level job.
- Manually creating a table-level load-event warm-up job is rejected
only when both source and target compute groups are owned by the same
VCG.
- SQL still cannot use a virtual compute group directly as the source or
target compute group.

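The mutual-exclusion rule can be sketched as follows. This is a simplified Python model, not the FE code: the `LoadEventJob` type, its fields, and the error wording are hypothetical, and VCG handling is left out. It shows only that a table-level job and a cluster-level job conflict when they share the same source and target pair, while jobs of the same type are left to the existing duplicate check.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LoadEventJob:
    job_id: int
    source: str
    target: str
    table_filter: Optional[str] = None  # None models a cluster-level job

    @property
    def is_table_level(self) -> bool:
        return self.table_filter is not None


def find_conflict(new_job: LoadEventJob, existing: Iterable[LoadEventJob]) -> Optional[LoadEventJob]:
    """Return an existing job of the other type on the same source/target pair, if any."""
    for job in existing:
        same_pair = (job.source, job.target) == (new_job.source, new_job.target)
        if same_pair and job.is_table_level != new_job.is_table_level:
            return job
    return None


def conflict_message(conflict: LoadEventJob) -> str:
    # Hypothetical wording: the job id is always reported,
    # and the table filter is added when the conflicting job is table-level.
    msg = f"conflicts with existing warm-up job {conflict.job_id}"
    if conflict.is_table_level:
        msg += f" (table filter: {conflict.table_filter})"
    return msg


existing = [LoadEventJob(1, "write_cg", "query_cg", "INCLUDE 'core_db.*'")]
clash = find_conflict(LoadEventJob(2, "write_cg", "query_cg"), existing)
print(conflict_message(clash))
```

A table-level job from `write_cg` to a different target would return `None` here, consistent with the same source being allowed to feed several targets with different filters.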
Warm-up progress observation:

- BE records per-job windowed requested, finished, and failed warm-up
statistics.
- BE exposes per-job warm-up statistics through
`/api/warmup_event_driven_stats`.
- FE aggregates BE statistics and caches the aggregated result in the
warm-up job.
- SyncStats includes source-side and target-side warm-up size/count
progress across windows.
- SyncStats includes trigger-time progress, so users can observe whether
the target compute group is behind the latest source-side warm-up
trigger.
- FE `/metrics` exposes per-job active warm-up metadata, synchronized
size, and trigger gap metrics for cloud event-driven warm-up jobs.
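
As a rough illustration of the aggregation described above, the following Python sketch sums per-BE window counters and derives a trigger gap. All field and function names are hypothetical, and the real SyncStats JSON layout is not reproduced here; the sketch only assumes that each BE reports requested, finished, and failed counts for a window and that trigger times are millisecond timestamps.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WindowStats:
    requested: int = 0
    finished: int = 0
    failed: int = 0


def aggregate(per_be: Iterable[WindowStats]) -> WindowStats:
    """Sum the counters reported by each BE for one job and one window."""
    total = WindowStats()
    for stats in per_be:
        total.requested += stats.requested
        total.finished += stats.finished
        total.failed += stats.failed
    return total


def in_flight(total: WindowStats) -> int:
    """Requests that have neither finished nor failed yet."""
    return total.requested - total.finished - total.failed


def trigger_gap_ms(source_latest_trigger_ms: int, target_latest_trigger_ms: int) -> int:
    """How far the target lags the latest source-side trigger (never negative)."""
    return max(0, source_latest_trigger_ms - target_latest_trigger_ms)


total = aggregate([WindowStats(10, 8, 1), WindowStats(5, 5, 0)])
print(total, in_flight(total), trigger_gap_ms(2000, 1500))
```

A persistently growing gap or in-flight count would suggest that the target compute group is falling behind the source-side triggers.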

### Release note

Support table-level event-driven cloud warm-up with `ON TABLES` filters
and per-job warm-up sync statistics.

### Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [x] Manual test
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [ ] No.
    - [x] Yes. `WARM UP` supports table-level `ON TABLES` filters for
      event-driven load warm-up, and warm-up job output/metrics expose table
      filter, matched tables, SyncStats, and trigger-gap information.

- Does this need documentation?
    - [ ] No.
    - [x] Yes. apache/doris-website#3829
bobhan1 force-pushed the codex/table-level-warmup-zh-doc branch from 6d742f9 to e66a360 on June 15, 2026 10:41
bobhan1 merged commit 6fbd07b into apache:master on Jun 15, 2026
3 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 17, 2026
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 17, 2026
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 17, 2026
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jun 18, 2026
2 participants