[docs] Add table-level active incremental warm-up guide #3829
Merged
gavinchou
approved these changes
Jun 1, 2026
liaoxin01
pushed a commit
to apache/doris
that referenced
this pull request
Jun 15, 2026
### What problem does this PR solve?
Issue Number: None
Problem Summary:
This PR adds table-level event-driven cloud warm-up support and improves
active incremental warm-up progress observability.
Before this change, event-driven warm-up was only controlled at
compute-group granularity. Once a load-event warm-up job was enabled for
a source and target compute group pair, all source-side table writes
could trigger warm-up to the target compute group. That is inefficient
for workloads where only selected core tables, high-frequency query
tables, or selected async materialized views need to stay warm.
This PR lets users define the warm-up scope with `ON TABLES` when
creating an event-driven load warm-up job. FE persists the normalized
table filter in the warm-up job, resolves matched table ids dynamically,
sends the table ids to BE, and lets BE filter warm-up rowsets by table
id.
User-visible behavior:
- `WARM UP ... ON TABLES` supports table-level event-driven warm-up.
- Table filters support `INCLUDE` and `EXCLUDE` rules.
- Rules support `*` and `?` wildcards, for example `db.table`, `db.*`,
`*.orders_*`, and `log_db.log_?`.
- `INCLUDE` defines the candidate warm-up scope, and `EXCLUDE` removes
tables from that included scope.
- Rules are canonicalized before duplicate checks, so semantically
equivalent filters do not create duplicate jobs just because rule order
differs.
- Matching covers both regular OLAP tables and async materialized views.
- Matched table ids are refreshed as tables or async materialized views
are created, dropped, or renamed.
- The same source compute group can create independent table-level
warm-up jobs to different target compute groups with different table
filters.
- `SHOW WARM UP JOB` exposes the table-level job type, table filter,
matched tables, and SyncStats.
- `SHOW WARM UP JOB` list output keeps compact SyncStats, while
single-job lookup keeps detailed windowed SyncStats.
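The canonicalization bullet above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a filter is a collection of `(kind, pattern)` pairs and that the canonical form is the sorted, de-duplicated set of those pairs. The actual FE normalization is not shown in this PR description, and `canonicalize` is a hypothetical name.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (kind, pattern), where kind is "INCLUDE" or "EXCLUDE".
Rule = Tuple[str, str]


def canonicalize(rules: Iterable[Rule]) -> Tuple[Rule, ...]:
    """Return an order-independent form of a table filter.

    Assumption: upper-casing the keyword, trimming whitespace, de-duplicating,
    and sorting is enough to compare two filters for equivalence.
    """
    normalized = {(kind.upper(), pattern.strip()) for kind, pattern in rules}
    return tuple(sorted(normalized))


filter_a = [("INCLUDE", "core_db.config"), ("EXCLUDE", "*.*_archive")]
filter_b = [("exclude", "*.*_archive"), ("INCLUDE", "core_db.config")]

# Both filters reduce to the same canonical tuple, so a duplicate check
# that compares canonical forms would treat them as the same job.
same_filter = canonicalize(filter_a) == canonicalize(filter_b)
```

Comparing canonical forms rather than raw rule lists is what lets rule order be ignored in the duplicate check.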
Example:
```sql
WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP write_cg
ON TABLES (
    INCLUDE 'core_db.config',
    INCLUDE 'report_db.monthly_*',
    INCLUDE '*.sales_*',
    EXCLUDE '*.*_archive'
)
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);
```
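To make the `INCLUDE`/`EXCLUDE` semantics of the example above concrete, here is a rough sketch of how such a filter could be evaluated. It is not the FE implementation. It assumes that each rule splits at the first `.` into a database pattern and a table pattern, that `*` matches any run of characters and `?` exactly one, and that matching is case-sensitive. The helper names are hypothetical.

```python
import re
from typing import Sequence


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a `*` / `?` wildcard pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def rule_matches(rule: str, db: str, table: str) -> bool:
    """Match a `db_pattern.table_pattern` rule against one table."""
    db_pat, table_pat = rule.split(".", 1)
    return (wildcard_to_regex(db_pat).fullmatch(db) is not None
            and wildcard_to_regex(table_pat).fullmatch(table) is not None)


def is_warmed(db: str, table: str,
              includes: Sequence[str], excludes: Sequence[str]) -> bool:
    """INCLUDE defines the candidate scope; EXCLUDE removes tables from it."""
    included = any(rule_matches(r, db, table) for r in includes)
    excluded = any(rule_matches(r, db, table) for r in excludes)
    return included and not excluded


INCLUDES = ["core_db.config", "report_db.monthly_*", "*.sales_*"]
EXCLUDES = ["*.*_archive"]
```

Under these assumptions, a table such as `shop.sales_eu` would fall in scope, while `shop.sales_eu_archive` would be included by `*.sales_*` and then removed by the `EXCLUDE` rule.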
Conflict and virtual compute group behavior:
- Table-level load-event warm-up and cluster-level load-event warm-up
are mutually exclusive for the same source and target compute group
pair.
- If a conflicting job already exists, creation returns an error that
includes the conflicting job id; table-level conflicts also include the
table filter.
- Duplicate checks within the same job type still follow the existing
duplicate-check logic.
- VCG-managed cluster-level load-event warm-up creation does not fail on
conflict. Because VCG jobs are created by the MS HTTP API path, FE
cancels existing table-level load-event warm-up jobs with the same
source and target first, then recreates the VCG-managed cluster-level
job.
- Manually creating a table-level load-event warm-up job is rejected
only when both source and target compute groups are owned by the same
VCG.
- SQL still cannot use a virtual compute group directly as the source or
target compute group.
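The mutual-exclusion rule for manually created jobs can be sketched as below. This is a simplified model under stated assumptions, not the FE code: `LoadEventJob`, `find_conflict`, and `conflict_message` are hypothetical names, a missing table filter stands for a cluster-level job, and the VCG-specific paths and the same-type duplicate check are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class LoadEventJob:
    job_id: int
    source: str
    target: str
    # None means a cluster-level job; a tuple of rules means a table-level job.
    table_filter: Optional[Tuple[str, ...]] = None

    @property
    def is_table_level(self) -> bool:
        return self.table_filter is not None


def find_conflict(new_job: LoadEventJob,
                  existing: Iterable[LoadEventJob]) -> Optional[LoadEventJob]:
    """Return an existing job of the other level on the same source/target pair."""
    for job in existing:
        same_pair = (job.source, job.target) == (new_job.source, new_job.target)
        if same_pair and job.is_table_level != new_job.is_table_level:
            return job
    return None


def conflict_message(conflict: LoadEventJob) -> str:
    """Build an error text carrying the job id and, for table-level jobs, the filter."""
    msg = f"conflicts with existing warm-up job {conflict.job_id}"
    if conflict.is_table_level:
        msg += f" (table filter: {', '.join(conflict.table_filter)})"
    return msg


existing_jobs = [LoadEventJob(101, "write_cg", "query_cg")]
new_table_job = LoadEventJob(102, "write_cg", "query_cg", ("INCLUDE core_db.config",))
conflict = find_conflict(new_table_job, existing_jobs)
```

Two jobs of the same level on the same pair return no conflict here, since the description leaves those cases to the existing duplicate-check logic.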
Warm-up progress observation:
- BE records per-job windowed requested, finished, and failed warm-up
statistics.
- BE exposes per-job warm-up statistics through
`/api/warmup_event_driven_stats`.
- FE aggregates BE statistics and caches the aggregated result in the
warm-up job.
- SyncStats includes source-side and target-side warm-up size/count
progress across windows.
- SyncStats includes trigger-time progress, so users can observe whether
the target compute group is behind the latest source-side warm-up
trigger.
- FE `/metrics` exposes per-job active warm-up metadata, synchronized
size, and trigger gap metrics for cloud event-driven warm-up jobs.
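As a rough illustration of the observation flow above, the sketch below sums per-BE window counters into one per-job view and derives a trigger gap. All field and function names are hypothetical. The real SyncStats JSON layout and the response format of `/api/warmup_event_driven_stats` are not given in this description, so nothing here should be read as their schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WindowStats:
    """Hypothetical per-job counters reported by one BE for one time window."""
    requested: int = 0
    finished: int = 0
    failed: int = 0


def aggregate(per_be: Iterable[WindowStats]) -> WindowStats:
    """Sum the counters reported by each BE into a single per-job total."""
    total = WindowStats()
    for stats in per_be:
        total.requested += stats.requested
        total.finished += stats.finished
        total.failed += stats.failed
    return total


def trigger_gap_ms(latest_source_trigger_ms: int,
                   latest_target_synced_trigger_ms: int) -> int:
    """How far the target lags behind the latest source-side trigger (never negative)."""
    return max(0, latest_source_trigger_ms - latest_target_synced_trigger_ms)


job_total = aggregate([WindowStats(10, 8, 1), WindowStats(5, 5, 0)])
gap = trigger_gap_ms(1_750_000_060_000, 1_750_000_000_000)
```

A gap that stays above zero for long periods would suggest that the target compute group is falling behind the source-side triggers.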
### Release note
Support table-level event-driven cloud warm-up with `ON TABLES` filters
and per-job warm-up sync statistics.
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [x] Manual test
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [x] Yes. `WARM UP` supports table-level `ON TABLES` filters for
      event-driven load warm-up, and warm-up job output/metrics expose table
      filter, matched tables, SyncStats, and trigger-gap information.
- Does this need documentation?
    - [ ] No.
    - [x] Yes. apache/doris-website#3829
6d742f9 to
e66a360
Compare
bobhan1
added a commit
to bobhan1/doris
that referenced
this pull request
Jun 17, 2026
bobhan1
added a commit
to bobhan1/doris
that referenced
this pull request
Jun 17, 2026
bobhan1
added a commit
to bobhan1/doris
that referenced
this pull request
Jun 17, 2026
bobhan1
added a commit
to bobhan1/doris
that referenced
this pull request
Jun 18, 2026
### Summary
apache/doris#63832
- `ON TABLES` syntax, `INCLUDE`/`EXCLUDE` matching rules, examples, refresh behavior, `SHOW WARM UP JOB` fields, detailed `SyncStats` JSON, BE Bvar metrics, and FE Prometheus metrics.
- Cluster-level and `ON TABLES` load-event warm-up should not be configured together for the same source and destination compute groups.
### Validation
- `git diff --check`

Note: Docusaurus/docs-governance checks were not run because this checkout does not have `node_modules`; the docs governance scripts fail on missing `gray-matter`.