modules/manage/pages/iceberg/iceberg-performance-tuning.adoc (4 additions, 4 deletions)
@@ -12,7 +12,7 @@
include::shared:partial$enterprise-license.adoc[]
====
-Use this guide to optimize the performance of Iceberg topics in Redpanda. It covers strategies for improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput.
+This guide covers strategies for optimizing the performance of Iceberg topics in Redpanda, including improving downstream query performance, tuning the Iceberg translation pipeline, and monitoring translation throughput.
After reading this page, you will be able to:
@@ -22,7 +22,7 @@ After reading this page, you will be able to:
== Prerequisites
-Before tuning Iceberg performance, you need to be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics].
+You must be familiar with how Iceberg topics work in Redpanda. See xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics].
== Optimize query performance
@@ -32,7 +32,7 @@ Query engines read Parquet files from object storage to process Iceberg table da
To improve query performance, consider implementing custom https://iceberg.apache.org/docs/nightly/partitioning/[partitioning^] for the Iceberg topic. Use the xref:reference:properties/topic-properties.adoc#redpanda-iceberg-partition-spec[`redpanda.iceberg.partition.spec`] topic property to define the partitioning scheme:
-[,bash,]
+[,bash]
----
# Create new topic with five topic partitions, replication factor 3, and custom table partitioning for Iceberg
@@ -50,7 +50,7 @@ To learn more about how partitioning schemes can affect query performance, and f
[TIP]
====
-* Partition by columns that you frequently use in queries. Columns with relatively few unique values, also known as low cardinality, are also good candidates for partitioning.
+* Partition by columns that you frequently use in queries. Columns with relatively few unique values (low cardinality) are good candidates for partitioning.
* If you must partition based on columns with high cardinality, for example timestamps, use Iceberg's available transforms such as extracting the year, month, or day to avoid creating too many partitions. Too many partitions can be detrimental to performance because more files need to be scanned and managed.
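To make the transform advice concrete, here is a hedged sketch: the topic name `orders` and column `order_ts` are hypothetical, and the exact spec syntax should be confirmed against the `redpanda.iceberg.partition.spec` property reference.

```bash
# Hypothetical topic and column names; day() keeps the partition count
# manageable for a high-cardinality timestamp column.
rpk topic alter-config orders --set "redpanda.iceberg.partition.spec=(day(order_ts))"
```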
* Replace `<bucket-name>` with your bucket name and `<gcp-project-id>` with your Google Cloud project ID.
-* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table].
+* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
--
+
NOTE: If you edit `bootstrap.yml`, you can skip the cluster configuration step in <<configure-redpanda-for-iceberg>> and proceed to the next step in that section to enable Iceberg for a topic.
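As a minimal sketch of the `bootstrap.yml` approach this note mentions (values are illustrative; confirm property names against the cluster properties reference):

```yaml
# Illustrative bootstrap.yml fragment -- values are examples only
iceberg_enabled: true
iceberg_catalog_type: rest
iceberg_dlq_table_suffix: _dlq   # must not contain dots or tildes (~)
```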
* Replace `<bucket-name>` with your bucket name and `<gcp-project-id>` with your Google Cloud project ID.
-* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue (DLQ) table].
+* You must set the `iceberg_dlq_table_suffix` property to a value that does not include dots or tildes (`~`). The example above uses `_dlq` as the suffix for the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue (DLQ) table].
modules/manage/pages/iceberg/iceberg-troubleshooting.adoc

This page covers how to diagnose and resolve errors that occur during Iceberg translation, including working with dead-letter queue (DLQ) tables and handling invalid records.
+{description}
-== Dead-letter queue (DLQ)
+Use this page to:
-If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate dead-letter queue (DLQ) Iceberg table named `<topic-name>~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+
+== Dead-letter queue
+
+If Redpanda encounters an error while writing a record to the Iceberg table, Redpanda by default writes the record to a separate DLQ Iceberg table named `<topic-name>~dlq`. The following can cause errors to occur when translating records in the `value_schema_id_prefix` and `value_schema_latest` modes to the Iceberg table format:
- Redpanda cannot find the embedded schema ID in the Schema Registry.
- Redpanda fails to translate one or more schema data types to an Iceberg type.
@@ -62,7 +71,7 @@ The data is in binary format, and the first byte is not `0x00`, indicating that
=== Reprocess DLQ records
-You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some engines may automatically decode the binary value for you:
+You can apply a transformation and reprocess the record in your data lakehouse to the original Iceberg table. In this case, you have a JSON value represented as a UTF-8 binary. Depending on your query engine, you might need to decode the binary value first before extracting the JSON fields. Some query engines decode the binary value automatically:
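The decode step can be sketched outside any query engine. A minimal Python example (the record contents and field names are hypothetical):

```python
import json

def decode_dlq_value(raw: bytes) -> dict:
    """Decode a DLQ record's binary value: UTF-8 bytes holding a JSON document."""
    return json.loads(raw.decode("utf-8"))

# Hypothetical DLQ record value as stored in the table's binary column
raw_value = b'{"user_id": 42, "event": "click"}'
fields = decode_dlq_value(raw_value)
print(fields["event"])  # -> click
```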
-You can now insert the transformed record back into the main Iceberg table. Redpanda recommends employing a strategy for exactly-once processing to avoid duplicates when reprocessing records.
+You can now insert the transformed record back into the main Iceberg table. Redpanda recommends using an exactly-once processing strategy to avoid duplicates when reprocessing records.
=== Drop invalid records
@@ -102,8 +111,8 @@ endif::[]
The following xref:reference:public-metrics-reference.adoc#iceberg-metrics[Iceberg metrics] help identify translation errors, invalid records, and catalog connectivity issues:
-* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of dead letter queue (DLQ) Parquet files created. A non-zero and increasing value indicates records are failing to translate.
-* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_dlq_files_created[`redpanda_iceberg_translation_dlq_files_created`]: Number of DLQ Parquet files created. A non-zero and increasing value indicates records are failing to translate. See <<inspect-dlq-table>> to examine the failed records.
+* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_translation_invalid_records[`redpanda_iceberg_translation_invalid_records`]: Number of invalid records encountered during translation, labeled by cause. See <<drop-invalid-records>> to configure how Redpanda handles these records.
* xref:reference:public-metrics-reference.adoc#redpanda_iceberg_rest_client_num_commit_table_update_requests_failed[`redpanda_iceberg_rest_client_num_commit_table_update_requests_failed`]: Failed table commit requests to the REST catalog. Applies only when using a REST catalog (`iceberg_catalog_type: rest`). Persistent failures indicate catalog connectivity or permission issues.
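To act on the DLQ file metric above, a monitoring rule can watch for growth. A hedged PromQL sketch (the 10-minute window is arbitrary):

```promql
# Fires when any new DLQ Parquet files appeared in the last 10 minutes
increase(redpanda_iceberg_translation_dlq_files_created[10m]) > 0
```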
modules/manage/pages/iceberg/specify-iceberg-schema.adoc (1 addition, 1 deletion)
@@ -60,7 +60,7 @@ The following modes are compatible with producing to an Iceberg topic using Redp
- `key_value`
- Starting in version 25.2, `value_schema_latest` with a JSON schema
-Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue-dlq[dead-letter queue].
+Otherwise, records may fail to write to the Iceberg table and instead write to the xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter queue].