-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set write compression codec in Iceberg #24851
base: master
Are you sure you want to change the base?
Set write compression codec in Iceberg #24851
Conversation
bab1d84
to
b743d37
Compare
propertiesBuilder.put(COMMIT_NUM_RETRIES, Integer.toString(IcebergTableProperties.getMaxCommitRetry(tableMetadata.getProperties()))); | ||
|
||
switch (fileFormat) { | ||
case PARQUET -> getCompressionCodec(session).getParquetCompressionCodec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ebyhr, do you think we should introduce 3 matching properties for write.[parquet|avro|orc].compression-codec
in IcebergTableProperties
instead? This seems to be a more idiomatic way of doing this now that I looked here again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spent some time on this with @pajaks today. A better-informed remark on this:
- since we already have
compression-codec
as a session property, it would make it difficult to support it inIcebergTableProperties
as a full-fledged table property (meaning: adding it toALTER TABLE table_name SET PROPERTIES compression-codec = 'xxx';
- no other Iceberg table property has a corresponding session property and it is for a reason, although I don't think I grasp it in its entirety just yet
b743d37
to
a64427d
Compare
e9d8deb
to
e7b1327
Compare
Looks like spark is having trouble inserting rows into a table for 'AVRO - ZSTD' pair when
This is the ghcr.io/trinodb/testing/spark3-iceberg:108, image history: ➜ git docker history ghcr.io/trinodb/testing/spark3-iceberg:108
IMAGE CREATED CREATED BY SIZE COMMENT
<missing> 6 weeks ago RUN |6 SPARK_VERSION=3.4.2 HADOOP_VERSION=3 … 430MB buildkit.dockerfile.v0
<missing> 6 weeks ago ARG SPARK_ARTIFACT=spark-3.4.2-bin-hadoop3 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ARG POSTGRESQL_JAR_VERSION=42.3.5 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ARG ICEBERG_JAR_VERSION=3.4_2.12 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ARG ICEBERG_VERSION=1.7.0 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ARG HADOOP_VERSION=3 0B buildkit.dockerfile.v0
<missing> 6 weeks ago ARG SPARK_VERSION=3.4.2 0B buildkit.dockerfile.v0
(...) Codec name should be
This goes to show that spark does read |
(VARCHAR 'write.parquet.compression-codec', VARCHAR 'zstd')"""); | ||
} | ||
String tableName = "test_table_codec" + randomNameSuffix(); | ||
Session snappySession = Session.builder(getSession()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do ALTER TABLE .... SET PROPERTIES...
- maybe it is relevant for your test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - adding a test case
Write codec values via Iceberg spec:
What is set after table is created with Trino, with this commit (looks good now, apart from excess
|
6b83fd8
to
76e3ce4
Compare
Fixed CI and rebased PR - @ebyhr PTAL |
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
Outdated
Show resolved
Hide resolved
7b11fd3
to
971fa18
Compare
971fa18
to
bf5da75
Compare
Description
Set currently used write compression codec in Iceberg table properties.
Compression codec values are set as spelled by the spec - details in #24851 (comment)
Additional context and related issues
https://github.com/apache/iceberg/pull/8593/files#diff-c540a31e66b157a8f080433c82a29a070096d0e08c6578a0099153f1229bdb7aR93-R96 is setting default when no property is provided, even if file type != parquet: apache/iceberg#9490
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: