diff --git a/docs/kudu_impala_integration.adoc b/docs/kudu_impala_integration.adoc
index 0def0477c2..de01c3d597 100755
--- a/docs/kudu_impala_integration.adoc
+++ b/docs/kudu_impala_integration.adoc
@@ -485,6 +485,48 @@ The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS
 However, a scan for `sku` values would almost always impact all 16 partitions, rather
 than possibly being limited to 4.
 
+.Range-Specific Hash Schemas
+As of 1.17, Kudu supports range-specific hash schemas for tables. It's possible to
+add ranges with a hash schema independent of the table-wide hash schema. This can be
+done while creating or altering the table. The number of hash partition levels must
+be the same across all ranges in a table.
+
+[source, sql]
+----
+CREATE TABLE cust_behavior (
+  id BIGINT,
+  sku STRING,
+  salary STRING,
+  edu_level INT,
+  usergender STRING,
+  `group` STRING,
+  city STRING,
+  postcode STRING,
+  last_purchase_price FLOAT,
+  last_purchase_date BIGINT,
+  category STRING,
+  rating INT,
+  fulfilled_date BIGINT,
+  PRIMARY KEY (id, sku)
+)
+PARTITION BY HASH (id) PARTITIONS 4,
+RANGE (sku)
+(
+  PARTITION VALUES < 'g',
+  PARTITION 'g' <= VALUES < 'o'
+      HASH (id) PARTITIONS 6,
+  PARTITION 'o' <= VALUES < 'u'
+      HASH (id) PARTITIONS 8,
+  PARTITION 'u' <= VALUES
+)
+STORED AS KUDU;
+----
+
+This example uses the range-specific hash schema feature for the middle two
+ranges. The table-wide hash schema has 4 buckets, while the hash schemas
+for the middle two ranges have 6 and 8 buckets, respectively. This is useful
+when those ranges are expected to carry a heavier workload.
+
 .Non-Covering Range Partitions
 Kudu 1.0 and higher supports the use of non-covering range partitions, which
 address scenarios like the following:
diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 95d4d251c4..906682b86d 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -435,6 +435,22 @@ NOTE: see the <> and the
 <> for further discussion of multilevel
 partitioning.
 
+[[flexible-partitioning]]
+=== Flexible Partitioning
+
+As of 1.17, Kudu supports range-specific hash schemas for tables. It's now
+possible to add ranges with their own hash schema, independent of the
+table-wide hash schema. This can be done while creating or altering the table.
+This feature helps mitigate hotspotting, because a range that expects a
+heavier workload can be given a hash schema with more buckets.
+
+[[same-number-of-hash-levels]]
+[IMPORTANT]
+.Same Number of Hash Levels
+The number of hash partition levels must be the same across all the ranges
+in a table. See <> for more details on hash partition
+levels.
+
 [[partition-pruning]]
 === Partition Pruning