KUDU-2671: Update upstream docs
This patch updates the upstream docs to include range-specific
hash schemas within the partitioning section. An example
with the proper SQL syntax is also included in the Kudu Impala
integration doc.

Change-Id: I8da554851a124d1d357be65d8bcc2c6c37875dcc
Reviewed-on: http://gerrit.cloudera.org:8080/21108
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
mreddy7 authored and alexeyserbin committed May 28, 2024
1 parent 6d6364d commit cf550d6
Showing 2 changed files with 58 additions and 0 deletions.
42 changes: 42 additions & 0 deletions docs/kudu_impala_integration.adoc
@@ -485,6 +485,48 @@ The example creates 16 partitions. You could also use `HASH (id, sku) PARTITIONS
However, a scan for `sku` values would almost always impact all 16 partitions, rather
than possibly being limited to 4.

.Range-Specific Hash Schemas
As of 1.17, Kudu supports range-specific hash schemas for tables. It's possible to
add ranges with a hash schema independent of the table-wide hash schema. This can be
done while creating or altering the table. The number of hash partition levels must
be the same across all ranges in a table.

[source, sql]
----
CREATE TABLE cust_behavior (
  id BIGINT,
  sku STRING,
  salary STRING,
  edu_level INT,
  usergender STRING,
  `group` STRING,
  city STRING,
  postcode STRING,
  last_purchase_price FLOAT,
  last_purchase_date BIGINT,
  category STRING,
  rating INT,
  fulfilled_date BIGINT,
  PRIMARY KEY (id, sku)
)
PARTITION BY HASH (id) PARTITIONS 4,
RANGE (sku)
(
  PARTITION VALUES < 'g',
  PARTITION 'g' <= VALUES < 'o'
      HASH (id) PARTITIONS 6,
  PARTITION 'o' <= VALUES < 'u'
      HASH (id) PARTITIONS 8,
  PARTITION 'u' <= VALUES
)
STORED AS KUDU;
----

This example uses the range-specific hash schema feature for the middle two
ranges. The table-wide hash schema has 4 buckets, while the hash schemas
for the middle two ranges have 6 and 8 buckets, respectively. This is useful
when those ranges are expected to receive a heavier workload than the others.
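
As a rough illustration of the arithmetic behind the `cust_behavior` example, the sketch below counts the partitions that result when two ranges override the table-wide bucket count. The names `TABLE_WIDE_BUCKETS`, `ranges`, and `buckets_per_range` are hypothetical and exist only for this illustration; they are not part of Kudu or Impala.

```python
# Hypothetical model of the cust_behavior layout: each range partition is
# paired with its own bucket count, or None to use the table-wide hash schema.
TABLE_WIDE_BUCKETS = 4

ranges = [
    ("VALUES < 'g'", None),
    ("'g' <= VALUES < 'o'", 6),
    ("'o' <= VALUES < 'u'", 8),
    ("'u' <= VALUES", None),
]


def buckets_per_range(ranges, table_wide):
    """Resolve the effective number of hash buckets for every range."""
    return {
        label: table_wide if custom is None else custom
        for label, custom in ranges
    }


per_range = buckets_per_range(ranges, TABLE_WIDE_BUCKETS)
total_partitions = sum(per_range.values())

for label, buckets in per_range.items():
    print(f"{label}: {buckets} buckets")
print(f"total partitions: {total_partitions}")  # 4 + 6 + 8 + 4 = 22
```

With a single table-wide schema of 4 buckets, the same four ranges would yield 16 partitions; the two overrides raise the total to 22 without changing the outer ranges.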

.Non-Covering Range Partitions
Kudu 1.0 and higher supports the use of non-covering range partitions,
which address scenarios like the following:
16 changes: 16 additions & 0 deletions docs/schema_design.adoc
@@ -435,6 +435,22 @@ NOTE: see the <<hash-range-partitioning-example>> and the
<<hash-hash-partitioning-example>> for further discussion of multilevel
partitioning.

[[flexible-partitioning]]
=== Flexible Partitioning

As of 1.17, Kudu supports range-specific hash schemas for tables. It's now
possible to add ranges with their own unique hash schema independent of the
table-wide hash schema. This can be done while creating or altering the table.
This feature helps mitigate potential hotspotting, as more buckets can be
added to the hash schema of a range that is expected to receive more workload.

[[same-number-of-hash-levels]]
[IMPORTANT]
.Same Number of Hash Levels
The number of hash partition levels must be the same across all the ranges
in a table. See <<multilevel-partitioning>> for more details on hash partition
levels.
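
The constraint above can be pictured with a small check. This is only a sketch under an assumed data model, where a hash schema is a list of levels and each level is a `(columns, num_buckets)` pair; the function `same_number_of_hash_levels` is hypothetical and does not reflect Kudu's actual validation code.

```python
# Hypothetical model: a hash schema is a list of levels, and each level is
# a (columns, num_buckets) pair. This is not Kudu's own validation logic.
def same_number_of_hash_levels(table_wide, per_range):
    """Return True if every range-specific schema has as many hash
    levels as the table-wide schema."""
    return all(len(schema) == len(table_wide) for schema in per_range)


table_wide = [(("id",), 4)]  # one hash level, 4 buckets

# Bucket counts differ per range, but each schema still has one level.
valid_ranges = [[(("id",), 6)], [(("id",), 8)]]

# The second range adds a second hash level, which breaks the rule.
invalid_ranges = [[(("id",), 6)], [(("id",), 2), (("sku",), 3)]]

print(same_number_of_hash_levels(table_wide, valid_ranges))    # True
print(same_number_of_hash_levels(table_wide, invalid_ranges))  # False
```

In this model the bucket count per level is free to vary from range to range; only the number of levels is fixed.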

[[partition-pruning]]
=== Partition Pruning
