Add docs for non-covering range partitioning
Change-Id: I3b0fd7500c5399db9dcad617ae67fea247307353
Reviewed-on: http://gerrit.cloudera.org:8080/3796
Reviewed-by: Dan Burkert <[email protected]>
Tested-by: Misty Stanley-Jones <[email protected]>
Misty Stanley-Jones committed Aug 15, 2016
1 parent 0fb4409 commit 274dfb0
Showing 2 changed files with 168 additions and 80 deletions.
55 changes: 55 additions & 0 deletions docs/kudu_impala_integration.adoc
@@ -795,6 +795,61 @@ The example creates 16 buckets. You could also use `HASH (id, sku) INTO 16 BUCKE
However, a scan for `sku` values would almost always impact all 16 buckets, rather
than possibly being limited to 4.

// Not ready for 0.10 but don't want to lose the work
////
.Non-Covering Range Partitions
Kudu TODO:VERSION and higher supports the use of non-covering range partitions,
which address scenarios like the following:

- Without non-covering range partitions, in the case of time-series data or other
schemas which need to account for constantly-increasing primary keys, tablets
serving old data will be relatively fixed in size, while tablets receiving new
data will grow without bounds.
- In cases where you want to partition data based on its category, such as sales
region or product type, without non-covering range partitions you must know all
of the partitions ahead of time or manually recreate your table if partitions
need to be added or removed, such as when a product type is introduced or
discontinued.

Non-covering range partitions have some caveats. Be sure to read the
link:/docs/schema_design.html[Schema Design guide].

This example creates a tablet per year (5 tablets total) for storing sales data. It uses
`RANGE BOUND` to ensure that the table only accepts `year` values between 2011 and 2016.
Keys outside of this range will be rejected.

[source,sql]
----
CREATE TABLE sales_by_year (year INT32, sale_id INT32, amount INT32)
PRIMARY KEY (sale_id, year)
DISTRIBUTE BY RANGE (year)
RANGE BOUND ((2011), (2016))
SPLIT ROWS ((2012), (2013), (2014), (2015), (2016));
----

When records start coming in for 2017, they will be rejected. At that point, the `2017`
range should be added. Use an `alter table add range partition` or `alter table drop range
partition` operation to add or drop a range partition.

The next example creates a tablet per sales region. Data for regions other than `North
America`, `Europe`, or `Asia` will be rejected. This example does not use explicit
split rows, but the range bounds provide implicit split rows, so three tablets would
be created. If a new range is added, a new tablet is created.

[source,sql]
----
CREATE TABLE sales_by_region (region STRING, sale_id INT32, amount INT32)
PRIMARY KEY (sale_id, region)
DISTRIBUTE BY RANGE (region)
RANGE BOUND (("North America"), ("North America\0")),
RANGE BOUND (("Europe"), ("Europe\0")),
RANGE BOUND (("Asia"), ("Asia\0"));
----
////

[[partitioning_rules_of_thumb]]
==== Partitioning Rules of Thumb

193 changes: 113 additions & 80 deletions docs/schema_design.adoc
@@ -41,89 +41,21 @@ be a new concept for those familiar with traditional relational databases. The
next sections discuss <<alter-schema,altering the schema>> of an existing table,
and <<known-limitations,known limitations>> with regard to schema design.

[[column-design]]
== Column Design

A Kudu Table consists of one or more columns, each with a predefined type.
Columns that are not part of the primary key may optionally be nullable.
Supported column types include:

* boolean
* 8-bit signed integer
* 16-bit signed integer
* 32-bit signed integer
* 64-bit signed integer
* timestamp
* single-precision (32-bit) IEEE-754 floating-point number
* double-precision (64-bit) IEEE-754 floating-point number
* UTF-8 encoded string
* binary

Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
format to provide efficient encoding and serialization. To make the most of these
features, columns must be specified as the appropriate type, rather than
simulating a 'schemaless' table using string or binary columns for data which
may otherwise be structured. In addition to encoding, Kudu optionally allows
compression to be specified on a per-column basis.

[[encoding]]
=== Column Encoding

Each column in a Kudu table can be created with an encoding, based on the type
of the column. Columns use plain encoding by default.

.Encoding Types
[options="header"]
|===
| Column Type | Encoding
| integer, timestamp | plain, bitshuffle, run length
| float | plain, bitshuffle
| bool | plain, dictionary, run length
| string, binary | plain, prefix, dictionary
|===

[[plain]]
Plain Encoding:: Data is stored in its natural format. For example, `int32` values
are stored as fixed-size 32-bit little-endian integers.

[[bitshuffle]]
Bitshuffle Encoding:: Data is rearranged to store the most significant bit of
every value, followed by the second most significant bit of every value, and so
on. Finally, the result is LZ4 compressed. Bitshuffle encoding is a good choice for
columns that have many repeated values, or values that change by small amounts
when sorted by primary key. The
https://github.com/kiyo-masui/bitshuffle[bitshuffle] project has a good
overview of performance and use cases.

[[run-length]]
Run Length Encoding:: _Runs_ (consecutive repeated values) are compressed in a
column by storing only the value and the count. Run length encoding is effective
for columns with many consecutive repeated values when sorted by primary key.

[[dictionary]]
Dictionary Encoding:: A dictionary of unique values is built, and each column value
is encoded as its corresponding index in the dictionary. Dictionary encoding
is effective for columns with low cardinality. If the column values of a given row set
are unable to be compressed because the number of unique values is too high, Kudu will
transparently fall back to plain encoding for that row set. This is evaluated during
flush.

[[prefix]]
Prefix Encoding:: Common prefixes are compressed in consecutive column values. Prefix
encoding can be effective for values that share common prefixes, or the first
column of the primary key, since rows are sorted by primary key within tablets.

[[compression]]
=== Column Compression

Kudu allows per-column compression using LZ4, `snappy`, or `zlib` compression
codecs. By default, columns are stored uncompressed. Consider using compression
if reducing storage space is more important than raw scan performance.

Every data set will compress differently, but in general LZ4 has the least effect on
performance, while `zlib` will compress to the smallest data sizes.
Bitshuffle-encoded columns are inherently compressed using LZ4, so it is not
typically beneficial to apply additional compression on top of this encoding.

== The Perfect Schema

The perfect schema would accomplish the following:

- Data would be distributed in such a way that reads and writes are spread evenly
across tablet servers. This is impacted by the partition schema.
- Tablets would grow at an even, predictable rate and load across tablets would remain
steady over time. This is most impacted by the partition schema.
- Scans would read the minimum amount of data necessary to fulfill a query. This
is impacted mostly by primary key design, but partition design also plays a
role via partition pruning.

The perfect schema depends on the characteristics of your data, what you need to do
with it, and the topology of your cluster. Schema design is the single most important
thing within your control to maximize the performance of your Kudu cluster.

[[primary-keys]]
== Primary Keys
@@ -209,6 +141,23 @@ should only include the `last_name` column. In that case, Kudu would guarantee t
customers with the same last name would fall into the same tablet, regardless of
the provided split rows.

[[range-partition-management]]
==== Range Partition Management

Kudu 0.10 introduces the ability to specify bounded range partitions during
table creation, and the ability to add and drop range partitions on the fly. This is
a good strategy for data that is always increasing, such as timestamps, or for
categorical data, such as geographic regions.

For example, during table creation, bounded range partitions can be
added for the regions 'US-EAST', 'US-WEST', and 'EUROPE'. If you attempt to insert a
row with a region that does not match an existing range partition, the insertion will
fail. Later, when a new region is needed, it can be efficiently added as part of an
`ALTER TABLE` operation. This feature is particularly useful for time-series data,
since it allows new range partitions for the current period to be added as
needed, and old partitions covering historical periods to be dropped if
necessary.
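
The sketch below, which is not part of the shipped documentation, illustrates this
workflow with the Kudu Java client: it creates a table with one bounded range partition
per year, then adds a partition for a new year and drops the oldest one. The table and
column names are hypothetical, and the `org.apache.kudu.client` package names are those
of the current client (the 0.10 release shipped them under `org.kududb.client`).

[source,java]
----
import java.util.Arrays;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.AlterTableOptions;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.PartialRow;

public class RangePartitionExample {
  public static void main(String[] args) throws Exception {
    KuduClient client =
        new KuduClient.KuduClientBuilder("kudu-master.example.com").build();
    try {
      Schema schema = new Schema(Arrays.asList(
          new ColumnSchema.ColumnSchemaBuilder("year", Type.INT32).key(true).build(),
          new ColumnSchema.ColumnSchemaBuilder("sale_id", Type.INT32).key(true).build(),
          new ColumnSchema.ColumnSchemaBuilder("amount", Type.INT32).build()));

      // Range-partition on `year`, with one bounded partition per year.
      // Each partition's lower bound is inclusive and its upper bound exclusive.
      CreateTableOptions create = new CreateTableOptions()
          .setRangePartitionColumns(Arrays.asList("year"));
      for (int year = 2014; year <= 2016; year++) {
        PartialRow lower = schema.newPartialRow();
        lower.addInt("year", year);
        PartialRow upper = schema.newPartialRow();
        upper.addInt("year", year + 1);
        create.addRangePartition(lower, upper);
      }
      client.createTable("sales_by_year", schema, create);

      // Later: add a partition for 2017 and drop the one covering 2014.
      PartialRow lower2017 = schema.newPartialRow();
      lower2017.addInt("year", 2017);
      PartialRow upper2018 = schema.newPartialRow();
      upper2018.addInt("year", 2018);
      PartialRow lower2014 = schema.newPartialRow();
      lower2014.addInt("year", 2014);
      PartialRow upper2015 = schema.newPartialRow();
      upper2015.addInt("year", 2015);
      client.alterTable("sales_by_year", new AlterTableOptions()
          .addRangePartition(lower2017, upper2018)
          .dropRangePartition(lower2014, upper2015));
    } finally {
      client.close();
    }
  }
}
----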

[[hash-bucketing]]
=== Hash Bucketing

@@ -272,6 +221,90 @@ You can alter a table's schema in the following ways:

You cannot modify the partition schema after table creation.

[[column-design]]
== Column Design

A Kudu Table consists of one or more columns, each with a predefined type.
Columns that are not part of the primary key may optionally be nullable.
Supported column types include:

* boolean
* 8-bit signed integer
* 16-bit signed integer
* 32-bit signed integer
* 64-bit signed integer
* timestamp
* single-precision (32-bit) IEEE-754 floating-point number
* double-precision (64-bit) IEEE-754 floating-point number
* UTF-8 encoded string
* binary

Kudu takes advantage of strongly-typed columns and a columnar on-disk storage
format to provide efficient encoding and serialization. To make the most of these
features, columns must be specified as the appropriate type, rather than
simulating a 'schemaless' table using string or binary columns for data which
may otherwise be structured. In addition to encoding, Kudu optionally allows
compression to be specified on a per-column basis.
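
A brief, hedged sketch (not part of the original document) of declaring typed columns
with the Kudu Java client, with non-key columns marked nullable; the column names are
illustrative, and the package names are those of the current `org.apache.kudu` client.

[source,java]
----
import java.util.Arrays;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;

public class ColumnDesignExample {
  public static void main(String[] args) {
    Schema schema = new Schema(Arrays.asList(
        // Primary key columns come first and cannot be nullable.
        new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
        // Non-key columns may be declared nullable.
        new ColumnSchema.ColumnSchemaBuilder("price", Type.DOUBLE).nullable(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("in_stock", Type.BOOL).build(),
        new ColumnSchema.ColumnSchemaBuilder("description", Type.STRING).nullable(true).build()));
    System.out.println(schema.getColumnCount() + " columns defined");
  }
}
----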

[[encoding]]
=== Column Encoding

Each column in a Kudu table can be created with an encoding, based on the type
of the column. Columns use plain encoding by default.

.Encoding Types
[options="header"]
|===
| Column Type | Encoding
| integer, timestamp | plain, bitshuffle, run length
| float | plain, bitshuffle
| bool | plain, dictionary, run length
| string, binary | plain, prefix, dictionary
|===

[[plain]]
Plain Encoding:: Data is stored in its natural format. For example, `int32` values
are stored as fixed-size 32-bit little-endian integers.

[[bitshuffle]]
Bitshuffle Encoding:: Data is rearranged to store the most significant bit of
every value, followed by the second most significant bit of every value, and so
on. Finally, the result is LZ4 compressed. Bitshuffle encoding is a good choice for
columns that have many repeated values, or values that change by small amounts
when sorted by primary key. The
https://github.com/kiyo-masui/bitshuffle[bitshuffle] project has a good
overview of performance and use cases.

[[run-length]]
Run Length Encoding:: _Runs_ (consecutive repeated values) are compressed in a
column by storing only the value and the count. Run length encoding is effective
for columns with many consecutive repeated values when sorted by primary key.

[[dictionary]]
Dictionary Encoding:: A dictionary of unique values is built, and each column value
is encoded as its corresponding index in the dictionary. Dictionary encoding
is effective for columns with low cardinality. If the column values of a given row set
are unable to be compressed because the number of unique values is too high, Kudu will
transparently fall back to plain encoding for that row set. This is evaluated during
flush.

[[prefix]]
Prefix Encoding:: Common prefixes are compressed in consecutive column values. Prefix
encoding can be effective for values that share common prefixes, or the first
column of the primary key, since rows are sorted by primary key within tablets.
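
As an illustrative sketch only (the column names are made up, and the `org.apache.kudu`
package names are those of the current Java client), a per-column encoding is requested
through `ColumnSchema.ColumnSchemaBuilder.encoding()`:

[source,java]
----
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Type;

public class ColumnEncodingExample {
  public static void main(String[] args) {
    // Dictionary encoding suits a low-cardinality string column.
    ColumnSchema region = new ColumnSchema.ColumnSchemaBuilder("region", Type.STRING)
        .encoding(ColumnSchema.Encoding.DICT_ENCODING)
        .build();
    // Bitshuffle encoding suits integer values that change little when sorted.
    ColumnSchema amount = new ColumnSchema.ColumnSchemaBuilder("amount", Type.INT32)
        .encoding(ColumnSchema.Encoding.BIT_SHUFFLE)
        .build();
    System.out.println(region + "\n" + amount);
  }
}
----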

[[compression]]
=== Column Compression

Kudu allows per-column compression using LZ4, `snappy`, or `zlib` compression
codecs. By default, columns are stored uncompressed. Consider using compression
if reducing storage space is more important than raw scan performance.

Every data set will compress differently, but in general LZ4 has the least effect on
performance, while `zlib` will compress to the smallest data sizes.
Bitshuffle-encoded columns are inherently compressed using LZ4, so it is not
typically beneficial to apply additional compression on top of this encoding.
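
A similar hedged sketch for per-column compression (again with a hypothetical column
name): `compressionAlgorithm()` accepts a `ColumnSchema.CompressionAlgorithm` value and
can be combined with an explicit encoding on the same builder.

[source,java]
----
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Type;

public class ColumnCompressionExample {
  public static void main(String[] args) {
    // Trade some scan speed for smaller on-disk size on this column.
    ColumnSchema notes = new ColumnSchema.ColumnSchemaBuilder("notes", Type.STRING)
        .compressionAlgorithm(ColumnSchema.CompressionAlgorithm.ZLIB)
        .build();
    System.out.println(notes);
  }
}
----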

[[known-limitations]]
== Known Limitations

