Docs: Update new catalog features (apache#7433)
dramaticlly authored May 3, 2023
1 parent c9bdae1 commit 5117b6b
Showing 3 changed files with 35 additions and 9 deletions.
18 changes: 15 additions & 3 deletions docs/spark-configuration.md
@@ -40,6 +40,14 @@ spark.sql.catalog.hive_prod.uri = thrift://metastore-host:port
# omit uri to use the same URI as Spark: hive.metastore.uris in hive-site.xml
```

Below is an example for a REST catalog named `rest_prod` that loads tables from REST URL `http://localhost:8080`:

```plain
spark.sql.catalog.rest_prod = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest_prod.type = rest
spark.sql.catalog.rest_prod.uri = http://localhost:8080
```

Iceberg also supports a directory-based catalog in HDFS that can be configured using `type=hadoop`:

```plain
@@ -66,12 +74,16 @@ Both catalogs are configured using properties nested under the catalog name. Com
| Property | Values | Description |
| -------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------- |
| spark.sql.catalog._catalog-name_.type | `hive`, `hadoop` or `rest` | The underlying Iceberg catalog implementation, `HiveCatalog`, `HadoopCatalog`, `RESTCatalog` or left unset if using a custom catalog |
| spark.sql.catalog._catalog-name_.catalog-impl | | The underlying Iceberg catalog implementation.|
| spark.sql.catalog._catalog-name_.catalog-impl | | The custom Iceberg catalog implementation. If `type` is null, `catalog-impl` must not be null. |
| spark.sql.catalog._catalog-name_.io-impl | | The custom FileIO implementation. |
| spark.sql.catalog._catalog-name_.metrics-reporter-impl | | The custom MetricsReporter implementation. |
| spark.sql.catalog._catalog-name_.default-namespace | default | The default current namespace for the catalog |
| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Metastore connect URI; default from `hive-site.xml` |
| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Hive metastore URL for hive typed catalog, REST URL for REST typed catalog |
| spark.sql.catalog._catalog-name_.warehouse | hdfs://nn:8020/warehouse/path | Base path for the warehouse directory |
| spark.sql.catalog._catalog-name_.cache-enabled | `true` or `false` | Whether to enable catalog cache, default value is `true` |
| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) | |
| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) |
| spark.sql.catalog._catalog-name_.table-default._propertyKey_ | | Default Iceberg table property value for property key _propertyKey_, which will be set on tables created by this catalog if not overridden |
| spark.sql.catalog._catalog-name_.table-override._propertyKey_ | | Enforced Iceberg table property value for property key _propertyKey_, which cannot be overridden by user |

Additional properties can be found in common [catalog configuration](../configuration#catalog-properties).

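Every property in the table above is passed to Spark as a `spark.sql.catalog.<catalog-name>.<property>` key. As a minimal sketch, the Java snippet below expands short property names into those fully qualified keys for the `rest_prod` REST catalog from the example. The class and method names (`CatalogConfSketch`, `qualify`) are hypothetical and not part of Iceberg; in a real job the resulting pairs would be handed to Spark, for instance through `SparkSession.builder().config(key, value)`, rather than printed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogConfSketch {
  // Hypothetical helper: prefixes each catalog property with spark.sql.catalog.<catalogName>.
  static Map<String, String> qualify(String catalogName, String catalogClass, Map<String, String> props) {
    String prefix = "spark.sql.catalog." + catalogName;
    Map<String, String> conf = new LinkedHashMap<>();
    // The bare key (no suffix) names the Spark catalog plugin class.
    conf.put(prefix, catalogClass);
    for (Map.Entry<String, String> e : props.entrySet()) {
      conf.put(prefix + "." + e.getKey(), e.getValue());
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("type", "rest");                 // selects RESTCatalog
    props.put("uri", "http://localhost:8080"); // REST URL for a REST-typed catalog
    Map<String, String> conf = qualify("rest_prod", "org.apache.iceberg.spark.SparkCatalog", props);
    // Prints one "key = value" line per entry, in insertion order.
    conf.forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Running `main` prints the same three lines as the `rest_prod` example in the docs diff.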
@@ -89,16 +89,23 @@
* <p>This supports the following catalog configuration options:
*
* <ul>
* <li><code>type</code> - catalog type, "hive" or "hadoop". To specify a non-hive or hadoop
* catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI (Hive catalog only)
* <li><code>type</code> - catalog type, "hive" or "hadoop" or "rest". To specify a non-hive or
* hadoop catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI for Hive catalog or REST URI for REST catalog
* <li><code>warehouse</code> - the warehouse path (Hadoop catalog only)
* <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
* <li><code>io-impl</code> - a custom {@link org.apache.iceberg.io.FileIO} implementation to use
* <li><code>metrics-reporter-impl</code> - a custom {@link
* org.apache.iceberg.metrics.MetricsReporter} implementation to use
* <li><code>default-namespace</code> - a namespace to use as the default
* <li><code>cache-enabled</code> - whether to enable catalog cache
* <li><code>cache.expiration-interval-ms</code> - interval in millis before expiring tables from
* catalog cache. Refer to {@link CatalogProperties#CACHE_EXPIRATION_INTERVAL_MS} for further
* details and significant values.
* <li><code>table-default.$tablePropertyKey</code> - table property $tablePropertyKey default at
* catalog level
* <li><code>table-override.$tablePropertyKey</code> - table property $tablePropertyKey enforced
* at catalog level
* </ul>
*
* <p>
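The `cache-enabled` and `cache.expiration-interval-ms` options combine as the configuration table describes: the interval only matters when caching is enabled, `-1` disables expiration, and `0` disables caching regardless of `cache-enabled`. The Java sketch below encodes that decision table. `CacheSettingsSketch`, `CacheMode` and `resolve` are hypothetical names, not Iceberg API, and treating every negative interval like `-1` is an assumption, since the docs only name `-1`.

```java
public class CacheSettingsSketch {
  // Hypothetical summary of how the two cache options combine.
  enum CacheMode { DISABLED, NO_EXPIRATION, EXPIRING }

  static CacheMode resolve(boolean cacheEnabled, long expirationIntervalMs) {
    // 0 disables caching entirely, irrespective of cache-enabled.
    if (expirationIntervalMs == 0) {
      return CacheMode.DISABLED;
    }
    // Otherwise the interval only takes effect when caching is enabled.
    if (!cacheEnabled) {
      return CacheMode.DISABLED;
    }
    // -1 disables expiration; other negative values are assumed to behave the same way.
    if (expirationIntervalMs < 0) {
      return CacheMode.NO_EXPIRATION;
    }
    return CacheMode.EXPIRING;
  }

  public static void main(String[] args) {
    System.out.println(resolve(true, 30_000L));  // EXPIRING (the documented defaults)
    System.out.println(resolve(true, -1L));      // NO_EXPIRATION
    System.out.println(resolve(false, 30_000L)); // DISABLED
    System.out.println(resolve(true, 0L));       // DISABLED
  }
}
```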
@@ -89,16 +89,23 @@
* <p>This supports the following catalog configuration options:
*
* <ul>
* <li><code>type</code> - catalog type, "hive" or "hadoop". To specify a non-hive or hadoop
* catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI (Hive catalog only)
* <li><code>type</code> - catalog type, "hive" or "hadoop" or "rest". To specify a non-hive or
* hadoop catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI for Hive catalog or REST URI for REST catalog
* <li><code>warehouse</code> - the warehouse path (Hadoop catalog only)
* <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
* <li><code>io-impl</code> - a custom {@link org.apache.iceberg.io.FileIO} implementation to use
* <li><code>metrics-reporter-impl</code> - a custom {@link
* org.apache.iceberg.metrics.MetricsReporter} implementation to use
* <li><code>default-namespace</code> - a namespace to use as the default
* <li><code>cache-enabled</code> - whether to enable catalog cache
* <li><code>cache.expiration-interval-ms</code> - interval in millis before expiring tables from
* catalog cache. Refer to {@link CatalogProperties#CACHE_EXPIRATION_INTERVAL_MS} for further
* details and significant values.
* <li><code>table-default.$tablePropertyKey</code> - table property $tablePropertyKey default at
* catalog level
* <li><code>table-override.$tablePropertyKey</code> - table property $tablePropertyKey enforced
* at catalog level
* </ul>
*
* <p>
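The `table-default.$tablePropertyKey` and `table-override.$tablePropertyKey` options differ in precedence: a default applies only when the user does not set the property, while an override always wins. Assuming exactly that precedence, a hypothetical merge could look like the following. `TablePropsSketch`, `withPrefix` and `merge` are illustrative names, not Iceberg methods, and the two table property keys are used only as sample values.

```java
import java.util.HashMap;
import java.util.Map;

public class TablePropsSketch {
  static final String DEFAULT_PREFIX = "table-default.";
  static final String OVERRIDE_PREFIX = "table-override.";

  // Hypothetical: collects catalog-level entries that start with prefix, with the prefix stripped.
  static Map<String, String> withPrefix(Map<String, String> catalogProps, String prefix) {
    Map<String, String> out = new HashMap<>();
    for (Map.Entry<String, String> e : catalogProps.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        out.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return out;
  }

  // Assumed precedence: catalog defaults < user-supplied properties < catalog overrides.
  static Map<String, String> merge(Map<String, String> catalogProps, Map<String, String> userProps) {
    Map<String, String> result = new HashMap<>(withPrefix(catalogProps, DEFAULT_PREFIX));
    result.putAll(userProps);
    result.putAll(withPrefix(catalogProps, OVERRIDE_PREFIX));
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> catalogProps = Map.of(
        "table-default.write.format.default", "parquet",
        "table-override.format-version", "2");
    Map<String, String> userProps = Map.of(
        "write.format.default", "orc",
        "format-version", "1");
    Map<String, String> merged = merge(catalogProps, userProps);
    System.out.println(merged.get("write.format.default")); // orc: the user value beats the default
    System.out.println(merged.get("format-version"));       // 2: the override beats the user value
  }
}
```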