|
27 | 27 |
28 | 28 | # Spark Queries
29 | 29 |
30 |    | -To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration).
31 |    | -
32 |    | -Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support in Spark versions:
33 |    | -
34 |    | -| Feature support | Spark 3 | Spark 2.4 | Notes |
35 |    | -|--------------------------------------------------|-----------|------------|------------------------------------------------|
36 |    | -| [`SELECT`](#querying-with-sql) | ✔️ | | |
37 |    | -| [DataFrame reads](#querying-with-dataframes) | ✔️ | ✔️ | |
38 |    | -| [Metadata table `SELECT`](#inspecting-tables) | ✔️ | | |
39 |    | -| [History metadata table](#history) | ✔️ | ✔️ | |
40 |    | -| [Snapshots metadata table](#snapshots) | ✔️ | ✔️ | |
41 |    | -| [Files metadata table](#files) | ✔️ | ✔️ | |
42 |    | -| [Manifests metadata table](#manifests) | ✔️ | ✔️ | |
43 |    | -| [Partitions metadata table](#partitions) | ✔️ | ✔️ | |
44 |    | -| [All metadata tables](#all-metadata-tables) | ✔️ | ✔️ | |
45 |    | -
   | 30 | +To use Iceberg in Spark, first configure [Spark catalogs](../spark-configuration). Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations.
46 | 31 |
|
47 | 32 | ## Querying with SQL
48 | 33 |
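The body of this section falls outside the hunk; as a minimal sketch of a SQL read, using the `prod.db.table` identifier that appears elsewhere on the page:

```scala
// Sketch: SQL reads work in Spark 3 once a catalog such as "prod" is configured.
spark.sql("SELECT * FROM prod.db.table").show()
```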
|
@@ -75,8 +60,6 @@ val df = spark.table("prod.db.table")
|
75 | 60 |
|
76 | 61 | ### Catalogs with DataFrameReader
|
77 | 62 |
|
78 |
| -Iceberg 0.11.0 adds multi-catalog support to `DataFrameReader` in both Spark 3 and 2.4. |
79 |
| - |
80 | 63 | Paths and table names can be loaded with Spark's `DataFrameReader` interface. How tables are loaded depends on how
|
81 | 64 | the identifier is specified. When using `spark.read.format("iceberg").load(table)` or `spark.table(table)` the `table`
|
82 | 65 | variable can take a number of forms as listed below:
|
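The list of identifier forms sits beyond this hunk; a short sketch of two common forms, reusing the catalog name and HDFS path that appear elsewhere on this page:

```scala
// Catalog-qualified name: resolved through the configured "prod" catalog.
val byName = spark.read.format("iceberg").load("prod.db.table")

// Filesystem path: loaded directly as a Hadoop path table.
val byPath = spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table")
```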
@@ -205,38 +188,13 @@ Incremental read works with both V1 and V2 format-version.
205 | 188 | Incremental read is not supported by Spark's SQL syntax.
206 | 189 | {{< /hint >}}
207 | 190 |
208 |    | -### Spark 2.4
209 |    | -
210 |    | -Spark 2.4 requires using the DataFrame reader with `iceberg` as a format, because 2.4 does not support direct SQL queries:
211 |    | -
212 |    | -```scala
213 |    | -// named metastore table
214 |    | -spark.read.format("iceberg").load("catalog.db.table")
215 |    | -// Hadoop path table
216 |    | -spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table")
217 |    | -```
218 |    | -
219 |    | -#### Spark 2.4 with SQL
220 |    | -
221 |    | -To run SQL `SELECT` statements on Iceberg tables in 2.4, register the DataFrame as a temporary table:
222 |    | -
223 |    | -```scala
224 |    | -val df = spark.read.format("iceberg").load("db.table")
225 |    | -df.createOrReplaceTempView("table")
226 |    | -
227 |    | -spark.sql("""select count(1) from table""").show()
228 |    | -```
229 |    | -
230 |    | -
231 | 191 | ## Inspecting tables
232 | 192 |
233 | 193 | To inspect a table's history, snapshots, and other metadata, Iceberg supports metadata tables.
234 | 194 |
235 | 195 | Metadata tables are identified by adding the metadata table name after the original table name. For example, history for `db.table` is read using `db.table.history`.
236 | 196 |
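For instance, a sketch of a metadata-table query following that naming rule (the `prod` catalog prefix is assumed from earlier examples):

```scala
// Sketch: the "history" metadata table hangs off the base table name.
spark.sql("SELECT * FROM prod.db.table.history").show(truncate = false)
```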
237 | 197 | {{< hint info >}}
|
238 |
| -For Spark 2.4, use the `DataFrameReader` API to [inspect tables](#inspecting-with-dataframes). |
239 |
| - |
240 | 198 | For Spark 3, prior to 3.2, the Spark [session catalog](../spark-configuration#replacing-the-session-catalog) does not support table names with multipart identifiers such as `catalog.database.table.metadata`. As a workaround, configure an `org.apache.iceberg.spark.SparkCatalog`, or use the Spark `DataFrameReader` API.
|
241 | 199 | {{< /hint >}}
|
242 | 200 |
|
@@ -422,7 +380,7 @@ SELECT * FROM prod.db.table.refs;
422 | 380 |
423 | 381 | ### Inspecting with DataFrames
424 | 382 |
425 |    | -Metadata tables can be loaded in Spark 2.4 or Spark 3 using the DataFrameReader API:
    | 383 | +Metadata tables can be loaded using the DataFrameReader API:
426 | 384 |
427 | 385 | ```scala
428 | 386 | // named metastore table
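A self-contained sketch of the same DataFrameReader pattern, assuming the `db.table` name used earlier on the page:

```scala
// Sketch: load a metadata table through DataFrameReader by suffixing the table name.
val history = spark.read.format("iceberg").load("db.table.history")
history.show(truncate = false)
```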
|
|