[SPARK-46934][SQL][FOLLOWUP] Read/write roundtrip for struct type with special characters with HMS - a backward compatible approach

### What changes were proposed in this pull request?

A backward-compatible approach for apache#45039 that lets older versions of Spark properly read struct-typed columns with special characters created by Spark 4.x or later. Compared with apache#45039, only datasource tables are supported now, as we have a special way to store Hive-incompatible schemas in the table properties. This is a safe removal because no released version supports that yet.

### Why are the changes needed?

Backward-compatibility improvement.

### Does this PR introduce _any_ user-facing change?

Yes. Users can store and read struct-typed columns with special characters.

### How was this patch tested?

#### Tests provided by SPARK-22431

```scala
DDLSuite.scala:  test("SPARK-22431: table with nested type col with special char")
DDLSuite.scala:  test("SPARK-22431: view with nested type")
HiveDDLSuite.scala:  test("SPARK-22431: table with nested type") {
HiveDDLSuite.scala:  test("SPARK-22431: view with nested type") {
HiveDDLSuite.scala:  test("SPARK-22431: alter table tests with nested types") {
```

#### Tests provided by the previous PR towards SPARK-46934

```scala
HiveMetastoreCatalogSuite.scala:  test("SPARK-46934: HMS columns cannot handle quoted columns")
HiveMetastoreCatalogSuite.scala:  test("SPARK-46934: Handle special characters in struct types") {
HiveMetastoreCatalogSuite.scala:  test("SPARK-46934: Handle special characters in struct types with CTAS") {
HiveMetastoreCatalogSuite.scala:  test("SPARK-46934: Handle special characters in struct types with hive DDL") {
HiveDDLSuite.scala:  test("SPARK-46934: quote element name before parsing struct") {
HiveDDLSuite.scala:  test("SPARK-46934: alter table tests with nested types") {
```

#### Manual backward-compatibility test (see the sketch after this list)

1. Build a distribution tarball from the current revision.
2. `cd dist`
3. Use spark-sql to create mock data:
   ```
   spark-sql (default)> CREATE TABLE t AS SELECT named_struct('a.b.b', array('a'), 'a b c', map(1, 'a')) AS `a.b`;
   ```
4. Copy the metastore to a 3.5.3 release:
   ```
   cp -r ~/spark/dist/metastore_db .
   ```
5. Align the Derby version with the one used to create the metastore:
   ```
   rm jars/derby-10.14.2.0.jar
   cp -r ~/spark/dist/jars/derby-10.16.1.1.jar ./jars
   ```
6. Read the data:
   ```
   spark-sql (default)> select version();
   3.5.3 32232e9
   Time taken: 0.103 seconds, Fetched 1 row(s)
   spark-sql (default)> select * from t;
   {"a.b.b":["a"],"a b c":{1:"a"}}
   Time taken: 0.09 seconds, Fetched 1 row(s)
   spark-sql (default)> desc formatted t;
   a.b                          struct<a.b.b:array<string>,a b c:map<int,string>>

   # Detailed Table Information
   Catalog                      spark_catalog
   Database                     default
   Table                        t
   Owner                        hzyaoqin
   Created Time                 Wed Nov 27 17:40:53 CST 2024
   Last Access                  UNKNOWN
   Created By                   Spark 4.0.0-SNAPSHOT
   Type                         MANAGED
   Provider                     parquet
   Statistics                   1245 bytes
   Location                     file:/Users/hzyaoqin/spark/dist/spark-warehouse/t
   Serde Library                org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   InputFormat                  org.apache.hadoop.mapred.SequenceFileInputFormat
   OutputFormat                 org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
   Time taken: 0.054 seconds, Fetched 17 row(s)
   ```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48986 from yaooqinn/SPARK-46934-F.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
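For reference, a minimal sketch of the same round trip outside the test suites is shown below. It assumes a Hive-enabled session and a metastore shared between a Spark 4.x build and an older release, as in the manual test above; the table name `t_roundtrip` and the session setup are illustrative only and not part of this change.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: a Hive-enabled session. The warehouse and metastore
// locations must match between the two Spark versions being compared.
val spark = SparkSession.builder()
  .appName("SPARK-46934-roundtrip-sketch")
  .enableHiveSupport()
  .getOrCreate()

// With Spark 4.x: create a datasource (parquet) table whose struct fields
// contain dots and spaces, i.e. a Hive-incompatible schema.
spark.sql("""
  CREATE TABLE t_roundtrip USING parquet AS
  SELECT named_struct('a.b.b', array('a'), 'a b c', map(1, 'a')) AS `a.b`
""")

// With an older release (e.g. 3.5.x) against the same metastore:
spark.sql("SELECT * FROM t_roundtrip").show(truncate = false)
spark.table("t_roundtrip").printSchema()
```

Because this is a datasource table, the older release recovers the exact struct schema from the table properties rather than from the Hive-translated column types, which is what makes the read side of the round trip work.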
Showing 4 changed files with 128 additions and 52 deletions.