[SPARK-35273][SQL] CombineFilters support non-deterministic expressions
### What changes were proposed in this pull request?

This pr makes `CombineFilters` support non-deterministic expressions. For example:

```sql
spark.sql("CREATE TABLE t1(id INT, dt STRING) using parquet PARTITIONED BY (dt)")
spark.sql("CREATE VIEW v1 AS SELECT * FROM t1 WHERE dt NOT IN ('2020-01-01', '2021-01-01')")
spark.sql("SELECT * FROM v1 WHERE dt = '2021-05-01' AND rand() <= 0.01").explain()
```

Before this pr:

```
== Physical Plan ==
*(1) Filter (isnotnull(dt#1) AND ((dt#1 = 2021-05-01) AND (rand(-6723800298719475098) <= 0.01)))
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#0,dt#1] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [NOT dt#1 IN (2020-01-01,2021-01-01)], PushedFilters: [], ReadSchema: struct<id:int>
```

After this pr:

```
== Physical Plan ==
*(1) Filter (rand(-2400509328955813273) <= 0.01)
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#0,dt#1] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#1), NOT dt#1 IN (2020-01-01,2021-01-01), (dt#1 = 2021-05-01)], PushedFilters: [], ReadSchema: struct<id:int>
```

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes apache#32405 from wangyum/SPARK-35273.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
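The effect shown in the two plans can be illustrated with a small toy model. This is only a sketch of the idea, not Spark's `CombineFilters` code: `Pred` and `combine_filters` are hypothetical names, predicates are plain strings rather than Catalyst expressions, and the sketch assumes the inner (view) filter must itself be deterministic for any merging to happen. The deterministic conjuncts of the outer filter are merged into the inner filter, where they can later become partition filters, while the non-deterministic conjuncts stay in a filter on top.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pred:
    """One conjunct of a filter condition (toy stand-in for an expression)."""
    sql: str
    deterministic: bool = True


def combine_filters(outer: List[Pred], inner: List[Pred]) -> Tuple[List[Pred], List[Pred]]:
    """Model Filter(outer, Filter(inner, child)) as two lists of conjuncts.

    Returns (remaining_outer, merged_inner). An empty remaining_outer means
    the outer filter disappears entirely.
    """
    # Assumption of this sketch: leave the plan alone when the inner
    # filter contains a non-deterministic conjunct.
    if not all(p.deterministic for p in inner):
        return list(outer), list(inner)

    # Split the outer conjuncts by determinism.
    deterministic = [p for p in outer if p.deterministic]
    non_deterministic = [p for p in outer if not p.deterministic]

    # Merge deterministic conjuncts into the inner filter, skipping duplicates.
    merged = list(inner)
    for p in deterministic:
        if p not in merged:
            merged.append(p)

    return non_deterministic, merged


if __name__ == "__main__":
    view_filter = [Pred("dt NOT IN ('2020-01-01', '2021-01-01')")]
    query_filter = [
        Pred("dt = '2021-05-01'"),
        Pred("rand() <= 0.01", deterministic=False),
    ]
    top, bottom = combine_filters(query_filter, view_filter)
    print("top filter:   ", [p.sql for p in top])
    print("merged filter:", [p.sql for p in bottom])
```

In this model only `rand() <= 0.01` remains in the top filter, which mirrors the shape of the "After this pr" plan; the real rule works on expression trees and interacts with other optimizer rules, so predicate order and inferred predicates such as `isnotnull` are not reproduced here.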