Improve the retry support for nondeterministic expressions #11789

firestarman · 2024-11-27T09:25:16Z

Contributes to #11649

This PR addresses some, but not all, of the requirements described in issue #11649.

It introduces two new classes, GpuExpressionRetryable and RetryStateTracker, to lay the groundwork for items 2 and 3 of that issue (quoted below), and adds the relevant unit tests.

We need code changes so that if a retry happens on a non-deterministic expression that is outside of a checkpoint/restore, then we fail instead of retrying.

We also want a way to detect a non-deterministic expression being run outside of a checkpoint/restore retry block and throw an error from the plan, so that we can have tests validate that we have this covered.
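
The two requirements quoted above can be illustrated with a toy Python model. This is only a sketch: `SeededRand`, `with_retry`, and `flaky_eval` are hypothetical names invented for illustration, and nothing here is the plugin's actual GpuExpressionRetryable or RetryStateTracker API. It assumes that a retry is only safe for a nondeterministic expression if its state was checkpointed first, so that a retry can restore that state and reproduce the same values.

```python
import random


class SeededRand:
    """Toy nondeterministic expression; every eval advances the RNG state."""

    def __init__(self, seed, strict=False):
        self._rng = random.Random(seed)
        self._saved = None      # RNG state captured by checkpoint(), if any
        self._strict = strict   # test-only mode: reject unprotected evaluation

    def checkpoint(self):
        self._saved = self._rng.getstate()

    def restore(self):
        self._rng.setstate(self._saved)

    def release(self):
        self._saved = None

    def eval(self, num_rows):
        if self._strict and self._saved is None:
            raise RuntimeError("nondeterministic eval outside checkpoint/restore")
        return [self._rng.random() for _ in range(num_rows)]


def with_retry(expr, body, use_checkpoint=True, max_attempts=3):
    """Run body(attempt); on MemoryError, retry only if expr was checkpointed."""
    if use_checkpoint:
        expr.checkpoint()
    try:
        for attempt in range(max_attempts):
            try:
                return body(attempt)
            except MemoryError:
                if not use_checkpoint:
                    # A blind retry would yield different random values.
                    raise RuntimeError("cannot retry a nondeterministic expression "
                                       "outside checkpoint/restore")
                expr.restore()
        raise RuntimeError("retry attempts exhausted")
    finally:
        expr.release()


def flaky_eval(expr):
    """Body that evaluates the expression, then fails once to force a retry."""
    def body(attempt):
        values = expr.eval(3)
        if attempt == 0:
            raise MemoryError("simulated out-of-memory")
        return values
    return body
```

With a checkpoint, the retried attempt returns the same three values a fresh generator with that seed would produce. Without one, the simulated out-of-memory error becomes a hard failure instead of a silent retry. The `strict` flag models the test-time detection: evaluating the expression with no checkpoint in place raises immediately.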

It also adds integration tests for rand() used in HashAggregate, Generate, Projection, ArrowEvalPython, and Filter. This addresses item 4. It still does not cover every case where a nondeterministic expression can be used, but we are closer than before.

Signed-off-by: Firestarman <[email protected]>

firestarman · 2024-11-27T11:45:04Z

build

Signed-off-by: Firestarman <[email protected]>

firestarman · 2024-11-28T07:34:57Z

build

Signed-off-by: Firestarman <[email protected]>

firestarman · 2024-11-29T03:44:50Z

build

firestarman · 2024-11-29T06:52:58Z

The CI failure is a known issue; please refer to #11790.

firestarman · 2024-11-30T02:11:26Z

build

revans2

Looks like a great first step. I mostly want to understand the 3.5.1 issue around generate and whether we need to file a follow-on issue.

I would also love to see some rand tests around

join (we can restrict the range of rand to make it actually match)
hash aggregate with some distinct operators so expand is called
(Do we need/want rand in the key for an aggregate?)
window operations

Then I think we will have covered most of the major cases when this could be called.
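
The point about restricting the range of rand for a join can be sketched in plain Python. This is an illustration only, not a test from this PR: `rand_keys` and `inner_join_count` are hypothetical helpers, and bucketing by `int(rand * buckets)` is just one assumed way to restrict the range. Raw doubles from rand almost never compare equal across the two sides of a join, while a small set of integer buckets makes matching keys likely.

```python
import random


def rand_keys(seed, num_rows, buckets):
    """Map seeded random doubles in [0, 1) onto integer keys in [0, buckets)."""
    rng = random.Random(seed)
    return [int(rng.random() * buckets) for _ in range(num_rows)]


def inner_join_count(left_keys, right_keys):
    """Count the rows an inner equi-join on the keys would produce."""
    right_counts = {}
    for key in right_keys:
        right_counts[key] = right_counts.get(key, 0) + 1
    return sum(right_counts.get(key, 0) for key in left_keys)
```

For example, `inner_join_count(rand_keys(0, 100, 10), rand_keys(1, 100, 10))` joins two independently seeded sides that both draw from the same ten possible keys.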

revans2 · 2024-12-05T15:41:49Z

integration_tests/src/main/python/rand_test.py

+
+# See https://github.com/apache/spark/commit/9c0b803ba124a6e70762aec1e5559b0d66529f4d
+@ignore_order(local=True)
+@pytest.mark.skipif(is_before_spark_351(),


Should there be a follow-on issue for us to handle this properly?

Personally, I don't think we need to do anything. It is purely a Spark bug.
Spark 3.5.0 throws the exception below when evaluating rand() in Generate with codegen disabled, which fails this test.

E   Caused by: java.lang.IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.
E     at scala.Predef$.require(Predef.scala:281)
E     at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval(Expression.scala:497)
E     at org.apache.spark.sql.catalyst.expressions.Nondeterministic.eval$(Expression.scala:495)
E     at org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:35)
E     at org.apache.spark.sql.catalyst.expressions.CreateArray.$anonfun$eval$1(complexTypeCreator.scala:95)
E     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
E     ...
E     at org.apache.spark.sql.catalyst.expressions.CreateArray.eval(complexTypeCreator.scala:95)
E     at org.apache.spark.sql.catalyst.expressions.ExplodeBase.eval(generators.scala:375)
E     at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$8(GenerateExec.scala:108)

Spark versions before 3.5.0 throw the following exception, which also fails the test.

E   pyspark.errors.exceptions.captured.AnalysisException: nondeterministic expressions are only allowed in Project, Filter, Aggregate or Window, found:
E    explode(array(rand(42L))),col
E   in operator Generate explode(array(rand(42))), false, [col#2].;
E   Project [col#2]
E   +- Generate explode(array(rand(42))), false, [col#2]
E      +- LogicalRDD [a#0], false

The GPU, however, supports Generate with rand on all versions. So to make this test work, we have to skip it on Spark versions before 3.5.1.
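
The version gate described here can be sketched as a small comparison helper. This is a hypothetical illustration: `parse_version`, `is_before`, and `should_skip_generate_rand_test` are made-up names, and the real `is_before_spark_351()` used by the test may be implemented differently. The sketch assumes plain dotted version strings with an optional `-SNAPSHOT` style suffix and does not handle vendor-specific version formats.

```python
def parse_version(version):
    """Turn '3.5.1' or '3.5.1-SNAPSHOT' into a comparable tuple like (3, 5, 1)."""
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_before(version, minimum):
    """True when `version` sorts strictly before `minimum`."""
    return parse_version(version) < parse_version(minimum)


def should_skip_generate_rand_test(spark_version):
    # Per the discussion above, CPU Spark fails on rand() inside Generate
    # before 3.5.1, so there is no valid CPU result to compare against.
    return is_before(spark_version, "3.5.1")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "3.10.0" sorting before "3.5.1".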

firestarman · 2024-12-06T07:19:52Z

I would also love to see some rand tests around

join (we can restrict the range of rand to make it actually match)

hash aggregate with some distinct operators so expand is called

(Do we need/want rand in the key for an aggregate?)

window operations

Thanks for the review. Would it be OK to add them in a follow-up PR? Merging this PR will not close issue #11649.
Here is some investigation into Window and Expand: #11649 (comment)

firestarman · 2024-12-17T02:19:21Z

@revans2 Could you help review this again? I've addressed your comments. Thanks in advance.

firestarman added 2 commits November 27, 2024 17:23

Improve the retry support for nondeterministic expressions

c5a0b8f

Signed-off-by: Firestarman <[email protected]>

fix a build error for scala3

aab415f

Signed-off-by: Firestarman <[email protected]>

More retry support for GpuRand

09413df

Signed-off-by: Firestarman <[email protected]>

firestarman added 2 commits November 29, 2024 11:16

Fix a test error

2646c68

Signed-off-by: Firestarman <[email protected]>

Fix another possible test error

82589e1

Signed-off-by: Firestarman <[email protected]>

firestarman requested review from revans2 and GaryShen2008 December 4, 2024 02:13

sameerz added the bug Something isn't working label Dec 5, 2024

revans2 reviewed Dec 5, 2024
