[SPARK-50674][PYTHON] Fix check for ‘terminate’ method existence in UDTF evaluation

### What changes were proposed in this pull request?
Fix the check for the existence of the ‘terminate’ method in UDTF evaluation: in `python/pyspark/worker.py`, use `hasattr(self._udtf, "terminate")` instead of accessing the attribute directly.

### Why are the changes needed?
To ensure that UDTFs without a terminate method can still be used with partitioning without causing an AttributeError.

Previously, a UDTF used with partitioning would raise an AttributeError if it did not define a terminate method, as shown below:
```py
>>> from pyspark.sql.functions import udtf
>>> from pyspark.sql import Row
>>>
>>> @udtf(returnType="a: int")
... class TestUDTF:
...     def eval(self, row: Row):
...         if row[0] > 5:
...             yield row[0],
...
>>> spark.udtf.register("test_udtf", TestUDTF)
<pyspark.sql.udtf.UserDefinedTableFunction object at 0x10298a1d0>
>>> spark.sql("SELECT * FROM test_udtf(TABLE (SELECT id FROM range(0, 8)) PARTITION BY id)").show()
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
...
  File "...pyspark/worker.py", line 1052, in eval
    if self._udtf.terminate is not None:
AttributeError: 'TestUDTF' object has no attribute 'terminate'
```

However, the terminate method is not required in such cases.
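
For illustration, here is a minimal standalone sketch (not part of this patch; the `NoTerminate` class below is hypothetical) showing why the direct attribute access fails while a `hasattr` check does not:

```py
class NoTerminate:
    """A UDTF-like handler that defines eval but no terminate method."""

    def eval(self, row):
        if row[0] > 5:
            yield row[0],


handler = NoTerminate()

# The old check accesses the attribute directly, which raises AttributeError
# because the class never defined terminate.
try:
    if handler.terminate is not None:
        pass
except AttributeError as e:
    print(e)  # 'NoTerminate' object has no attribute 'terminate'

# The new check only asks whether the attribute exists, so terminate can
# simply be skipped when it is absent.
print(hasattr(handler, "terminate"))  # False
```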

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
A new unit test, `test_udtf_with_table_argument_and_partition_by_no_terminate`, was added, and existing tests were run.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#49299 from xinrong-meng/udtf_terminate.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
xinrong-meng authored and HyukjinKwon committed Dec 27, 2024
1 parent 003be89 commit 939129e
Showing 2 changed files with 13 additions and 2 deletions.
11 changes: 11 additions & 0 deletions python/pyspark/sql/tests/test_udtf.py
@@ -2198,6 +2198,17 @@ def terminate(self):
],
)

def test_udtf_with_table_argument_and_partition_by_no_terminate(self):
func = self.udtf_for_table_argument() # a udtf with no terminate method defined
self.spark.udtf.register("test_udtf", func)

assertDataFrameEqual(
self.spark.sql(
"SELECT * FROM test_udtf(TABLE (SELECT id FROM range(0, 8)) PARTITION BY id)"
),
[Row(a=6), Row(a=7)],
)

def test_udtf_with_table_argument_and_partition_by_and_order_by(self):
class TestUDTF:
def __init__(self):
4 changes: 2 additions & 2 deletions python/pyspark/worker.py
@@ -1049,7 +1049,7 @@ def eval(self, *args, **kwargs) -> Iterator:
list(args) + list(kwargs.values())
)
if changed_partitions:
if self._udtf.terminate is not None:
if hasattr(self._udtf, "terminate"):
result = self._udtf.terminate()
if result is not None:
for row in result:
@@ -1075,7 +1075,7 @@ def eval(self, *args, **kwargs) -> Iterator:
self._eval_raised_skip_rest_of_input_table = True

def terminate(self) -> Iterator:
if self._udtf.terminate is not None:
if hasattr(self._udtf, "terminate"):
return self._udtf.terminate()
return iter(())
