[SPARK-38821][PYTHON] Skip nsmall/nlarge nan test under pandas 1.4.[0,1,2]

### What changes were proposed in this pull request?
Skip nsmall/nlarge nan test under pandas 1.4.[0,1,2].

pandas returns wrong results when the sorting column contains ``np.nan``, a regression introduced in pandas-dev/pandas@16d2f59 (v1.4.0).

I confirmed that this issue is fixed by:
pandas-dev/pandas@2886388

### Why are the changes needed?
No

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes apache#36356 from Yikun/SPARK-38821.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Yikun authored and HyukjinKwon committed Apr 26, 2022
1 parent 028c472 commit ac5ec64
Showing 1 changed file with 12 additions and 6 deletions.
18 changes: 12 additions & 6 deletions python/pyspark/pandas/tests/test_dataframe.py
@@ -1814,8 +1814,12 @@ def test_nlargest(self):
             index=np.random.rand(7),
         )
         psdf = ps.from_pandas(pdf)
-        self.assert_eq(psdf.nlargest(5, columns="a"), pdf.nlargest(5, columns="a"))
-        self.assert_eq(psdf.nlargest(5, columns=["a", "b"]), pdf.nlargest(5, columns=["a", "b"]))
+        # see also: https://github.com/pandas-dev/pandas/issues/46589
+        if not (LooseVersion("1.4.0") <= LooseVersion(pd.__version__) <= LooseVersion("1.4.2")):
+            self.assert_eq(psdf.nlargest(5, columns="a"), pdf.nlargest(5, columns="a"))
+            self.assert_eq(
+                psdf.nlargest(5, columns=["a", "b"]), pdf.nlargest(5, columns=["a", "b"])
+            )
         self.assert_eq(psdf.nlargest(5, columns=["c"]), pdf.nlargest(5, columns=["c"]))
         self.assert_eq(
             psdf.nlargest(5, columns=["c"], keep="first"),
@@ -1838,10 +1842,12 @@ def test_nsmallest(self):
             index=np.random.rand(7),
         )
         psdf = ps.from_pandas(pdf)
-        self.assert_eq(psdf.nsmallest(n=5, columns="a"), pdf.nsmallest(5, columns="a"))
-        self.assert_eq(
-            psdf.nsmallest(n=5, columns=["a", "b"]), pdf.nsmallest(5, columns=["a", "b"])
-        )
+        # see also: https://github.com/pandas-dev/pandas/issues/46589
+        if not (LooseVersion("1.4.0") <= LooseVersion(pd.__version__) <= LooseVersion("1.4.2")):
+            self.assert_eq(psdf.nsmallest(n=5, columns="a"), pdf.nsmallest(5, columns="a"))
+            self.assert_eq(
+                psdf.nsmallest(n=5, columns=["a", "b"]), pdf.nsmallest(5, columns=["a", "b"])
+            )
         self.assert_eq(psdf.nsmallest(n=5, columns=["c"]), pdf.nsmallest(5, columns=["c"]))
         self.assert_eq(
             psdf.nsmallest(n=5, columns=["c"], keep="first"),
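
Note: the version gate used in both hunks can be sketched in isolation as below. This is an illustration only, not code from this commit. The helper names `release_tuple` and `in_affected_range` are hypothetical, and the sketch uses only the standard library instead of `LooseVersion` (which lives in `distutils`, a module deprecated by PEP 632). As an assumption of this sketch, any pre-release suffix such as `rc0` is ignored.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '1.4.2' -> (1, 4, 2).

    Hypothetical helper: anything after the numeric prefix (such as 'rc0')
    is dropped, which is a simplification made for this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_affected_range(version: str, low: str = "1.4.0", high: str = "1.4.2") -> bool:
    """True when `version` lies in the inclusive range [low, high]."""
    return release_tuple(low) <= release_tuple(version) <= release_tuple(high)
```

In a test, the NaN-sensitive assertions would then sit under `if not in_affected_range(pd.__version__):`, mirroring the condition in the diff above. Comparing integer tuples avoids the string-ordering pitfall where `"1.10.0"` sorts before `"1.4.0"`.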