Commit

Optimising Series.nunique for Nan values pandas-dev#40865 (pandas-dev…
KenilMehta authored May 3, 2021
1 parent 6b57a69 commit c61e66e
Showing 3 changed files with 13 additions and 3 deletions.
8 changes: 8 additions & 0 deletions asv_bench/benchmarks/frame_methods.py
@@ -563,6 +563,14 @@ def time_frame_nunique(self):
         self.df.nunique()


+class SeriesNuniqueWithNan:
+    def setup(self):
+        self.ser = Series(100000 * (100 * [np.nan] + list(range(100)))).astype(float)
+
+    def time_series_nunique_nan(self):
+        self.ser.nunique()
+
+
 class Duplicated:
     def setup(self):
         n = 1 << 20
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.3.0.rst
@@ -665,7 +665,7 @@ Performance improvements
 - Performance improvement in the conversion of pyarrow boolean array to a pandas nullable boolean array (:issue:`41051`)
 - Performance improvement for concatenation of data with type :class:`CategoricalDtype` (:issue:`40193`)
 - Performance improvement in :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with nullable data types (:issue:`37493`)
--
+- Performance improvement in :meth:`Series.nunique` with nan values (:issue:`40865`)

 .. ---------------------------------------------------------------------------
6 changes: 4 additions & 2 deletions pandas/core/base.py
@@ -1041,8 +1041,10 @@ def nunique(self, dropna: bool = True) -> int:
         >>> s.nunique()
         4
         """
-        obj = remove_na_arraylike(self) if dropna else self
-        return len(obj.unique())
+        uniqs = self.unique()
+        if dropna:
+            uniqs = remove_na_arraylike(uniqs)
+        return len(uniqs)

     @property
     def is_unique(self) -> bool:
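
The reordering in the `nunique` hunk above can be sketched with public pandas calls only. This is an illustration under stated assumptions, not the library's code: `Series.dropna` and `pd.notna` stand in for the internal `remove_na_arraylike` helper, and the function names `nunique_drop_first` and `nunique_unique_first` are hypothetical. The second ordering filters missing values from the already deduplicated array, which is typically much smaller than the full Series.

```python
import numpy as np
import pandas as pd


def nunique_drop_first(ser: pd.Series) -> int:
    """Old ordering (hypothetical helper): drop NaN from the full Series, then deduplicate."""
    return len(ser.dropna().unique())


def nunique_unique_first(ser: pd.Series) -> int:
    """New ordering (hypothetical helper): deduplicate first, then drop NaN from the result."""
    uniqs = ser.unique()
    # pd.notna returns a boolean mask over the unique values only.
    return len(uniqs[pd.notna(uniqs)])


ser = pd.Series([1.0, np.nan, 2.0, 2.0, np.nan, 3.0])

print(nunique_drop_first(ser))    # 3
print(nunique_unique_first(ser))  # 3
print(ser.nunique())              # 3
```

Both orderings return the same count; they differ only in how many elements the missing-value filter has to scan.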