Skip to content

Commit

Permalink
FIX OneHotEncoder.fit no longer alters the drop parameter (scikit-lea…
Browse files Browse the repository at this point in the history
  • Loading branch information
thomasjpfan authored Apr 20, 2021
1 parent 0bd7ced commit 004b44d
Show file tree
Hide file tree
Showing 4 changed files with 11 additions and 7 deletions.
3 changes: 3 additions & 0 deletions doc/whats_new/v0.24.rst
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,9 @@ Changelog
- |Fix| :meth:`preprocessing.OrdinalEncoder.transfrom` correctly handles
unknown values for string dtypes. :pr:`19888` by `Thomas Fan`_.

- |Fix| :meth:`preprocessing.OneHotEncoder.fit` no longer alters the `drop`
parameter. :pr:`19924` by `Thomas Fan`_.

:mod:`sklearn.multioutput`
..........................

Expand Down
2 changes: 1 addition & 1 deletion doc/whats_new/v1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ Changelog
- |Fix| Improved convergence detection based on center change in
:class:`cluster.MiniBatchKMeans` which was almost never achievable.
:pr:`17622` by :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |FIX| :class:`cluster.AgglomerativeClustering` now supports readonly
memory-mapped datasets. :pr:`19883` by `Julien Jerphanion <jjerphan>`.

Expand Down
11 changes: 5 additions & 6 deletions sklearn/preprocessing/_encoders.py
Original file line number Diff line number Diff line change
Expand Up @@ -363,22 +363,21 @@ def _compute_drop_idx(self):

else:
try:
self.drop = np.asarray(self.drop, dtype=object)
droplen = len(self.drop)
drop_array = np.asarray(self.drop, dtype=object)
droplen = len(drop_array)
except (ValueError, TypeError):
msg = (
"Wrong input for parameter `drop`. Expected "
"'first', 'if_binary', None or array of objects, got {}"
)
raise ValueError(msg.format(type(self.drop)))
raise ValueError(msg.format(type(drop_array)))
if droplen != len(self.categories_):
msg = ("`drop` should have length equal to the number "
"of features ({}), got {}")
raise ValueError(msg.format(len(self.categories_),
len(self.drop)))
raise ValueError(msg.format(len(self.categories_), droplen))
missing_drops = []
drop_indices = []
for col_idx, (val, cat_list) in enumerate(zip(self.drop,
for col_idx, (val, cat_list) in enumerate(zip(drop_array,
self.categories_)):
if not is_scalar_nan(val):
drop_idx = np.where(cat_list == val)[0]
Expand Down
2 changes: 2 additions & 0 deletions sklearn/preprocessing/tests/test_encoders.py
Original file line number Diff line number Diff line change
Expand Up @@ -748,6 +748,8 @@ def test_one_hot_encoder_drop_manual(missing_value):
[0, 1, 0, 1, 1],
[0, 0, 0, 0, 0]]
assert_array_equal(trans, exp)
assert enc.drop is cats_to_drop

dropped_cats = [cat[feature]
for cat, feature in zip(enc.categories_,
enc.drop_idx_)]
Expand Down

0 comments on commit 004b44d

Please sign in to comment.