Decision Tree Value Error #34

orchardbirds · 2021-07-16T13:26:28Z

From the hackathon:

import pandas as pd
from sklearn.model_selection import train_test_split
from skorecard.bucketers import DecisionTreeBucketer

train = pd.read_csv("train.csv").drop("RiskPerformance", axis=1)

target = ['target']
features = [f for f in train.columns if f not in target]

X = train[features]
y = train[target]

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=411114)

dt_bucketer = DecisionTreeBucketer(variables=features)
dt_bucketer.fit(X_train, y_train)

outputs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0bb9739818a8> in <module>
      2 
      3 dt_bucketer = DecisionTreeBucketer(variables=features)
----> 4 dt_bucketer.fit(X_train, y_train)

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/bucketers/base_bucketer.py in fit(self, X, y)
    241         self.features_bucket_mapping_ = FeaturesBucketMapping(features_bucket_mapping_)
    242 
--> 243         self._generate_summary(X, y)
    244 
    245         return self

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in _generate_summary(self, X, y)
    208         # Calculate information value
    209         if y is not None:
--> 210             iv_scores = iv(self.transform(X), y)
    211         else:
    212             iv_scores = {}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in iv(X, y, epsilon, digits)
    357         IVs (dict): Keys are feature names, values are the IV values
    358     """  # noqa
--> 359     return {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X.columns}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in <dictcomp>(.0)
    357         IVs (dict): Keys are feature names, values are the IV values
    358     """  # noqa
--> 359     return {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X.columns}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/metrics/metrics.py in _IV_score(y_test, y_pred, epsilon, digits)
     66 
     67     """
---> 68     df = woe_1d(y_pred, y_test, epsilon=epsilon)
     69 
     70     iv = ((df["non_target"] - df["target"]) * df["woe"]).sum()

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/metrics/metrics.py in woe_1d(X, y, epsilon)
     23     if not isinstance(y, pd.Series):
     24         if y.shape[0] == X.shape[0]:
---> 25             y = pd.Series(y, index=X.index)
     26         else:
     27             raise ValueError(f"y has {y.shape[0]}, but expected {X.shape[0]}")

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    353             name = ibase.maybe_extract_name(name, data, type(self))
    354 
--> 355             if is_empty_data(data) and dtype is None:
    356                 # gh-17261
    357                 warnings.warn(

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/construction.py in is_empty_data(data)
    792     is_none = data is None
    793     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 794     is_simple_empty = is_list_like_without_dtype and not data
    795     return is_none or is_simple_empty
    796 

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1533     def __nonzero__(self):
   1534         raise ValueError(
-> 1535             f"The truth value of a {type(self).__name__} is ambiguous. "
   1536             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1537         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


sbjelogr · 2021-07-16T13:43:09Z

Hah, y is passed as a DataFrame, while it should be a Series or a numpy array.
These lines

if not isinstance(y, pd.Series):
    if y.shape[0] == X.shape[0]:
        y = pd.Series(y, index=X.index)

actually expect y to be a numpy array.

We should add an input validation step when calling fit
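A minimal sketch of what such a validation step could look like. The helper name `ensure_1d_target` and its signature are hypothetical, made up for illustration; this is not necessarily how skorecard ended up implementing the check. It accepts a single-column DataFrame, a Series, or an array, and returns a 1-D numpy array whose length matches the number of rows in X.

```python
import numpy as np
import pandas as pd


def ensure_1d_target(y, n_rows):
    """Hypothetical helper: coerce y to a 1-D numpy array of length n_rows."""
    if isinstance(y, pd.DataFrame):
        # A DataFrame target is only acceptable if it has exactly one column.
        if y.shape[1] != 1:
            raise ValueError(f"y must have exactly one column, got {y.shape[1]}")
        y = y.iloc[:, 0]
    y = np.asarray(y)
    # Flatten column vectors of shape (n, 1) to shape (n,).
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()
    if y.ndim != 1:
        raise ValueError(f"y must be 1-dimensional, got shape {y.shape}")
    if y.shape[0] != n_rows:
        raise ValueError(f"y has {y.shape[0]} rows, but expected {n_rows}")
    return y


# Toy data standing in for X_train / y_train from the report above.
X = pd.DataFrame({"a": [1, 2, 3]})
y_frame = pd.DataFrame({"target": [0, 1, 0]})
y_clean = ensure_1d_target(y_frame, X.shape[0])
```

Calling something like this at the top of fit would turn the confusing pandas error into an early, readable message, or avoid it entirely by coercing the input.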

orchardbirds · 2021-07-16T14:01:37Z

Yep, you're right.

y.values.reshape(-1, )

worked. Should definitely have a check for this
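For anyone landing here with the same error, a small sketch of why the workaround helps, using a toy frame in place of the real train.csv (the column names here are assumptions). Selecting with a list, as in `train[target]` with `target = ['target']`, returns a single-column DataFrame; selecting with a plain string returns a Series.

```python
import pandas as pd

# Toy stand-in for the hackathon data.
train = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "target": [0, 1, 0]})

y_frame = train[["target"]]   # list selector: DataFrame with shape (3, 1)
y_series = train["target"]    # string selector: Series with shape (3,)

# Workaround from this thread: flatten the DataFrame to a 1-D numpy array.
y_array = y_frame.values.reshape(-1, )
```

Either `y_series` or `y_array` has the 1-D shape that the fit call expects, so using `y = train['target']` in the original snippet should be an equivalent fix.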

orchardbirds · 2021-07-27T11:32:23Z

closed with #37

orchardbirds added the bug label Jul 16, 2021

orchardbirds mentioned this issue Jul 16, 2021: OptBucketer Index error #33 (Closed)

orchardbirds mentioned this issue Jul 26, 2021: add check for y type and shape #37 (Merged)

orchardbirds closed this as completed Jul 27, 2021
