Decision Tree Value Error #34

Closed
orchardbirds opened this issue Jul 16, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@orchardbirds
Contributor

From the hackathon:

import pandas as pd
from sklearn.model_selection import train_test_split

from skorecard.bucketers import DecisionTreeBucketer

train = pd.read_csv("train.csv").drop("RiskPerformance", axis=1)

target = ['target']
features = [f for f in train.columns if f not in target]

X = train[features]
y = train[target]  # target is a list, so y is a one-column DataFrame, not a Series

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=411114)

dt_bucketer = DecisionTreeBucketer(variables=features)
dt_bucketer.fit(X_train, y_train)

outputs:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-0bb9739818a8> in <module>
      2 
      3 dt_bucketer = DecisionTreeBucketer(variables=features)
----> 4 dt_bucketer.fit(X_train, y_train)

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/bucketers/base_bucketer.py in fit(self, X, y)
    241         self.features_bucket_mapping_ = FeaturesBucketMapping(features_bucket_mapping_)
    242 
--> 243         self._generate_summary(X, y)
    244 
    245         return self

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in _generate_summary(self, X, y)
    208         # Calculate information value
    209         if y is not None:
--> 210             iv_scores = iv(self.transform(X), y)
    211         else:
    212             iv_scores = {}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in iv(X, y, epsilon, digits)
    357         IVs (dict): Keys are feature names, values are the IV values
    358     """  # noqa
--> 359     return {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X.columns}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/reporting/report.py in <dictcomp>(.0)
    357         IVs (dict): Keys are feature names, values are the IV values
    358     """  # noqa
--> 359     return {col: _IV_score(y, X[col], epsilon=epsilon, digits=digits) for col in X.columns}

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/metrics/metrics.py in _IV_score(y_test, y_pred, epsilon, digits)
     66 
     67     """
---> 68     df = woe_1d(y_pred, y_test, epsilon=epsilon)
     69 
     70     iv = ((df["non_target"] - df["target"]) * df["woe"]).sum()

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/skorecard/metrics/metrics.py in woe_1d(X, y, epsilon)
     23     if not isinstance(y, pd.Series):
     24         if y.shape[0] == X.shape[0]:
---> 25             y = pd.Series(y, index=X.index)
     26         else:
     27             raise ValueError(f"y has {y.shape[0]}, but expected {X.shape[0]}")

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    353             name = ibase.maybe_extract_name(name, data, type(self))
    354 
--> 355             if is_empty_data(data) and dtype is None:
    356                 # gh-17261
    357                 warnings.warn(

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/construction.py in is_empty_data(data)
    792     is_none = data is None
    793     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 794     is_simple_empty = is_list_like_without_dtype and not data
    795     return is_none or is_simple_empty
    796 

~/miniconda3/envs/skorecard_py37/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1533     def __nonzero__(self):
   1534         raise ValueError(
-> 1535             f"The truth value of a {type(self).__name__} is ambiguous. "
   1536             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1537         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
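
For what it's worth, the last frame of the traceback can be reproduced without skorecard. The snippet below is a minimal sketch that assumes only pandas; the column name and values are made up, and `truthiness_error` is a hypothetical helper for illustration. pandas refuses to collapse a DataFrame into a single boolean, which is what the `not data` check in the traceback ends up asking for:

```python
import pandas as pd

# Made-up one-column target, built as a DataFrame (like y = train[['target']])
y_frame = pd.DataFrame({"target": [0, 1, 0, 1]})


def truthiness_error(obj):
    """Return the message raised by bool(obj), or None if nothing is raised."""
    try:
        bool(obj)
    except ValueError as exc:
        return str(exc)
    return None


message = truthiness_error(y_frame)
print(message)
```

So the error comes from how the DataFrame is evaluated rather than from the data itself.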


orchardbirds added the bug label on Jul 16, 2021
@sbjelogr
Contributor

Hah, y is passed as a DataFrame, while it should be a Series or a numpy array.
These lines

if not isinstance(y, pd.Series):
    if y.shape[0] == X.shape[0]:
        y = pd.Series(y, index=X.index)

actually expect y to be a numpy array.

We should add an input validation step when calling fit.
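
A rough sketch of what such a validation step could look like. `check_y` is a hypothetical helper, not an existing skorecard function, and this is not necessarily how the eventual fix is implemented. It assumes that a one-column DataFrame or an (n, 1) array should be accepted and flattened, and that anything else should fail with a clear message:

```python
import numpy as np
import pandas as pd


def check_y(y, n_rows):
    """Hypothetical helper: coerce y to a 1-D numpy array or raise a clear error."""
    if isinstance(y, pd.DataFrame):
        if y.shape[1] != 1:
            raise ValueError(f"y must have exactly one column, got {y.shape[1]}")
        y = y.iloc[:, 0]  # one-column DataFrame -> Series
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()  # column vector of shape (n, 1) -> shape (n,)
    if y.ndim != 1:
        raise ValueError(f"y must be 1-dimensional, got shape {y.shape}")
    if y.shape[0] != n_rows:
        raise ValueError(f"y has {y.shape[0]} rows, but X has {n_rows}")
    return y


# Made-up data for illustration
X = pd.DataFrame({"a": [1, 2, 3]})
y = pd.DataFrame({"target": [0, 1, 0]})
y_clean = check_y(y, X.shape[0])
```

scikit-learn ships a similar utility, `column_or_1d`, which may be worth a look before writing a custom check.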

@orchardbirds
Contributor Author

Yep, you're right.

y.values.reshape(-1, )

worked. We should definitely have a check for this.
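
To spell out why the workaround helps, here is a small sketch on made-up data (the frame below is a stand-in, not the real train.csv). Selecting with a list keeps the result two-dimensional, selecting with a string gives a Series, and the reshape flattens the values into a 1-D array:

```python
import pandas as pd

# Made-up stand-in for train.csv
train = pd.DataFrame({"feature_a": [1.0, 2.0, 3.0, 4.0], "target": [0, 1, 0, 1]})

y_frame = train[["target"]]           # list selector -> DataFrame, shape (4, 1)
y_series = train["target"]            # string selector -> Series, shape (4,)
y_array = y_frame.values.reshape(-1)  # the workaround -> numpy array, shape (4,)

print(type(y_frame).__name__, y_frame.shape)
print(type(y_series).__name__, y_series.shape)
print(type(y_array).__name__, y_array.shape)
```

Using `target = 'target'` instead of `target = ['target']` in the original snippet would presumably avoid the problem too, although the `features` list comprehension would then need to compare against the string.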

@orchardbirds
Contributor Author

Closed with #37.
