
Support sample weights when bucketing #29

Closed
operte opened this issue Jun 28, 2021 · 5 comments
Labels
enhancement New feature or request

operte commented Jun 28, 2021

Dear skorecard team, are there bucketers that support weights?

E.g. we undersampled our negative target for modelling, and now we need to apply a weight to this class to get proper default rates in each bucket.

I looked at the DecisionTreeBucketer but it doesn’t seem to support that.
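For concreteness, the per-bucket correction being asked for is just a weighted default rate, sum(weight × target) / sum(weight) per bucket, which can be computed outside skorecard. The column names and the 10× weight below are illustrative assumptions, not anything from skorecard:

```python
# Sketch (assumed setup): recover population-level default rates per bucket
# after negatives were undersampled, by upweighting each surviving negative.
import pandas as pd

df = pd.DataFrame({
    "bucket": [0, 0, 1, 1],
    "target": [1, 0, 1, 0],  # 1 = default
})
# Suppose negatives were kept at a 1-in-10 rate, so weight them 10x:
df["w"] = df["target"].map({1: 1.0, 0: 10.0})

# Weighted default rate per bucket: sum(w * y) / sum(w)
weighted_defaults = (df["w"] * df["target"]).groupby(df["bucket"]).sum()
rates = weighted_defaults / df.groupby("bucket")["w"].sum()
```

This corrects the observed rates, but it does not change the bucket boundaries themselves, which is the part that would need support inside the bucketers.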

@operte operte added the enhancement New feature or request label Jun 28, 2021
sbjelogr (Contributor) commented:

@operte, no, currently no bucketing method supports weights.

How exactly would you envision the weights to work in the skorecard context?

I can only imagine it working when fitting the supervised bucketers (in the current version, only the decision tree bucketer), where the weights would change how the tree learning algorithm fits and therefore shift the bucket boundaries.

Or do you mean something different?
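As a rough illustration of that idea (this is plain sklearn, not skorecard's API): `DecisionTreeClassifier.fit` accepts `sample_weight`, and reweighting changes where the split thresholds, i.e. the candidate bucket boundaries, land. The data and helper below are made up for the sketch:

```python
# Sketch: how sample weights could shift tree-derived bucket boundaries.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

def tree_boundaries(X, y, sample_weight=None):
    """Fit a shallow tree and return its split thresholds (bucket edges)."""
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X, y, sample_weight=sample_weight)
    t = tree.tree_.threshold
    return sorted(t[t != -2])  # -2 marks leaf nodes, which have no threshold

unweighted = tree_boundaries(X, y)
# Upweight the positive class, e.g. to undo 10x undersampling:
weights = np.where(y == 1, 10.0, 1.0)
weighted = tree_boundaries(X, y, sample_weight=weights)
# The two sets of boundaries will generally differ.
```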

operte (Author) commented Jun 30, 2021

That is what I meant, yes. Another example: with the EqualFrequencyBucketer, you might also have weights on the classes, which changes the counts in each bucket, so the bucketer would need to adjust for that.

Tbh I'm not sure how to fix this in an elegant way, or if skorecard should have a way of doing this.

In my particular problem, what I ended up doing was oversampling my datasets before feeding them into skorecard.
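For reference, that oversampling workaround can be sketched as follows. The data, column names, and helper are illustrative assumptions, and this simple row-replication only handles integer weights:

```python
# Sketch of the oversampling workaround: replicate rows of the undersampled
# class so that unweighted bucket counts approximate the true population.
import pandas as pd

df = pd.DataFrame({"amount": [10, 20, 30, 40], "target": [0, 0, 1, 1]})

def oversample(df, target_col, cls, factor):
    """Repeat rows of class `cls` so they appear `factor` times in total."""
    minority = df[df[target_col] == cls]
    extra = pd.concat([minority] * (factor - 1), ignore_index=True)
    return pd.concat([df, extra], ignore_index=True)

# Class 1 was undersampled 5x, so replicate it 5x before bucketing:
balanced = oversample(df, "target", cls=1, factor=5)
# `balanced` can then be fed to any bucketer unchanged.
```

As noted later in the thread, the drawback is that the replicated dataset quickly becomes large and slow to process.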

I think @timvink and @orchardbirds discussed a way of implementing this.

timvink (Contributor) commented Aug 11, 2021

Revisited this. We would need to support weights consistently across all bucketers, which would be a lot of work. Instead, we could probably create one or two new bucketers that do what you want. The easiest solution is indeed to up/downsample, but the data quickly becomes quite large and slow to process. I also checked: even sklearn.preprocessing.KBinsDiscretizer doesn't support sample weights, and neither does feature-engine's CountFrequencyEncoder.

If you already know the weights on the classes however, wouldn't using UserInputBucketer be a better choice, where you manually assign the bucket values per class?

Do you already have a design in mind? An EqualFrequencyBucketer with sample weights? I can help with the implementation, but I'm of course also open to a PR.
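An EqualFrequencyBucketer with sample weights would essentially need weighted quantiles: bin edges chosen so each bucket holds roughly equal total weight rather than an equal row count. A minimal sketch of that edge computation (a hypothetical helper, not part of skorecard):

```python
# Sketch: equal-frequency bin edges by total sample weight, via the
# weighted empirical CDF.
import numpy as np

def weighted_equal_frequency_edges(x, w, n_bins):
    """Interior bin edges so each bucket holds ~equal total weight."""
    order = np.argsort(x)
    x_sorted = np.asarray(x, dtype=float)[order]
    w_sorted = np.asarray(w, dtype=float)[order]
    cum = np.cumsum(w_sorted) / np.sum(w_sorted)   # weighted CDF in (0, 1]
    targets = np.linspace(0, 1, n_bins + 1)[1:-1]  # interior quantile levels
    idx = np.searchsorted(cum, targets)
    return x_sorted[idx]

x = np.arange(10, dtype=float)   # values 0..9
w = np.ones(10)
w[:5] = 9.0                      # pile most of the mass on x < 5
edges = weighted_equal_frequency_edges(x, w, n_bins=2)
# With uniform weights the single interior edge would sit near the median;
# here the extra mass on the lower half pulls it below 5.
```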

timvink (Contributor) commented Aug 24, 2021

@operte thoughts?

operte (Author) commented Aug 25, 2021

No, I don't have other ideas for this. As I mentioned, in our case we went with the oversampling approach and that worked out fine.

Since this option isn't available in other popular bucketers either, it may not be a common problem, so one option is to drop it. We could spend time adding weight support to one particular skorecard bucketer, but the user might then want a different bucketer. Perhaps it's best to leave this to the user and simply recommend the oversampling approach in the documentation.

I did not understand the suggestion of using UserInputBucketer. How do you get to the buckets from the class weights?

@timvink timvink closed this as completed Aug 30, 2021