
Support sample weights when bucketing #29

Closed
operte opened this issue Jun 28, 2021 · 5 comments
Labels
enhancement New feature or request

operte commented Jun 28, 2021

Dear skorecard team, are there bucketers that support weights?

E.g. we undersampled our negative target for modelling, and now we need to apply a weight to this class to get proper default rates in each bucket.

I looked at the DecisionTreeBucketer but it doesn’t seem to support that.
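For concreteness, the per-bucket correction being asked for is just a weighted default rate, sum(weight × target) / sum(weight) per bucket, which can be computed outside skorecard. The column names and the 10× weight below are illustrative assumptions, not anything from skorecard:

```python
# Sketch (assumed setup): recover population-level default rates per bucket
# after negatives were undersampled, by upweighting each surviving negative.
import pandas as pd

df = pd.DataFrame({
    "bucket": [0, 0, 1, 1],
    "target": [1, 0, 1, 0],  # 1 = default
})
# Suppose negatives were kept at a 1-in-10 rate, so weight them 10x:
df["w"] = df["target"].map({1: 1.0, 0: 10.0})

# Weighted default rate per bucket: sum(w * y) / sum(w)
weighted_defaults = (df["w"] * df["target"]).groupby(df["bucket"]).sum()
rates = weighted_defaults / df.groupby("bucket")["w"].sum()
```

This corrects the observed rates, but it does not change the bucket boundaries themselves, which is the part that would need support inside the bucketers.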

@operte operte added the enhancement New feature or request label Jun 28, 2021
sbjelogr (Contributor) commented:

@operte, no, currently no bucketing method supports weights.

How exactly would you envision the weights to work in the skorecard context?

I can only imagine it working when fitting the supervised bucketers (in the current version, only the decision tree bucketer), where the weights would change how the tree learning algorithm fits and therefore shift the bucket boundaries.

Or do you mean something different?
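As a rough illustration of that idea (this is plain sklearn, not skorecard's API): `DecisionTreeClassifier.fit` accepts `sample_weight`, and reweighting changes where the split thresholds, i.e. the candidate bucket boundaries, land. The data and helper below are made up for the sketch:

```python
# Sketch: how sample weights could shift tree-derived bucket boundaries.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

def tree_boundaries(X, y, sample_weight=None):
    """Fit a shallow tree and return its split thresholds (bucket edges)."""
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X, y, sample_weight=sample_weight)
    t = tree.tree_.threshold
    return sorted(t[t != -2])  # -2 marks leaf nodes, which have no threshold

unweighted = tree_boundaries(X, y)
# Upweight the positive class, e.g. to undo 10x undersampling:
weights = np.where(y == 1, 10.0, 1.0)
weighted = tree_boundaries(X, y, sample_weight=weights)
# The two sets of boundaries will generally differ.
```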

operte (Author) commented Jun 30, 2021

That is what I meant, yes. Another example: with the EqualFrequencyBucketer, you might also have weights on the classes, which changes the counts in each bucket, so the bucketer would need to adjust for that.

Tbh I'm not sure how to fix this in an elegant way, or if skorecard should have a way of doing this.

In my particular problem, what I ended up doing was oversampling my datasets before feeding them into skorecard.
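For reference, that oversampling workaround can be sketched as follows. The data, column names, and helper are illustrative assumptions, and this simple row-replication only handles integer weights:

```python
# Sketch of the oversampling workaround: replicate rows of the undersampled
# class so that unweighted bucket counts approximate the true population.
import pandas as pd

df = pd.DataFrame({"amount": [10, 20, 30, 40], "target": [0, 0, 1, 1]})

def oversample(df, target_col, cls, factor):
    """Repeat rows of class `cls` so they appear `factor` times in total."""
    minority = df[df[target_col] == cls]
    extra = pd.concat([minority] * (factor - 1), ignore_index=True)
    return pd.concat([df, extra], ignore_index=True)

# Class 1 was undersampled 5x, so replicate it 5x before bucketing:
balanced = oversample(df, "target", cls=1, factor=5)
# `balanced` can then be fed to any bucketer unchanged.
```

As noted later in the thread, the drawback is that the replicated dataset quickly becomes large and slow to process.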

I think @timvink and @orchardbirds discussed a way of implementing this.

timvink (Contributor) commented Aug 11, 2021

Revisited this. We would need to support weights consistently across all bucketers, which would be a lot of work. Instead, we could probably create one or two new bucketers that do what you want. The easiest solution is indeed to up/downsample, but the data quickly becomes quite large and slow to process. I also checked: even sklearn.preprocessing.KBinsDiscretizer doesn't support sample weights, and neither does feature-engine's CountFrequencyEncoder.

If you already know the weights on the classes however, wouldn't using UserInputBucketer be a better choice, where you manually assign the bucket values per class?

Do you already have a design in mind? An EqualFrequencyBucketer with sample weights? I can help with the implementation, but I'm of course also open to a PR.
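An EqualFrequencyBucketer with sample weights would essentially need weighted quantiles: bin edges chosen so each bucket holds roughly equal total weight rather than an equal row count. A minimal sketch of that edge computation (a hypothetical helper, not part of skorecard):

```python
# Sketch: equal-frequency bin edges by total sample weight, via the
# weighted empirical CDF.
import numpy as np

def weighted_equal_frequency_edges(x, w, n_bins):
    """Interior bin edges so each bucket holds ~equal total weight."""
    order = np.argsort(x)
    x_sorted = np.asarray(x, dtype=float)[order]
    w_sorted = np.asarray(w, dtype=float)[order]
    cum = np.cumsum(w_sorted) / np.sum(w_sorted)   # weighted CDF in (0, 1]
    targets = np.linspace(0, 1, n_bins + 1)[1:-1]  # interior quantile levels
    idx = np.searchsorted(cum, targets)
    return x_sorted[idx]

x = np.arange(10, dtype=float)   # values 0..9
w = np.ones(10)
w[:5] = 9.0                      # pile most of the mass on x < 5
edges = weighted_equal_frequency_edges(x, w, n_bins=2)
# With uniform weights the single interior edge would sit near the median;
# here the extra mass on the lower half pulls it below 5.
```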

timvink (Contributor) commented Aug 24, 2021

@operte thoughts?

operte (Author) commented Aug 25, 2021

No, I don't have other ideas for this. As I mentioned, in our case we went with the oversampling approach and that worked out fine.

Since this option isn't available in other popular bucketers either, it may not be a common problem, so one option is to drop it. We could spend time adding weight support to one particular skorecard bucketer, but the user might then want a different bucketer. Perhaps it's best to leave this to the user and simply recommend the oversampling approach in the documentation.

I did not understand the suggestion of using UserInputBucketer. How do you get to the buckets from the class weights?

@timvink timvink closed this as completed Aug 30, 2021