
[Discussion] Lambdarank and query deduplication/weighting #3422

jacquespeeters opened this issue Sep 30, 2020 · 3 comments

jacquespeeters commented Sep 30, 2020

Might be off-topic; let me know if this is not the right place.

Problem description:
I want to create a non-personalized ranking for categories on an e-commerce website.
When using ranking, one query usually represents one user.

E.g., if we have 100 products in the same category:

  • User_i makes the query “drill”, then clicks on product_i; the 99 other products are negative candidates
  • User_j makes the query “drill”, then clicks on product_j; the 99 other products are negative candidates
  • User_k makes the query “drill”, then clicks on product_i; the 99 other products are negative candidates

However, if the final system is not personalized (same results for the same query), we would like to represent the data more compactly, e.g.:
Query “drill” had

  • 2 clicks on product_i,
  • 1 click on product_j,
  • 0 clicks on the other 98 products (negative candidates)

By doing so we can make the dataset 3 times smaller while keeping the same amount of information (a sketch follows). In this example the gain ratio is 3, but for e-commerce categories it is much larger.
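
A minimal sketch of this deduplication, assuming a hypothetical pandas DataFrame df of raw interaction logs with columns query, product_id, and clicked (0/1):

import pandas as pd

# Hypothetical raw log: one row per (user, query, candidate product) impression.
df = pd.DataFrame({
    "query": ["drill"] * 6,
    "product_id": ["p1", "p2", "p3", "p1", "p2", "p3"],
    "clicked": [1, 0, 0, 1, 1, 0],
})

# Collapse identical (query, product) rows into one row whose label is the
# total click count for that pair.
dedup = (
    df.groupby(["query", "product_id"], as_index=False)
      .agg(click_cnt=("clicked", "sum"))
)

# LightGBM's ranking API takes group sizes: the number of candidates per query.
group_sizes = dedup.groupby("query").size().to_numpy()

The click_cnt column would then serve directly as the lambdarank label, which is where the label_gain question below comes from.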

I think my problem is similar to this issue, but I don’t understand what the correct way to do this is.

I tried to play with:

  • the label_gain parameter (assigning a linear gain 1, 2, 3, 4, …; see the sketch after this list),
  • the weight parameter in multiple ways,
    then tried to get the same MAP@5 metric inside LightGBM (at training time) as outside (using LightGBM predictions, where MAP@5 can be computed as: sum of clicks of observations ranked <= 5 / total sum of clicks).
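
For reference, LightGBM's default label_gain is 2^i − 1, which grows explosively for large click counts; passing a linear gain makes each product's DCG contribution proportional to its click count. A minimal sketch (the maximum label value here is an assumption):

max_label = 10  # hypothetical maximum click count in the training data

# LightGBM's default gain for lambdarank/ndcg: 0, 1, 3, 7, 15, ...
default_gain = [2**i - 1 for i in range(max_label + 1)]

# Linear gain: a product with k clicks contributes k to the DCG numerator,
# matching the click-count interpretation of the labels.
linear_gain = list(range(max_label + 1))

params = {"objective": "lambdarank", "label_gain": linear_gain}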

Any guidance on how to use LightGBM on such a problem is welcome. If you think such a question is better suited to r/MachineLearning or Stack Overflow, let me know.

lgrz (Contributor) commented Sep 30, 2020

I'm not sure if this is off-topic or not, but happy to be told otherwise.

You may want to look into click models for search and consider how to turn them into features for your ranking model. This is also closely related to a sub-field called online learning to rank (online LTR).

This project has many resources on click models:
https://github.com/markovi/PyClick
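
As one concrete illustration of the click-model idea (not taken from PyClick itself): a position-debiased click-through rate such as clicks-over-expected-clicks (COEC) divides a product's clicks by the clicks expected from its display positions alone, so products mostly shown at low ranks are not unfairly penalized. A minimal sketch with hypothetical column names:

import pandas as pd

# Hypothetical impression log: one row per time a product was displayed.
log = pd.DataFrame({
    "product_id": ["p1", "p1", "p2", "p2", "p2", "p3"],
    "rank_shown": [1, 2, 3, 1, 2, 5],
    "clicked": [1, 0, 0, 1, 1, 0],
})

# Global CTR at each display position approximates examination probability.
pos_ctr = log.groupby("rank_shown")["clicked"].mean()
log["expected"] = log["rank_shown"].map(pos_ctr)

# COEC: actual clicks over position-expected clicks, per product.
agg = log.groupby("product_id").agg(clicks=("clicked", "sum"),
                                    expected=("expected", "sum"))
coec = agg["clicks"] / agg["expected"]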

jacquespeeters (Author) commented Oct 5, 2020

Hello,

Thank you @lgrz, but I'm feeling quite confident about my ranking feature engineering :D. My problem here is to verify that I'm optimizing the right thing.

After looking at the nDCG loss, setting a linear gain via label_gain and no weight parameter should work (I should have checked before writing my first post). I created a custom_map5 function to assess this.

Here is the code; the output at least seems to move in the right direction.

import lightgbm as lgb
import numpy as np


def custom_map5(preds, lgb_dataset):
    """Share of all clicks captured by the top-5 ranked items of each query."""
    labels = lgb_dataset.get_label()
    groups_size = lgb_dataset.get_group()
    begin = 0
    ranks = []
    for group_size in groups_size:
        preds_group = preds[begin : (begin + group_size)]  # noqa
        # Rank of each item within its query (0 = highest prediction).
        temp = (-preds_group).argsort()
        ranks_group = np.empty_like(temp)
        ranks_group[temp] = np.arange(len(preds_group))
        ranks.append(ranks_group)
        begin += group_size

    ranks = np.concatenate(ranks)
    is_top5 = ranks < 5
    # Fraction of total clicks falling on items ranked in the top 5.
    map5 = labels[is_top5].sum() / labels.sum()
    return "custom_map@5", map5, True


lgb_train = lgb.Dataset(X_train, label=y_train, group=group_train)
lgb_valid = lgb.Dataset(X_valid, label=y_valid, group=group_valid)

param_ranking = {
    "objective": "lambdarank",
    # Linear gain: a label of k contributes a gain of k to the DCG numerator.
    "label_gain": [i for i in range(max(y_train.max(), y_valid.max()) + 1)],
    "metric": ["ndcg", "map"],
    "eval_at": 5,
    "random_state": 1,
    "verbosity": -1,
    # 'num_threads': 16,
    "learning_rate": 0.2,
}

model_gbm = lgb.train(
    param_ranking,
    lgb_train,
    2000,
    valid_sets=[lgb_train, lgb_valid],
    early_stopping_rounds=20,
    verbose_eval=10,
    feval=custom_map5,
)

# Recompute the same metric outside LightGBM from raw predictions.
df_valid["pred"] = model_gbm.predict(X_valid)
df_valid["pred_rank"] = df_valid.groupby(cols_group)["pred"].rank(ascending=False, method="first")
map5_valid = df_valid[df_valid["pred_rank"] <= 5]["click_cnt"].sum() / df_valid["click_cnt"].sum()

np.testing.assert_almost_equal(model_gbm.best_score["valid_1"]["custom_map@5"], map5_valid, decimal=3)

Output:

Training until validation scores don't improve for 20 rounds
[10]	training's ndcg@5: 0.815343	training's map@5: 0.783007	training's custom_map@5: 0.520511	valid_1's ndcg@5: 0.813017	valid_1's map@5: 0.779521	valid_1's custom_map@5: 0.523949
[20]	training's ndcg@5: 0.819935	training's map@5: 0.788001	training's custom_map@5: 0.522785	valid_1's ndcg@5: 0.815812	valid_1's map@5: 0.782607	valid_1's custom_map@5: 0.526175
[30]	training's ndcg@5: 0.822388	training's map@5: 0.790513	training's custom_map@5: 0.523917	valid_1's ndcg@5: 0.817068	valid_1's map@5: 0.783851	valid_1's custom_map@5: 0.527015
[40]	training's ndcg@5: 0.824651	training's map@5: 0.792752	training's custom_map@5: 0.524872	valid_1's ndcg@5: 0.817749	valid_1's map@5: 0.784574	valid_1's custom_map@5: 0.527479
[50]	training's ndcg@5: 0.826833	training's map@5: 0.795156	training's custom_map@5: 0.525806	valid_1's ndcg@5: 0.818177	valid_1's map@5: 0.785092	valid_1's custom_map@5: 0.527947
[60]	training's ndcg@5: 0.828554	training's map@5: 0.797114	training's custom_map@5: 0.526303	valid_1's ndcg@5: 0.818361	valid_1's map@5: 0.785287	valid_1's custom_map@5: 0.528142
[70]	training's ndcg@5: 0.830337	training's map@5: 0.799026	training's custom_map@5: 0.526703	valid_1's ndcg@5: 0.818588	valid_1's map@5: 0.785558	valid_1's custom_map@5: 0.52843
[80]	training's ndcg@5: 0.831949	training's map@5: 0.800502	training's custom_map@5: 0.527388	valid_1's ndcg@5: 0.818452	valid_1's map@5: 0.785409	valid_1's custom_map@5: 0.528522
Early stopping, best iteration is:
[69]	training's ndcg@5: 0.830104	training's map@5: 0.798771	training's custom_map@5: 0.526663	valid_1's ndcg@5: 0.8187	valid_1's map@5: 0.785716	valid_1's custom_map@5: 0.528426

They seem correlated (at least I know I'm optimizing in the right direction), but I still don't understand why I'm not able to reproduce the same results between map@5 and custom_map@5. Any ideas welcome!

lgrz (Contributor) commented Oct 8, 2020

One thing to note is that AP (average precision) is usually calculated with binary relevance judgments. Perhaps this is related to your issue?
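
To make that concrete: conventionally, AP@k binarizes the labels (clicked vs. not) and averages precision at each relevant hit, which is a different quantity from the share-of-clicks measure computed by custom_map5 above; exact denominator conventions also vary between implementations. A minimal illustrative sketch, not LightGBM's internal code:

import numpy as np

def average_precision_at_k(labels, preds, k=5):
    # Binarize relevance: any clicked product counts as relevant.
    order = np.argsort(-np.asarray(preds))
    rel = (np.asarray(labels)[order] > 0).astype(float)[:k]
    if rel.sum() == 0:
        return 0.0
    # Precision at each of the top-k positions.
    precisions = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    # Average precision over the positions of the relevant items.
    return float((precisions * rel).sum() / rel.sum())

Averaging this over queries gives MAP@5; with click counts as labels, that is not the same number as the click-weighted custom_map@5, which would explain why the two metrics correlate without matching.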
