[Discussion] Lambdarank and query deduplication/weighting #3422
I'm not sure if this is off topic or not, but happy to be told otherwise. You may want to look into click models for search. This project has many resources on click models:
Hello, Thank you @lgrz, but I'm feeling quite confident about my ranking feature engineering :D. My problem here is making sure I'm optimizing the right thing. After looking at the nDCG loss, setting a linear gain via `label_gain` and using no `weight` parameter, it should work (I should have checked before writing my first post). I created a custom_map5 function to assess this. Here is the code, and the output at least seems to move in the same direction.
Outputs:
They seem correlated (at least I know I'm optimizing in the right direction), but I still don't understand why I can't reproduce the same results between map@5 and custom_map@5. Any ideas welcome!
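For reference, a minimal sketch of the parameter setup described above: a linear `label_gain` replacing LightGBM's default gains of 2^i - 1, with MAP@5 as the training-time metric. The `max_label` value is a hypothetical example, not from the original post.

```python
# Hypothetical example: click labels range from 0 to max_label.
max_label = 10

params = {
    "objective": "lambdarank",
    "metric": "map",
    "eval_at": [5],  # report MAP@5 during training
    # Linear gain: gain(i) = i, instead of LightGBM's default 2**i - 1.
    "label_gain": list(range(max_label + 1)),
}
```

With the default `label_gain`, a label of 10 would receive a gain of 1023, so switching to a linear gain changes the loss substantially when labels are click counts rather than small graded judgments.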
One thing to note is that AP is usually calculated with binary relevance judgments. Perhaps this is related to your issue?
This might be off-topic; let me know if this is not the right place.
Problem description:
I want to create a non-personalized ranking for categories on an e-commerce website.
When using ranking, one query usually represents one user.
E.g. if we have 100 products in the same category:
However, if the final system is not personalized (same results for the same query), we would like to represent the data more compactly, e.g.:
Query “drill” had
By doing so we can make the dataset three times smaller while keeping the same amount of information. In this example the gain ratio is 3, but for e-commerce categories it is much bigger.
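The deduplication described above can be sketched with pandas: identical (query, product) impressions are collapsed into one row carrying a repetition count and a click total. The column names and data are illustrative, and whether feeding the count in as a per-row weight exactly reproduces training on the duplicated queries is precisely the open question of this issue.

```python
import pandas as pd

# Illustrative raw impression log: one row per (user, query, product) view.
rows = pd.DataFrame({
    "query":   ["drill"] * 6,
    "product": ["A", "B", "A", "B", "A", "C"],
    "clicked": [1, 0, 1, 0, 0, 1],
})

# Collapse identical (query, product) pairs:
#   weight = how many times the pair was shown,
#   label  = total clicks (a graded relevance signal).
agg = (
    rows.groupby(["query", "product"], as_index=False)
        .agg(weight=("clicked", "size"), label=("clicked", "sum"))
)
```

The `weight` column could then be passed to `lgb.Dataset(..., weight=...)`, but as noted, it is unclear whether lambdarank treats that identically to repeating the rows.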
I think my problem is similar to this issue, but I don't understand what the correct way to do it is.
I tried to play with:
Then I tried to get the same MAP@5 metric results inside LGB (at training time) as outside (using LGB predictions, where MAP@5 can be obtained as: sum of clicks of observations ranked <= 5 / sum of clicks).
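The outside metric described above (fraction of each query's clicks captured in the top 5 predicted positions, averaged over queries) can be sketched as follows. The function and variable names are made up for illustration; `group_sizes` is the same per-query document-count array LightGBM uses for ranking.

```python
import numpy as np

def clicks_captured_at_k(preds, clicks, group_sizes, k=5):
    """Per query: sum of clicks of documents ranked in the top k by
    predicted score, divided by the query's total clicks; averaged
    over queries that have at least one click."""
    scores = []
    start = 0
    for size in group_sizes:
        p = np.asarray(preds[start:start + size], dtype=float)
        c = np.asarray(clicks[start:start + size], dtype=float)
        start += size
        if c.sum() == 0:
            continue  # queries with no clicks contribute nothing
        top = np.argsort(-p)[:k]          # indices of the k highest scores
        scores.append(c[top].sum() / c.sum())
    return float(np.mean(scores)) if scores else 0.0
```

Note this is not the standard MAP@5: AP is position-sensitive and usually binary, while this quantity ignores ordering within the top 5, which may explain part of the mismatch with LightGBM's built-in `map` metric.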
Any guidance on how to use LightGBM for such a problem is welcome. If you think this question is better suited to r/MachineLearning or Stack Overflow, let me know.