Fix inconsistent results between leaderboard and logs

reczoo · Mar 19, 2022 · e8e723c
1 parent 1dc488b

Showing 84 changed files with 5,107 additions and 2,799 deletions.
diff --git a/candidate_matching/datasets/Amazon/README.md b/candidate_matching/datasets/Amazon/README.md
@@ -1,11 +1,13 @@
+## Amazon
 
+### AmazonBooks_m1
 
-### Amazonbooks_x1
-We use this dataset following the same data splitting and preprocessing as in LightGCN. We download the preprocessed data from [here](https://github.com/kuandeng/LightGCN/tree/master/Data/amazon-book). The data statistics are summarized as follows:
+The dataset follows the same data splitting and preprocessing as LightGCN. The preprocessed data can be [downloaded from here](https://github.com/kuandeng/LightGCN/tree/master/Data/amazon-book). The data statistics are summarized as follows:
 
-| Dataset ID               | #Users | #Items | #Interactions |   #Train  |  #Test  | Density |
+| Dataset ID     | #Users | #Items | #Interactions |   #Train  |  #Test  | Density |
 |:--------------:|:------:|:------:|:-------------:|:---------:|:-------:|:-------:|
-| amazonbooks_x1 | 52,643 | 91,599 |   2,984,108   | 2,380,730 | 603,378 | 0.00062 |
+| AmazonBooks_m1 | 52,643 | 91,599 |   2,984,108   | 2,380,730 | 603,378 | 0.00062 |
+
 
 
 

diff --git a/candidate_matching/datasets/Amazon/amazonbooks_x1/ENMF_data_process.py b/candidate_matching/datasets/Amazon/amazonbooks_x1/ENMF_data_process.py
diff --git a/candidate_matching/datasets/Amazon/amazonbooks_x1/LR_GCCF_data_process.py b/candidate_matching/datasets/Amazon/amazonbooks_x1/LR_GCCF_data_process.py
diff --git a/candidate_matching/datasets/Gowalla/README.md b/candidate_matching/datasets/Gowalla/README.md
@@ -1,8 +1,8 @@
+### Gowalla_m1
 
-### Gowalla_x1
-We use this dataset following the same data splitting and preprocessing as in [NGCF](https://github.com/xiangwang1223/neural_graph_collaborative_filtering) and [LightGCN](https://github.com/kuandeng/LightGCN). We download the preprocessed data from [here](https://github.com/kuandeng/LightGCN/tree/master/Data). The data statistics are summarized as follows:
+The dataset follows the same data splitting and preprocessing as [NGCF](https://github.com/xiangwang1223/neural_graph_collaborative_filtering) and [LightGCN](https://github.com/kuandeng/LightGCN). The preprocessed data can be [downloaded from here](https://github.com/kuandeng/LightGCN/tree/master/Data). The data statistics are summarized as follows:
 
 | Dataset ID         | #Users | #Items | #Interactions |   #Train  |  #Test  | Density |
 |:--------------:|:------:|:------:|:-------------:|:---------:|:-------:|:-------:|
-| gowalla_x1 | 29,858  | 40,981  |  1,027,370    | 810,128  | 217,242  |  0.00084  |
+| Gowalla_m1 | 29,858  | 40,981  |  1,027,370    | 810,128  | 217,242  |  0.00084  |
 
diff --git a/candidate_matching/datasets/Gowalla/gowalla_x1/ENMF_data_process.py b/candidate_matching/datasets/Gowalla/gowalla_x1/ENMF_data_process.py
diff --git a/candidate_matching/datasets/Gowalla/gowalla_x1/LR_GCCF_data_process.py b/candidate_matching/datasets/Gowalla/gowalla_x1/LR_GCCF_data_process.py
diff --git a/candidate_matching/datasets/README.md b/candidate_matching/datasets/README.md
@@ -1,19 +1,22 @@
-## Datasets
+## BARS-Matching Datasets
+
+### Reusable Data Splits
+
+| Dataset           | Dataset ID           | Used by           | Scenarios                          |
+|-------------------|----------------------|:-----------------|:-----------------------------------|
+| Amazon            | [AmazonBooks_m1](./Amazon#AmazonBooks_m1)       |   [LightGCN, SIGIR'20](https://github.com/kuandeng/LightGCN/tree/master/Data/amazon-book)  | CF, GNN |
+|                   | AmazonBooks_m2       |   [ComiRec, KDD'20](https://github.com/THUDM/ComiRec)  | Multi-interest, Sequential |
+|                   | AmazonCDs_m1         |   [BGCF, KDD'20](https://dl.acm.org/doi/abs/10.1145/3394486.3403254)    | CF, GNN | 
+|                   | AmazonMovies_m1      |   [BGCF, KDD'20](https://dl.acm.org/doi/abs/10.1145/3394486.3403254)            | CF, GNN |
+|                   | AmazonBeauty_m1      |   [BGCF, KDD'20](https://dl.acm.org/doi/abs/10.1145/3394486.3403254)              | CF, GNN | 
+|                   | AmazonElectronics_m1 |   [NBPO, SIGIR'20](https://github.com/Wenhui-Yu/NBPO/tree/master/dataset/amazon)  | CF | 
+| Yelp              | [Yelp18_m1](./Yelp#Yelp18_m1)            |   [LightGCN, SIGIR'20](https://github.com/kuandeng/LightGCN/tree/master/Data/yelp2018)  | CF, GNN |
+| Gowalla           | [Gowalla_m1](./Gowalla#Gowalla_m1)           |   [LightGCN, SIGIR'20](https://github.com/kuandeng/LightGCN/tree/master/Data/gowalla)  | CF, GNN |
+| MovieLens         | MovieLens1M_m1       |   [LCFN, ICML'20](https://github.com/Wenhui-Yu/LCFN/tree/master/dataset/Movielens)               | CF, GNN |
+|                   | MovieLens1M_m2       |   [NCF, WWW'17](https://github.com/hexiangnan/neural_collaborative_filtering/tree/master/Data)                | CF |
+| CiteULike-A       | Citeulikea_m1        |   [DHCF](https://github.com/chenchongthu/ENMF#4-dhcf-kdd-2020dual-channel-hypergraph-collaborative-filtering) | CF, GNN | 
+| Taobao            | Taobao_m1            |   [ComiRec, KDD'20](https://github.com/THUDM/ComiRec) | Multi-interest, Sequential |
+| KuaiShou          | Kuaishou_m1          |   [NGAT4Rec, Arxiv'21](https://github.com/ShortVideoRecommendation/NGAT4Rec/tree/master/Data/kuaishou) | CF, GNN | 
+
 
-The following datasets are available for benchmarking.
 
-| Dataset           | Dataset_ID           | Contain features? | Description                                                           |
-|-------------------|----------------------|:-----------------:|-----------------------------------------------------------------------|
-| AmazonBooks       | amazonbooks_x1       |         NO        | The preprocessed data is provided in [LightGCN](https://github.com/kuandeng/LightGCN/tree/master/Data/amazon-book).  |
-|                   | amazonbooks_x2       |         YES        | The preprocessed data is provided in [ComiRec](https://github.com/THUDM/ComiRec).  |
-| Yelp18            | yelp18_x1            |         NO        | The preprocessed data is provided in [LightGCN](https://github.com/kuandeng/LightGCN/tree/master/Data/yelp2018).  |
-| Gowalla           | gowalla_x1           |         NO        | The preprocessed data is provided in [LightGCN](https://github.com/kuandeng/LightGCN/tree/master/Data/gowalla).  |
-| Movielens1M       | movielens1m_x1       |         NO        | The preprocessed data is provided in [LCFN](https://github.com/Wenhui-Yu/LCFN/tree/master/dataset/Movielens).               |
-|                   | movielens1m_x2       |         NO        | The preprocessed data is provided in NCF.                |
-| Movielens10M      |                      |                   | TODO                                                                  |
-| AmazonCDs       | amazoncds_x1        |         NO        | The preprocessed data is provided in BGCF.               |
-| AmazonMovies       | amazonmovies_x1        |         NO        | The preprocessed data is provided in BGCF.               |
-| AmazonBeauty       | amazonbeauty_x1        |         NO        | The preprocessed data is provided in BGCF.               |
-| AmazonElectronics | amazonelectronics_x1 |         NO        | The preprocessed data is provided in [NBPO](https://github.com/Wenhui-Yu/NBPO/tree/master/dataset/amazon).               |
-| CiteULike-A       | citeulikea_x1        |         NO        | The preprocessed data is provided in [DHCF](https://github.com/chenchongthu/ENMF#4-dhcf-kdd-2020dual-channel-hypergraph-collaborative-filtering).               |
-| Taobao            | taobao_x1       |        YES        | The preprocessed data is provided in [ComiRec](https://github.com/THUDM/ComiRec).
diff --git a/candidate_matching/datasets/Yelp/README.md b/candidate_matching/datasets/Yelp/README.md
@@ -1,7 +1,9 @@
+# Yelp
 
-### Yelp18_x1
-We use this dataset following the same data splitting and preprocessing as in LightGCN. We download the preprocessed data from [here](https://github.com/kuandeng/LightGCN/tree/master/Data). The data statistics are summarized as follows:
+### Yelp18_m1
+
+The dataset follows the same data splitting and preprocessing as LightGCN. The preprocessed data can be [downloaded from here](https://github.com/kuandeng/LightGCN/tree/master/Data). The data statistics are summarized as follows:
 
 | Dataset ID          | #Users | #Items | #Interactions |  #Train   |  #Test  | Density |
 | :-------: | :----: | :----: | :-----------: | :-------: | :-----: | :-----: |
-| yelp18_x1 | 31,668 | 38,048 |   1,561,406   | 1,237,259 | 324,147 | 0.00130 |
+| Yelp18_m1 | 31,668 | 38,048 |   1,561,406   | 1,237,259 | 324,147 | 0.00130 |
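
Reviewer note: the statistics tables in the Amazon, Gowalla, and Yelp READMEs can be sanity-checked with a short script. The sketch below is not part of the commit. It assumes that `#Train + #Test` should equal `#Interactions` and that `Density` is `#Interactions / (#Users * #Items)` rounded to five decimals, which is the usual definition but is not stated in the READMEs. The names `STATS` and `check` are hypothetical, and the numbers are copied from the tables above.

```python
# Values copied from the README tables in this diff:
# (users, items, interactions, train, test, reported density)
STATS = {
    "AmazonBooks": (52_643, 91_599, 2_984_108, 2_380_730, 603_378, 0.00062),
    "Gowalla": (29_858, 40_981, 1_027_370, 810_128, 217_242, 0.00084),
    "Yelp18": (31_668, 38_048, 1_561_406, 1_237_259, 324_147, 0.00130),
}


def check(users, items, interactions, train, test, density):
    """Return (split_ok, density_ok) for one table row.

    Assumption: density = interactions / (users * items), reported to
    five decimals, so the allowed rounding error is half a unit in the
    fifth decimal place.
    """
    split_ok = train + test == interactions
    computed = interactions / (users * items)
    density_ok = abs(computed - density) <= 0.5e-5
    return split_ok, density_ok


if __name__ == "__main__":
    for name, row in STATS.items():
        split_ok, density_ok = check(*row)
        print(f"{name}: split consistent={split_ok}, density consistent={density_ok}")
```

Under these assumptions all three rows are internally consistent, so the renaming in this commit does not appear to have touched the reported numbers.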
diff --git a/candidate_matching/datasets/Yelp/yelp18_x1/ENMF_data_process.py b/candidate_matching/datasets/Yelp/yelp18_x1/ENMF_data_process.py