The previous version used random sampling with duplicates for selecting
examples from the dataset. The algorithm description states that it should
use random sampling of m examples without duplicates, because duplicates
cause unstable results: the effect of some examples is amplified while
others are ignored.
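For illustration only (not part of this commit), a minimal NumPy sketch
contrasting the two sampling schemes; the duplicate counts show how sampling
with replacement amplifies some examples and skips others:

import numpy as np

rs = np.random.RandomState(0)
n_samples = 10

# Old behaviour: draws WITH replacement, so duplicates appear and
# some examples are counted twice while others never contribute.
with_dup = rs.randint(0, n_samples, size=n_samples)

# What the algorithm description asks for: draws WITHOUT replacement,
# so every selected example contributes exactly once.
without_dup = rs.choice(n_samples, size=n_samples, replace=False)

print(np.bincount(with_dup, minlength=n_samples))     # e.g. [1 0 2 0 ...]
print(np.bincount(without_dup, minlength=n_samples))  # all ones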

Since the current code sets m to n_samples (all examples), there is no
point in random sampling: every example is used anyway, and the order in
which samples are processed does not matter. I therefore removed the
random sampling and simply let the loop run over all examples.

In the future I can add an argument m to the algorithm and draw a proper
random subset of examples (as stated in the article), but for now this is
the simpler fix.
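A minimal sketch of what that future change could look like (the m argument
and the sample_indices helper are hypothetical, not in the committed code):

import numpy as np

def sample_indices(n_samples, m=None, seed=None):
    # Hypothetical helper, not part of this commit.
    # m=None reproduces the committed behaviour: use every instance once.
    if m is None or m >= n_samples:
        return range(n_samples)
    # Otherwise draw m distinct instance indices, as the article describes.
    return np.random.RandomState(seed).choice(n_samples, size=m, replace=False)

# Usage inside reliefF would then be:
# for idx in sample_indices(n_samples, m=kwargs.get('m')):
#     ... compute near_hit / near_miss for X[idx] ...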

The algorithm is stable now.
tadej committed Aug 16, 2016
1 parent f90b080 commit acca39c
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions skfeature/function/similarity_based/reliefF.py
@@ -1,5 +1,4 @@
 import numpy as np
-from random import randrange
 from sklearn.metrics.pairwise import pairwise_distances
@@ -41,8 +40,7 @@ def reliefF(X, y, **kwargs):
     score = np.zeros(n_features)

     # the number of sampled instances is equal to the number of total instances
-    for iter in range(n_samples):
-        idx = randrange(0, n_samples, 1)
+    for idx in range(n_samples):
         near_hit = []
         near_miss = dict()
