The previous version used random sampling with duplicates for selecting
examples from the dataset. The algorithm description states that it should
use random sampling of m examples without duplicates, because duplicates
cause unstable results: the effect of some examples is amplified while
others are ignored.
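For illustration only (not part of this commit), a minimal NumPy sketch
contrasting the two sampling schemes; the duplicate counts show how sampling
with replacement amplifies some examples and skips others:

import numpy as np

rs = np.random.RandomState(0)
n_samples = 10

# Old behaviour: draws WITH replacement, so duplicates appear and
# some examples are counted twice while others never contribute.
with_dup = rs.randint(0, n_samples, size=n_samples)

# What the algorithm description asks for: draws WITHOUT replacement,
# so every selected example contributes exactly once.
without_dup = rs.choice(n_samples, size=n_samples, replace=False)

print(np.bincount(with_dup, minlength=n_samples))     # e.g. [1 0 2 0 ...]
print(np.bincount(without_dup, minlength=n_samples))  # all ones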

Since the current code sets m to n_samples (all examples), there is no
point in random sampling: every example is used anyway, and the order in
which samples are processed does not matter. I therefore removed the
random sampling and simply let the loop run over all examples.

In the future I can add an argument m to the algorithm and draw a proper
random subset of examples (as stated in the article), but for now this is
the simpler fix.
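A minimal sketch of what that future change could look like (the m argument
and the sample_indices helper are hypothetical, not in the committed code):

import numpy as np

def sample_indices(n_samples, m=None, seed=None):
    # Hypothetical helper, not part of this commit.
    # m=None reproduces the committed behaviour: use every instance once.
    if m is None or m >= n_samples:
        return range(n_samples)
    # Otherwise draw m distinct instance indices, as the article describes.
    return np.random.RandomState(seed).choice(n_samples, size=m, replace=False)

# Usage inside reliefF would then be:
# for idx in sample_indices(n_samples, m=kwargs.get('m')):
#     ... compute near_hit / near_miss for X[idx] ...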

The algorithm is stable now.
tadej committed Aug 16, 2016
1 parent f90b080 commit acca39c
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions skfeature/function/similarity_based/reliefF.py
@@ -1,5 +1,4 @@
 import numpy as np
-from random import randrange
 from sklearn.metrics.pairwise import pairwise_distances
@@ -41,8 +40,7 @@ def reliefF(X, y, **kwargs):
     score = np.zeros(n_features)

     # the number of sampled instances is equal to the number of total instances
-    for iter in range(n_samples):
-        idx = randrange(0, n_samples, 1)
+    for idx in range(n_samples):
         near_hit = []
         near_miss = dict()
