[SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change
## What changes were proposed in this pull request?
Add FDR test case in ml/feature/ChiSqSelectorSuite.
Improve some comments in the code.
This is a follow-up PR for apache#15212.
## How was this patch tested?
Unit tests.
Author: Peng, Meng <[email protected]>
Closes apache#16434 from mpjlu/fdr_fwe_update.
docs/ml-features.md (+2 -2)
@@ -1426,9 +1426,9 @@ categorical features. ChiSqSelector uses the
 features to choose. It supports five selection methods: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
 * `numTopFeatures` chooses a fixed number of top features according to a chi-squared test. This is akin to yielding the features with the most predictive power.
 * `percentile` is similar to `numTopFeatures` but chooses a fraction of all features instead of a fixed number.
-* `fpr` chooses all features whose p-value is below a threshold, thus controlling the false positive rate of selection.
+* `fpr` chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
 * `fdr` uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
-* `fwe` chooses all features whose p-values is below a threshold, thus controlling the family-wise error rate of selection.
+* `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.
 By default, the selection method is `numTopFeatures`, with the default number of top features set to 50.
 The user can choose a selection method using `setSelectorType`.
docs/mllib-feature-extraction.md (+2 -2)
@@ -231,9 +231,9 @@ features to choose. It supports five selection methods: `numTopFeatures`, `perce

 * `numTopFeatures` chooses a fixed number of top features according to a chi-squared test. This is akin to yielding the features with the most predictive power.
 * `percentile` is similar to `numTopFeatures` but chooses a fraction of all features instead of a fixed number.
-* `fpr` chooses all features whose p-value is below a threshold, thus controlling the false positive rate of selection.
+* `fpr` chooses all features whose p-values are below a threshold, thus controlling the false positive rate of selection.
 * `fdr` uses the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure) to choose all features whose false discovery rate is below a threshold.
-* `fwe` chooses all features whose p-values is below a threshold, thus controlling the family-wise error rate of selection.
+* `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by 1/numFeatures, thus controlling the family-wise error rate of selection.

 By default, the selection method is `numTopFeatures`, with the default number of top features set to 50.
 The user can choose a selection method using `setSelectorType`.
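
To make the difference between the three p-value based methods in the updated wording concrete, here is a minimal plain-Python sketch. It is not Spark's implementation: the function names `select_fpr`, `select_fdr`, and `select_fwe`, the sample p-values, and the exact comparison operators are assumptions chosen for illustration, following the textbook definitions of a fixed threshold, the Benjamini-Hochberg step-up rule, and a threshold scaled by 1/numFeatures.

```python
def select_fpr(p_values, alpha):
    """Keep every feature whose p-value is below the threshold alpha."""
    return [i for i, p in enumerate(p_values) if p < alpha]


def select_fwe(p_values, alpha):
    """Keep features whose p-value is below alpha scaled by 1/numFeatures."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


def select_fdr(p_values, alpha):
    """Benjamini-Hochberg: find the largest rank k such that the k-th
    smallest p-value is <= alpha * k / n, then keep the k smallest."""
    n = len(p_values)
    # Feature indices ordered by ascending p-value.
    ranked = sorted(range(n), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(ranked, start=1):
        if p_values[i] <= alpha * rank / n:
            cutoff = rank
    return sorted(ranked[:cutoff])


# Hypothetical p-values for five features (illustrative only).
p_values = [0.001, 0.012, 0.025, 0.045, 0.2]
alpha = 0.05

print("fpr:", select_fpr(p_values, alpha))
print("fdr:", select_fdr(p_values, alpha))
print("fwe:", select_fwe(p_values, alpha))
```

On this sample input the selections are nested: `fwe` keeps the fewest features, `fdr` keeps more, and `fpr` keeps the most, which matches the relative strictness of the three error criteria.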