Skip to content

Commit

Permalink
[MRG+1] DOC improve description and consistency of random_state (scik…
Browse files Browse the repository at this point in the history
…it-learn#8689)

* DOC improve description of random_state in train_test_split

* DOC Make random_state consistent through documentation

* FIX reverse doc mistake

* FIX address comment of Tom

* DOC address comments

* DOC remove empty line

* DOC remove unecessary white spaces
  • Loading branch information
glemaitre authored and jmschrei committed Apr 6, 2017
1 parent f14c07d commit e3c9ae2
Show file tree
Hide file tree
Showing 56 changed files with 694 additions and 358 deletions.
16 changes: 10 additions & 6 deletions sklearn/cluster/bicluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,9 +236,11 @@ class SpectralCoclustering(BaseSpectral):
(n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
are used.
random_state : int seed, RandomState instance, or None (default)
A pseudo random number generator used by the K-Means
initialization.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Attributes
----------
Expand Down Expand Up @@ -366,9 +368,11 @@ class SpectralBiclustering(BaseSpectral):
(n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
are used.
random_state : int seed, RandomState instance, or None (default)
A pseudo random number generator used by the K-Means
initialization.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Attributes
----------
Expand Down
54 changes: 30 additions & 24 deletions sklearn/cluster/k_means_.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,10 +230,11 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto',
verbose : boolean, optional
Verbosity mode.
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
copy_x : boolean, optional
When pre-computing distances it is more numerically accurate to center
Expand Down Expand Up @@ -449,10 +450,11 @@ def _kmeans_single_lloyd(X, n_clusters, max_iter=300, init='k-means++',
precompute_distances : boolean, default: True
Precompute distances (faster but takes more memory).
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Returns
-------
Expand Down Expand Up @@ -638,10 +640,11 @@ def _init_centroids(X, k, init, random_state=None, x_squared_norms=None,
init : {'k-means++', 'random' or ndarray or callable} optional
Method for initialization
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
x_squared_norms : array, shape (n_samples,), optional
Squared euclidean norm of each data point. Pass it if you have it at
Expand Down Expand Up @@ -766,10 +769,11 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
(n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
are used.
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
verbose : int, default 0
Verbosity mode.
Expand Down Expand Up @@ -1008,10 +1012,11 @@ def _mini_batch_step(X, x_squared_norms, centers, counts,
the distances of each sample to its closest center.
May not be None when random_reassign is True.
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
random_reassign : boolean, optional
If True, centers with very low counts are randomly reassigned
Expand Down Expand Up @@ -1247,10 +1252,11 @@ class MiniBatchKMeans(KMeans):
Compute label assignment and inertia for the complete dataset
once the minibatch optimization has converged in fit.
random_state : integer or numpy.RandomState, optional
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
reassignment_ratio : float, default: 0.01
Control the fraction of the maximum number of counts for a
Expand Down
7 changes: 5 additions & 2 deletions sklearn/cluster/mean_shift_.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,11 @@ def estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0,
n_samples : int, optional
The number of samples to use. If not given, all samples are used.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
n_jobs : int, optional (default = 1)
The number of parallel jobs to run for neighbors search.
Expand Down
30 changes: 19 additions & 11 deletions sklearn/cluster/spectral.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,9 +39,11 @@ def discretize(vectors, copy=True, max_svd_restarts=30, n_iter_max=20,
Maximum number of iterations to attempt in rotation and partition
matrix search if machine precision convergence is not reached
random_state : int seed, RandomState instance, or None (default)
A pseudo random number generator used for the initialization of the
of the rotation matrix
random_state : int, RandomState instance or None, optional, default: None
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Returns
-------
Expand Down Expand Up @@ -194,10 +196,13 @@ def spectral_clustering(affinity, n_clusters=8, n_components=None,
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities
random_state : int seed, RandomState instance, or None (default)
A pseudo random number generator used for the initialization
of the lobpcg eigen vectors decomposition when eigen_solver == 'amg'
and by the K-Means initialization.
random_state : int, RandomState instance or None, optional, default: None
A pseudo random number generator used for the initialization of the
lobpcg eigen vectors decomposition when eigen_solver == 'amg' and by
the K-Means initialization. If int, random_state is the seed used by
the random number generator; If RandomState instance, random_state is
the random number generator; If None, the random number generator is
the RandomState instance used by `np.random`.
n_init : int, optional, default: 10
Number of time the k-means algorithm will be run with different
Expand Down Expand Up @@ -326,10 +331,13 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities
random_state : int seed, RandomState instance, or None (default)
A pseudo random number generator used for the initialization
of the lobpcg eigen vectors decomposition when eigen_solver == 'amg'
and by the K-Means initialization.
random_state : int, RandomState instance or None, optional, default: None
A pseudo random number generator used for the initialization of the
lobpcg eigen vectors decomposition when eigen_solver == 'amg' and by
the K-Means initialization. If int, random_state is the seed used by
the random number generator; If RandomState instance, random_state is
the random number generator; If None, the random number generator is
the RandomState instance used by `np.random`.
n_init : int, optional, default: 10
Number of time the k-means algorithm will be run with different
Expand Down
33 changes: 20 additions & 13 deletions sklearn/covariance/robust_covariance.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,9 +55,11 @@ def c_step(X, n_support, remaining_iterations=30, initial_estimates=None,
verbose : boolean, optional
Verbose mode.
random_state : integer or numpy.RandomState, optional
The random generator used. If an integer is given, it fixes the
seed. Defaults to the global numpy random number generator.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
cov_computation_method : callable, default empirical_covariance
The function which will be used to compute the covariance.
Expand Down Expand Up @@ -214,9 +216,11 @@ def select_candidates(X, n_support, n_trials, select=1, n_iter=30,
Maximum number of iterations for the c_step procedure.
(2 is enough to be close to the final solution. "Never" exceeds 20).
random_state : integer or numpy.RandomState, default None
The random generator used. If an integer is given, it fixes the
seed. Defaults to the global numpy random number generator.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
cov_computation_method : callable, default empirical_covariance
The function which will be used to compute the covariance.
Expand Down Expand Up @@ -311,10 +315,11 @@ def fast_mcd(X, support_fraction=None,
value of support_fraction will be used within the algorithm:
`[n_sample + n_features + 1] / 2`.
random_state : integer or numpy.RandomState, optional
The generator used to randomly subsample. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
cov_computation_method : callable, default empirical_covariance
The function which will be used to compute the covariance.
Expand Down Expand Up @@ -531,9 +536,11 @@ class MinCovDet(EmpiricalCovariance):
value of support_fraction will be used within the algorithm:
[n_sample + n_features + 1] / 2
random_state : integer or numpy.RandomState, optional
The random generator used. If an integer is given, it fixes the
seed. Defaults to the global numpy random number generator.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Attributes
----------
Expand Down
53 changes: 35 additions & 18 deletions sklearn/cross_validation.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@

"""
The :mod:`sklearn.cross_validation` module includes utilities for cross-
validation and performance evaluation.
Expand Down Expand Up @@ -297,9 +296,11 @@ class KFold(_BaseKFold):
shuffle : boolean, optional
Whether to shuffle the data before splitting into batches.
random_state : None, int or RandomState
When shuffle=True, pseudo-random number generator state used for
shuffling. If None, use default numpy RNG for shuffling.
random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number
generator; If RandomState instance, random_state is the random number
generator; If None, the random number generator is the RandomState
instance used by `np.random`. Used when ``shuffle`` == True.
Examples
--------
Expand Down Expand Up @@ -499,9 +500,11 @@ class StratifiedKFold(_BaseKFold):
Whether to shuffle each stratification of the data before splitting
into batches.
random_state : None, int or RandomState
When shuffle=True, pseudo-random number generator state used for
shuffling. If None, use default numpy RNG for shuffling.
random_state : int, RandomState instance or None, optional, default=None
If int, random_state is the seed used by the random number
generator; If RandomState instance, random_state is the random number
generator; If None, the random number generator is the RandomState
instance used by `np.random`. Used when ``shuffle`` == True.
Examples
--------
Expand Down Expand Up @@ -822,8 +825,11 @@ class ShuffleSplit(BaseShuffleSplit):
int, represents the absolute number of train samples. If None,
the value is automatically set to the complement of the test size.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
random_state : int, RandomState instance or None, optional (default None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Examples
--------
Expand Down Expand Up @@ -1031,8 +1037,11 @@ class StratifiedShuffleSplit(BaseShuffleSplit):
int, represents the absolute number of train samples. If None,
the value is automatically set to the complement of the test size.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
random_state : int, RandomState instance or None, optional (default None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Examples
--------
Expand Down Expand Up @@ -1225,8 +1234,11 @@ class LabelShuffleSplit(ShuffleSplit):
int, represents the absolute number of train labels. If None,
the value is automatically set to the complement of the test size.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
random_state : int, RandomState instance or None, optional (default None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
"""
def __init__(self, labels, n_iter=5, test_size=0.2, train_size=None,
Expand Down Expand Up @@ -1889,9 +1901,11 @@ def permutation_test_score(estimator, X, y, cv=None,
Labels constrain the permutation among groups of samples with
a same label.
random_state : RandomState or an int seed (0 by default)
A random number generator instance to define the state of the
random permutations generator.
random_state : int, RandomState instance or None, optional (default=0)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
verbose : integer, optional
The verbosity level.
Expand Down Expand Up @@ -1977,8 +1991,11 @@ def train_test_split(*arrays, **options):
int, represents the absolute number of train samples. If None,
the value is automatically set to the complement of the test size.
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
stratify : array-like or None (default is None)
If not None, data is split in a stratified fashion, using this as
Expand Down
8 changes: 5 additions & 3 deletions sklearn/datasets/olivetti_faces.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,9 +71,11 @@ def fetch_olivetti_faces(data_home=None, shuffle=False, random_state=0,
If False, raise a IOError if the data is not locally available
instead of trying to download the data from the source site.
random_state : optional, integer or RandomState object
The seed or the random number generator used to shuffle the
data.
random_state : int, RandomState instance or None, optional (default=0)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Returns
-------
Expand Down
7 changes: 5 additions & 2 deletions sklearn/datasets/samples_generator.py
Original file line number Diff line number Diff line change
Expand Up @@ -1059,8 +1059,11 @@ def make_sparse_coded_signal(n_samples, n_components, n_features,
n_nonzero_coefs : int
number of active (non-zero) coefficients in each sample
random_state : int or RandomState instance, optional (default=None)
seed used by the pseudo random number generator
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
Returns
-------
Expand Down
Loading

0 comments on commit e3c9ae2

Please sign in to comment.