DOC spellfixes
jaquesgrobler authored and vene committed Jun 29, 2013
1 parent 7c4126e commit 8861833
Showing 102 changed files with 229 additions and 229 deletions.
2 changes: 1 addition & 1 deletion doc/datasets/labeled_faces.rst
@@ -97,7 +97,7 @@ possible to get an additional dimension with the RGB color channels by
passing ``color=True``, in that case the shape will be
``(2200, 2, 62, 47, 3)``.

The ``fetch_lfw_pairs`` datasets is subdived in 3 subsets: the development
The ``fetch_lfw_pairs`` datasets is subdivided into 3 subsets: the development
``train`` set, the development ``test`` set and an evaluation ``10_folds``
set meant to compute performance metrics using a 10-folds cross
validation scheme.
2 changes: 1 addition & 1 deletion doc/datasets/twenty_newsgroups.rst
@@ -4,7 +4,7 @@ The 20 newsgroups text dataset
==============================

The 20 newsgroups dataset comprises around 18000 newsgroups posts on
20 topics splitted in two subsets: one for training (or development)
20 topics split in two subsets: one for training (or development)
and the other one for testing (or for performance evaluation). The split
between the train and test set is based upon a messages posted before
and after a specific date.
6 changes: 3 additions & 3 deletions doc/developers/index.rst
@@ -241,7 +241,7 @@ Next, one or two small code examples to show its use can be added.
Finally, any math and equations, followed by references,
can be added to further the documentation. Not starting the
documentation with the maths makes it more friendly towards
users that are just intersted in what the feature will do, as
users that are just interested in what the feature will do, as
opposed to how it works `under the hood`.


@@ -454,7 +454,7 @@ in an attribute ``random_state``.
``fit`` can call ``check_random_state`` on that attribute
to get an actual random number generator.
If, for some reason, randomness is needed after ``fit``,
the RNG should be stored in an attibute ``random_state_``.
the RNG should be stored in an attribute ``random_state_``.
The following example should make this clear::

class GaussianNoise(BaseEstimator, TransformerMixin):
@@ -541,7 +541,7 @@ integer division is written ``//``.
String handling has been overhauled, though, as have parts of
the Python standard library.
The `six <http://pythonhosted.org/six/>`_ package helps with
cross-compability and is included in scikit-learn as
cross-compatibility and is included in scikit-learn as
``sklearn.externals.six``.


6 changes: 3 additions & 3 deletions doc/developers/performance.rst
@@ -16,7 +16,7 @@ code for the scikit-learn project.
implementation optimization.

Times and times, hours of efforts invested in optimizing complicated
implementation details have been rended irrelevant by the late discovery
implementation details have been rendered irrelevant by the late discovery
of simple **algorithmic tricks**, or by using another algorithm altogether
that is better suited to the problem.

@@ -359,7 +359,7 @@ important in practice on the existing cython codebase in the scikit-learn
project.

TODO: html report, type declarations, bound checks, division by zero checks,
memory alignement, direct blas calls...
memory alignment, direct blas calls...

- http://www.euroscipy.org/file/3696?vid=download
- http://conference.scipy.org/proceedings/SciPy2009/paper_1/
@@ -373,7 +373,7 @@ Profiling compiled extensions

When working with compiled extensions (written in C/C++ with a wrapper or
directly as Cython extension), the default Python profiler is useless:
we need a dedicated tool to instrospect what's happening inside the
we need a dedicated tool to introspect what's happening inside the
compiled extension it-self.

Using yep and google-perftools
10 changes: 5 additions & 5 deletions doc/modules/clustering.rst
@@ -250,7 +250,7 @@ is given.

Affinity Propagation can be interesting as it chooses the number of
clusters based on the data provided. For this purpose, the two important
parameters are the `preference`, which controls how many examplars are
parameters are the `preference`, which controls how many exemplars are
used, and the `damping` factor.

The main drawback of Affinity Propagation is its complexity. The
@@ -384,7 +384,7 @@ Different label assignment strategies

Different label assignment strategies can be used, corresponding to the
`assign_labels` parameter of :class:`SpectralClustering`.
The `kmeans` strategie can match finer details of the data, but it can be
The `kmeans` strategy can match finer details of the data, but it can be
more unstable. In particular, unless you control the `random_state`, it
may not be reproducible from run-to-run, as it depends on a random
initialization. On the other hand, the `discretize` strategy is 100%
@@ -933,14 +933,14 @@ Their harmonic mean called **V-measure** is computed by
The V-measure is actually equivalent to the mutual information (NMI)
discussed above normalized by the sum of the label entropies [B2011]_.

Homogeneity, completensess and V-measure can be computed at once using
Homogeneity, completeness and V-measure can be computed at once using
:func:`homogeneity_completeness_v_measure` as follows::

>>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
... # doctest: +ELLIPSIS
(0.66..., 0.42..., 0.51...)

The following clustering assignment is slighlty better, since it is
The following clustering assignment is slightly better, since it is
homogeneous but not complete::

>>> labels_pred = [0, 0, 0, 1, 2, 2]
@@ -966,7 +966,7 @@ Advantages

- Intuitive interpretation: clustering with bad V-measure can be
**qualitatively analyzed in terms of homogeneity and completeness**
to better feel what 'kind' of mistakes is done by the assigmenent.
to better feel what 'kind' of mistakes is done by the assignment.

- **No assumption is made on the cluster structure**: can be used
to compare clustering algorithms such as k-means which assumes isotropic
6 changes: 3 additions & 3 deletions doc/modules/cross_validation.rst
@@ -97,7 +97,7 @@ The simplest way to use perform cross-validation in to call the
:func:`cross_val_score` helper function on the estimator and the dataset.

The following example demonstrates how to estimate the accuracy of a
linear kernel support vector machine on the iris dataset by splitting
linear kernel support vector machine on the iris dataset by split
the data and fitting a model and computing the score 5 consecutive times
(with different splits each time)::

@@ -132,7 +132,7 @@ When the ``cv`` argument is an integer, :func:`cross_val_score` uses the
:class:`KFold` or :class:`StratifiedKFold` strategies by default (depending on
the absence or presence of the target array).

It is also possible to use othe cross validation strategies by passing a cross
It is also possible to use other cross validation strategies by passing a cross
validation iterator instead, for instance::

>>> n_samples = iris.data.shape[0]
@@ -369,7 +369,7 @@ Random permutations cross-validation a.k.a. Shuffle & Split

The :class:`ShuffleSplit` iterator will generate a user defined number of
independent train / test dataset splits. Samples are first shuffled and
then splitted into a pair of train and test sets.
then split into a pair of train and test sets.

It is possible to control the randomness for reproducibility of the
results by explicitly seeding the ``random_state`` pseudo random number
4 changes: 2 additions & 2 deletions doc/modules/decomposition.rst
@@ -510,11 +510,11 @@ structure of the error covariance :math:`\Psi`:
:class:`ProbabilisticPCA`.

* :math:`\Psi = diag(\psi_1, \psi_2, \dots, \psi_n)`: This model is called Factor
Analysis, a classical statistical model. The matrix W is sometimtes called
Analysis, a classical statistical model. The matrix W is sometimes called
`factor loading matrix`.

Both model essentially estimate a Gaussian with a low-rank covariance matrix.
Because both models are probilistic they can be integrated in more complex
Because both models are probabilistic they can be integrated in more complex
models, e.g. Mixture of Factor Analysers. One gets very different models (e.g.
:class:`FastICA`) if non-Gaussian priors on the latent variables are assumed.

2 changes: 1 addition & 1 deletion doc/modules/dp-derivation.rst
@@ -287,7 +287,7 @@ distributions of :math:`\sigma` (as there are a lot more :math:`\sigma` s
now) and :math:`X`.

The bound for :math:`\sigma_{k,d}` is the same bound for :math:`\sigma_k` and can
be safelly omitted.
be safely omitted.

**The bound for** :math:`X` :

6 changes: 3 additions & 3 deletions doc/modules/ensemble.rst
@@ -235,7 +235,7 @@ the transformation performs an implicit, non-parametric density estimation.
* :ref:`example_ensemble_plot_random_forest_embedding.py`

* :ref:`example_manifold_plot_lle_digits.py` compares non-linear
dimensionality reduction technics on handwritten digits.
dimensionality reduction techniques on handwritten digits.

.. seealso::

@@ -430,7 +430,7 @@ with least squares loss and 500 base learners to the Boston house price dataset
The plot on the left shows the train and test error at each iteration.
The train error at each iteration is stored in the
:attr:`~GradientBoostingRegressor.train_score_` attribute
of the gradient boosting model. The test error at each iterations can be optained
of the gradient boosting model. The test error at each iterations can be obtained
via the :meth:`~GradientBoostingRegressor.staged_predict` method which returns a
generator that yields the predictions at each stage. Plots like these can be used
to determine the optimal number of trees (i.e. ``n_estimators``) by early stopping.
@@ -684,7 +684,7 @@ interactions among the two features. For example, the two-variable PDP in the
above Figure shows the dependence of median house price on joint
values of house age and avg. occupants per household. We can clearly
see an interaction between the two features:
For an avg. occupancy greather than two, the house price is nearly independent
For an avg. occupancy greater than two, the house price is nearly independent
of the house age, whereas for values less than two there is a strong dependence
on age.

14 changes: 7 additions & 7 deletions doc/modules/feature_extraction.rst
@@ -234,7 +234,7 @@ In order to address this, scikit-learn provides utilities for the most
common ways to extract numerical features from text content, namely:

- **tokenizing** strings and giving an integer id for each possible token,
for instance by using whitespaces and punctuation as token separators.
for instance by using white-spaces and punctuation as token separators.

- **counting** the occurrences of tokens in each document.

@@ -253,7 +253,7 @@ A corpus of documents can thus be represented by a matrix with one row
per document and one column per token (e.g. word) occurring in the corpus.

We call **vectorization** the general process of turning a collection
of text documents into numerical feature vectors. This specific stragegy
of text documents into numerical feature vectors. This specific strategy
(tokenization, counting and normalization) is called the **Bag of Words**
or "Bag of n-grams" representation. Documents are described by word
occurrences while completely ignoring the relative position information
@@ -348,7 +348,7 @@ ignored in future calls to the transform method::

Note that in the previous corpus, the first and the last documents have
exactly the same words hence are encoded in equal vectors. In particular
we lose the information that the last document is an interogative form. To
we lose the information that the last document is an interrogative form. To
preserve some of the local ordering information we can extract 2-grams
of words in addition to the 1-grams (the word themselvs)::

@@ -370,7 +370,7 @@ can now resolve ambiguities encoded in local positioning patterns::
[0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1]]...)


In particular the interogative form "Is this" is only present in the
In particular the interrogative form "Is this" is only present in the
last document::

>>> feature_index = bigram_vectorizer.vocabulary_.get(u'is this')
@@ -384,7 +384,7 @@ Tf–idf term weighting
---------------------

In a large text corpus, some words will be very present (e.g. "the", "a",
"is" in English) hence carrying very little meaningul information about
"is" in English) hence carrying very little meaningful information about
the actual contents of the document. If we were to feed the direct count
data directly to a classifier those very frequent terms would shadow
the frequencies of rarer yet more interesting terms.
@@ -507,7 +507,7 @@ unigrams (n=1), one might prefer a collection of bigrams (n=2), where
occurrences of pairs of consecutive words are counted.

One might alternatively consider a collection of character n-grams, a
representation resiliant against misspellings and derivations.
representation resilient against misspellings and derivations.

For example, let's say we're dealing with a corpus of two documents:
``['words', 'wprds']``. The second document contains a misspelling
@@ -548,7 +548,7 @@ span across words::
[u'jumpy', u'mpy f', u'py fo', u'umpy ', u'y fox']

The word boundaries-aware variant ``char_wb`` is especially interesting
for languages that use whitespaces for word separation as it generates
for languages that use white-spaces for word separation as it generates
significantly less noisy features than the raw ``char`` variant in
that case. For such languages it can increase both the predictive
accuracy and convergence speed of classifiers trained using such
2 changes: 1 addition & 1 deletion doc/modules/feature_selection.rst
@@ -121,7 +121,7 @@ alpha parameter, the fewer features selected.

For a good choice of alpha, the :ref:`lasso` can fully recover the
exact set of non-zero variables using only few observations, provided
certain specific conditions are met. In paraticular, the number of
certain specific conditions are met. In particular, the number of
samples should be "sufficiently large", or L1 models will perform at
random, where "sufficiently large" depends on the number of non-zero
coefficients, the logarithm of the number of features, the amount of
2 changes: 1 addition & 1 deletion doc/modules/gaussian_process.rst
@@ -19,7 +19,7 @@ The advantages of Gaussian Processes for Machine Learning are:
correlation models).

- The prediction is probabilistic (Gaussian) so that one can compute
empirical confidence intervals and exceedence probabilities that might be
empirical confidence intervals and exceedance probabilities that might be
used to refit (online fitting, adaptive fitting) the prediction in some
region of interest.

2 changes: 1 addition & 1 deletion doc/modules/grid_search.rst
@@ -217,7 +217,7 @@ of the training set is left out.

This left out portion can be used to estimate the generalization error
without having to rely on a separate validation set. This estimate
comes "for free" as no addictional data is needed and can be used for
comes "for free" as no additional data is needed and can be used for
model selection.

This is currently implemented in the following classes:
2 changes: 1 addition & 1 deletion doc/modules/hmm.rst
@@ -82,7 +82,7 @@ constructor. Then, you can generate samples from the HMM by calling `sample`.::

* :ref:`example_plot_hmm_sampling.py`

Training HMM parameters and infering the hidden states
Training HMM parameters and inferring the hidden states
------------------------------------------------------

You can train an HMM by calling the `fit` method. The input is "the list" of
2 changes: 1 addition & 1 deletion doc/modules/kernel_approximation.rst
@@ -79,7 +79,7 @@ function does not actually depend on the data given to the ``fit`` function.
Only the dimensionality of the data is used.
Details on the method can be found in [RR2007]_.

For a given value of ``n_components`` :class:`RBFSampler` is often less acurate
For a given value of ``n_components`` :class:`RBFSampler` is often less accurate
as :class:`Nystroem`. :class:`RBFSampler` is cheaper to compute, though, making
use of larger feature spaces more efficient.

6 changes: 3 additions & 3 deletions doc/modules/linear_model.rst
@@ -224,7 +224,7 @@ cross-validation: :class:`LassoCV` and :class:`LassoLarsCV`.
explained below.

For high-dimensional datasets with many collinear regressors,
:class:`LassoCV` is most often preferrable. How, :class:`LassoLarsCV` has
:class:`LassoCV` is most often preferable. How, :class:`LassoLarsCV` has
the advantage of exploring more relevant values of `alpha` parameter, and
if the number of samples is very small compared to the number of
observations, it is often faster than :class:`LassoCV`.
@@ -432,7 +432,7 @@ the residual.

Instead of giving a vector result, the LARS solution consists of a
curve denoting the solution for each value of the L1 norm of the
parameter vector. The full coeffients path is stored in the array
parameter vector. The full coefficients path is stored in the array
``coef_path_``, which has size (n_features, max_features+1). The first
column is always zero.

@@ -617,7 +617,7 @@ centered on zero and with a precision :math:`\lambda_{i}`:

with :math:`diag \; (A) = \lambda = \{\lambda_{1},...,\lambda_{p}\}`.

In constrast to `Bayesian Ridge Regression`_, each coordinate of :math:`w_{i}`
In contrast to `Bayesian Ridge Regression`_, each coordinate of :math:`w_{i}`
has its own standard deviation :math:`\lambda_i`. The prior over all
:math:`\lambda_i` is chosen to be the same gamma distribution given by
hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.
4 changes: 2 additions & 2 deletions doc/modules/manifold.rst
@@ -202,7 +202,7 @@ of neighbors is greater than the number of input dimensions, the matrix
defining each local neighborhood is rank-deficient. To address this, standard
LLE applies an arbitrary regularization parameter :math:`r`, which is chosen
relative to the trace of the local weight matrix. Though it can be shown
formally that as :math:`r \to 0`, the solution coverges to the desired
formally that as :math:`r \to 0`, the solution converges to the desired
embedding, there is no guarantee that the optimal solution will be found
for :math:`r > 0`. This problem manifests itself in embeddings which distort
the underlying geometry of the manifold.
@@ -407,7 +407,7 @@ countries.

There exists two types of MDS algorithm: metric and non metric. In the
scikit-learn, the class :class:`MDS` implements both. In Metric MDS, the input
simiarity matrix arises from a metric (and thus respects the triangular
similarity matrix arises from a metric (and thus respects the triangular
inequality), the distances between output two points are then set to be as
close as possible to the similarity or dissimilarity data. In the non metric
vision, the algorithms will try to preserve the order of the distances, and
2 changes: 1 addition & 1 deletion doc/modules/model_evaluation.rst
@@ -344,7 +344,7 @@ The `F-measure <http://en.wikipedia.org/wiki/F1_score>`_
harmonic mean of the precision and recall. A
:math:`F_\beta` measure reaches its best value at 1 and worst score at 0.
With :math:`\beta = 1`, the :math:`F_\beta` measure leads to the
:math:`F_1` measure, wheres the recall and the precsion are equally important.
:math:`F_1` measure, wheres the recall and the precision are equally important.

Several functions allow you to analyze the precision, recall and F-measures
score:
2 changes: 1 addition & 1 deletion doc/modules/multiclass.rst
@@ -170,7 +170,7 @@ Example::

.. topic:: References:

.. [1] "Solving multiclass learning problems via error-correcting ouput codes",
.. [1] "Solving multiclass learning problems via error-correcting output codes",
Dietterich T., Bakiri G.,
Journal of Artificial Intelligence Research 2,
1995.
4 changes: 2 additions & 2 deletions doc/modules/outlier_detection.rst
@@ -65,7 +65,7 @@ but regular, observation outside the frontier.

.. topic:: Examples:

* See :ref:`example_svm_plot_oneclass.py` for vizualizing the
* See :ref:`example_svm_plot_oneclass.py` for visualizing the
frontier learned around some data by a
:class:`svm.OneClassSVM` object.

@@ -80,7 +80,7 @@ Outlier Detection

Outlier detection is similar to novelty detection in the sense that
the goal is to separate a core of regular observations from some
polutting ones, called "outliers". Yet, in the case of outlier
polluting ones, called "outliers". Yet, in the case of outlier
detection, we don't have a clean data set representing the population
of regular observations that can be used to train any tool.


0 comments on commit 8861833
