DOC spellfixes
jaquesgrobler authored and vene committed Jun 29, 2013
1 parent 7c4126e commit 8861833
Showing 102 changed files with 229 additions and 229 deletions.
2 changes: 1 addition & 1 deletion doc/datasets/labeled_faces.rst
@@ -97,7 +97,7 @@ possible to get an additional dimension with the RGB color channels by
passing ``color=True``, in that case the shape will be
``(2200, 2, 62, 47, 3)``.

The ``fetch_lfw_pairs`` datasets is subdived in 3 subsets: the development
The ``fetch_lfw_pairs`` datasets is subdivided into 3 subsets: the development
``train`` set, the development ``test`` set and an evaluation ``10_folds``
set meant to compute performance metrics using a 10-folds cross
validation scheme.
2 changes: 1 addition & 1 deletion doc/datasets/twenty_newsgroups.rst
@@ -4,7 +4,7 @@ The 20 newsgroups text dataset
==============================

The 20 newsgroups dataset comprises around 18000 newsgroups posts on
20 topics splitted in two subsets: one for training (or development)
20 topics split in two subsets: one for training (or development)
and the other one for testing (or for performance evaluation). The split
between the train and test set is based upon a messages posted before
and after a specific date.
6 changes: 3 additions & 3 deletions doc/developers/index.rst
@@ -241,7 +241,7 @@ Next, one or two small code examples to show its use can be added.
Finally, any math and equations, followed by references,
can be added to further the documentation. Not starting the
documentation with the maths makes it more friendly towards
users that are just intersted in what the feature will do, as
users that are just interested in what the feature will do, as
opposed to how it works `under the hood`.


@@ -454,7 +454,7 @@ in an attribute ``random_state``.
``fit`` can call ``check_random_state`` on that attribute
to get an actual random number generator.
If, for some reason, randomness is needed after ``fit``,
the RNG should be stored in an attibute ``random_state_``.
the RNG should be stored in an attribute ``random_state_``.
The following example should make this clear::

class GaussianNoise(BaseEstimator, TransformerMixin):
@@ -541,7 +541,7 @@ integer division is written ``//``.
String handling has been overhauled, though, as have parts of
the Python standard library.
The `six <http://pythonhosted.org/six/>`_ package helps with
cross-compability and is included in scikit-learn as
cross-compatibility and is included in scikit-learn as
``sklearn.externals.six``.


6 changes: 3 additions & 3 deletions doc/developers/performance.rst
@@ -16,7 +16,7 @@ code for the scikit-learn project.
implementation optimization.

Times and times, hours of efforts invested in optimizing complicated
implementation details have been rended irrelevant by the late discovery
implementation details have been rendered irrelevant by the late discovery
of simple **algorithmic tricks**, or by using another algorithm altogether
that is better suited to the problem.

@@ -359,7 +359,7 @@ important in practice on the existing cython codebase in the scikit-learn
project.

TODO: html report, type declarations, bound checks, division by zero checks,
memory alignement, direct blas calls...
memory alignment, direct blas calls...

- http://www.euroscipy.org/file/3696?vid=download
- http://conference.scipy.org/proceedings/SciPy2009/paper_1/
@@ -373,7 +373,7 @@ Profiling compiled extensions

When working with compiled extensions (written in C/C++ with a wrapper or
directly as Cython extension), the default Python profiler is useless:
we need a dedicated tool to instrospect what's happening inside the
we need a dedicated tool to introspect what's happening inside the
compiled extension it-self.

Using yep and google-perftools
10 changes: 5 additions & 5 deletions doc/modules/clustering.rst
@@ -250,7 +250,7 @@ is given.

Affinity Propagation can be interesting as it chooses the number of
clusters based on the data provided. For this purpose, the two important
parameters are the `preference`, which controls how many examplars are
parameters are the `preference`, which controls how many exemplars are
used, and the `damping` factor.

The main drawback of Affinity Propagation is its complexity. The
@@ -384,7 +384,7 @@ Different label assignment strategies

Different label assignment strategies can be used, corresponding to the
`assign_labels` parameter of :class:`SpectralClustering`.
The `kmeans` strategie can match finer details of the data, but it can be
The `kmeans` strategy can match finer details of the data, but it can be
more unstable. In particular, unless you control the `random_state`, it
may not be reproducible from run-to-run, as it depends on a random
initialization. On the other hand, the `discretize` strategy is 100%
@@ -933,14 +933,14 @@ Their harmonic mean called **V-measure** is computed by
The V-measure is actually equivalent to the mutual information (NMI)
discussed above normalized by the sum of the label entropies [B2011]_.

Homogeneity, completensess and V-measure can be computed at once using
Homogeneity, completeness and V-measure can be computed at once using
:func:`homogeneity_completeness_v_measure` as follows::

>>> metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
... # doctest: +ELLIPSIS
(0.66..., 0.42..., 0.51...)

The following clustering assignment is slighlty better, since it is
The following clustering assignment is slightly better, since it is
homogeneous but not complete::

>>> labels_pred = [0, 0, 0, 1, 2, 2]
@@ -966,7 +966,7 @@ Advantages

- Intuitive interpretation: clustering with bad V-measure can be
**qualitatively analyzed in terms of homogeneity and completeness**
to better feel what 'kind' of mistakes is done by the assigmenent.
to better feel what 'kind' of mistakes is done by the assignment.

- **No assumption is made on the cluster structure**: can be used
to compare clustering algorithms such as k-means which assumes isotropic
6 changes: 3 additions & 3 deletions doc/modules/cross_validation.rst
@@ -97,7 +97,7 @@ The simplest way to use perform cross-validation in to call the
:func:`cross_val_score` helper function on the estimator and the dataset.

The following example demonstrates how to estimate the accuracy of a
linear kernel support vector machine on the iris dataset by splitting
linear kernel support vector machine on the iris dataset by split
the data and fitting a model and computing the score 5 consecutive times
(with different splits each time)::

@@ -132,7 +132,7 @@ When the ``cv`` argument is an integer, :func:`cross_val_score` uses the
:class:`KFold` or :class:`StratifiedKFold` strategies by default (depending on
the absence or presence of the target array).

It is also possible to use othe cross validation strategies by passing a cross
It is also possible to use other cross validation strategies by passing a cross
validation iterator instead, for instance::

>>> n_samples = iris.data.shape[0]
@@ -369,7 +369,7 @@ Random permutations cross-validation a.k.a. Shuffle & Split

The :class:`ShuffleSplit` iterator will generate a user defined number of
independent train / test dataset splits. Samples are first shuffled and
then splitted into a pair of train and test sets.
then split into a pair of train and test sets.

It is possible to control the randomness for reproducibility of the
results by explicitly seeding the ``random_state`` pseudo random number
4 changes: 2 additions & 2 deletions doc/modules/decomposition.rst
@@ -510,11 +510,11 @@ structure of the error covariance :math:`\Psi`:
:class:`ProbabilisticPCA`.

* :math:`\Psi = diag(\psi_1, \psi_2, \dots, \psi_n)`: This model is called Factor
Analysis, a classical statistical model. The matrix W is sometimtes called
Analysis, a classical statistical model. The matrix W is sometimes called
`factor loading matrix`.

Both model essentially estimate a Gaussian with a low-rank covariance matrix.
Because both models are probilistic they can be integrated in more complex
Because both models are probabilistic they can be integrated in more complex
models, e.g. Mixture of Factor Analysers. One gets very different models (e.g.
:class:`FastICA`) if non-Gaussian priors on the latent variables are assumed.

2 changes: 1 addition & 1 deletion doc/modules/dp-derivation.rst
@@ -287,7 +287,7 @@ distributions of :math:`\sigma` (as there are a lot more :math:`\sigma` s
now) and :math:`X`.

The bound for :math:`\sigma_{k,d}` is the same bound for :math:`\sigma_k` and can
be safelly omitted.
be safely omitted.

**The bound for** :math:`X` :

6 changes: 3 additions & 3 deletions doc/modules/ensemble.rst
@@ -235,7 +235,7 @@ the transformation performs an implicit, non-parametric density estimation.
* :ref:`example_ensemble_plot_random_forest_embedding.py`

* :ref:`example_manifold_plot_lle_digits.py` compares non-linear
dimensionality reduction technics on handwritten digits.
dimensionality reduction techniques on handwritten digits.

.. seealso::

@@ -430,7 +430,7 @@ with least squares loss and 500 base learners to the Boston house price dataset
The plot on the left shows the train and test error at each iteration.
The train error at each iteration is stored in the
:attr:`~GradientBoostingRegressor.train_score_` attribute
of the gradient boosting model. The test error at each iterations can be optained
of the gradient boosting model. The test error at each iterations can be obtained
via the :meth:`~GradientBoostingRegressor.staged_predict` method which returns a
generator that yields the predictions at each stage. Plots like these can be used
to determine the optimal number of trees (i.e. ``n_estimators``) by early stopping.
@@ -684,7 +684,7 @@ interactions among the two features. For example, the two-variable PDP in the
above Figure shows the dependence of median house price on joint
values of house age and avg. occupants per household. We can clearly
see an interaction between the two features:
For an avg. occupancy greather than two, the house price is nearly independent
For an avg. occupancy greater than two, the house price is nearly independent
of the house age, whereas for values less than two there is a strong dependence
on age.

14 changes: 7 additions & 7 deletions doc/modules/feature_extraction.rst
@@ -234,7 +234,7 @@ In order to address this, scikit-learn provides utilities for the most
common ways to extract numerical features from text content, namely:

- **tokenizing** strings and giving an integer id for each possible token,
for instance by using whitespaces and punctuation as token separators.
for instance by using white-spaces and punctuation as token separators.

- **counting** the occurrences of tokens in each document.

@@ -253,7 +253,7 @@ A corpus of documents can thus be represented by a matrix with one row
per document and one column per token (e.g. word) occurring in the corpus.

We call **vectorization** the general process of turning a collection
of text documents into numerical feature vectors. This specific stragegy
of text documents into numerical feature vectors. This specific strategy
(tokenization, counting and normalization) is called the **Bag of Words**
or "Bag of n-grams" representation. Documents are described by word
occurrences while completely ignoring the relative position information
@@ -348,7 +348,7 @@ ignored in future calls to the transform method::

Note that in the previous corpus, the first and the last documents have
exactly the same words hence are encoded in equal vectors. In particular
we lose the information that the last document is an interogative form. To
we lose the information that the last document is an interrogative form. To
preserve some of the local ordering information we can extract 2-grams
of words in addition to the 1-grams (the word themselvs)::

@@ -370,7 +370,7 @@ can now resolve ambiguities encoded in local positioning patterns::
[0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1]]...)


In particular the interogative form "Is this" is only present in the
In particular the interrogative form "Is this" is only present in the
last document::

>>> feature_index = bigram_vectorizer.vocabulary_.get(u'is this')
@@ -384,7 +384,7 @@ Tf–idf term weighting
---------------------

In a large text corpus, some words will be very present (e.g. "the", "a",
"is" in English) hence carrying very little meaningul information about
"is" in English) hence carrying very little meaningful information about
the actual contents of the document. If we were to feed the direct count
data directly to a classifier those very frequent terms would shadow
the frequencies of rarer yet more interesting terms.
@@ -507,7 +507,7 @@ unigrams (n=1), one might prefer a collection of bigrams (n=2), where
occurrences of pairs of consecutive words are counted.

One might alternatively consider a collection of character n-grams, a
representation resiliant against misspellings and derivations.
representation resilient against misspellings and derivations.

For example, let's say we're dealing with a corpus of two documents:
``['words', 'wprds']``. The second document contains a misspelling
@@ -548,7 +548,7 @@ span across words::
[u'jumpy', u'mpy f', u'py fo', u'umpy ', u'y fox']

The word boundaries-aware variant ``char_wb`` is especially interesting
for languages that use whitespaces for word separation as it generates
for languages that use white-spaces for word separation as it generates
significantly less noisy features than the raw ``char`` variant in
that case. For such languages it can increase both the predictive
accuracy and convergence speed of classifiers trained using such
2 changes: 1 addition & 1 deletion doc/modules/feature_selection.rst
@@ -121,7 +121,7 @@ alpha parameter, the fewer features selected.

For a good choice of alpha, the :ref:`lasso` can fully recover the
exact set of non-zero variables using only few observations, provided
certain specific conditions are met. In paraticular, the number of
certain specific conditions are met. In particular, the number of
samples should be "sufficiently large", or L1 models will perform at
random, where "sufficiently large" depends on the number of non-zero
coefficients, the logarithm of the number of features, the amount of
2 changes: 1 addition & 1 deletion doc/modules/gaussian_process.rst
@@ -19,7 +19,7 @@ The advantages of Gaussian Processes for Machine Learning are:
correlation models).

- The prediction is probabilistic (Gaussian) so that one can compute
empirical confidence intervals and exceedence probabilities that might be
empirical confidence intervals and exceedance probabilities that might be
used to refit (online fitting, adaptive fitting) the prediction in some
region of interest.

2 changes: 1 addition & 1 deletion doc/modules/grid_search.rst
@@ -217,7 +217,7 @@ of the training set is left out.

This left out portion can be used to estimate the generalization error
without having to rely on a separate validation set. This estimate
comes "for free" as no addictional data is needed and can be used for
comes "for free" as no additional data is needed and can be used for
model selection.

This is currently implemented in the following classes:
2 changes: 1 addition & 1 deletion doc/modules/hmm.rst
@@ -82,7 +82,7 @@ constructor. Then, you can generate samples from the HMM by calling `sample`.::

* :ref:`example_plot_hmm_sampling.py`

Training HMM parameters and infering the hidden states
Training HMM parameters and inferring the hidden states
------------------------------------------------------

You can train an HMM by calling the `fit` method. The input is "the list" of
2 changes: 1 addition & 1 deletion doc/modules/kernel_approximation.rst
@@ -79,7 +79,7 @@ function does not actually depend on the data given to the ``fit`` function.
Only the dimensionality of the data is used.
Details on the method can be found in [RR2007]_.

For a given value of ``n_components`` :class:`RBFSampler` is often less acurate
For a given value of ``n_components`` :class:`RBFSampler` is often less accurate
as :class:`Nystroem`. :class:`RBFSampler` is cheaper to compute, though, making
use of larger feature spaces more efficient.

6 changes: 3 additions & 3 deletions doc/modules/linear_model.rst
@@ -224,7 +224,7 @@ cross-validation: :class:`LassoCV` and :class:`LassoLarsCV`.
explained below.

For high-dimensional datasets with many collinear regressors,
:class:`LassoCV` is most often preferrable. How, :class:`LassoLarsCV` has
:class:`LassoCV` is most often preferable. How, :class:`LassoLarsCV` has
the advantage of exploring more relevant values of `alpha` parameter, and
if the number of samples is very small compared to the number of
observations, it is often faster than :class:`LassoCV`.
@@ -432,7 +432,7 @@ the residual.

Instead of giving a vector result, the LARS solution consists of a
curve denoting the solution for each value of the L1 norm of the
parameter vector. The full coeffients path is stored in the array
parameter vector. The full coefficients path is stored in the array
``coef_path_``, which has size (n_features, max_features+1). The first
column is always zero.

@@ -617,7 +617,7 @@ centered on zero and with a precision :math:`\lambda_{i}`:

with :math:`diag \; (A) = \lambda = \{\lambda_{1},...,\lambda_{p}\}`.

In constrast to `Bayesian Ridge Regression`_, each coordinate of :math:`w_{i}`
In contrast to `Bayesian Ridge Regression`_, each coordinate of :math:`w_{i}`
has its own standard deviation :math:`\lambda_i`. The prior over all
:math:`\lambda_i` is chosen to be the same gamma distribution given by
hyperparameters :math:`\lambda_1` and :math:`\lambda_2`.
4 changes: 2 additions & 2 deletions doc/modules/manifold.rst
@@ -202,7 +202,7 @@ of neighbors is greater than the number of input dimensions, the matrix
defining each local neighborhood is rank-deficient. To address this, standard
LLE applies an arbitrary regularization parameter :math:`r`, which is chosen
relative to the trace of the local weight matrix. Though it can be shown
formally that as :math:`r \to 0`, the solution coverges to the desired
formally that as :math:`r \to 0`, the solution converges to the desired
embedding, there is no guarantee that the optimal solution will be found
for :math:`r > 0`. This problem manifests itself in embeddings which distort
the underlying geometry of the manifold.
@@ -407,7 +407,7 @@ countries.

There exists two types of MDS algorithm: metric and non metric. In the
scikit-learn, the class :class:`MDS` implements both. In Metric MDS, the input
simiarity matrix arises from a metric (and thus respects the triangular
similarity matrix arises from a metric (and thus respects the triangular
inequality), the distances between output two points are then set to be as
close as possible to the similarity or dissimilarity data. In the non metric
vision, the algorithms will try to preserve the order of the distances, and
2 changes: 1 addition & 1 deletion doc/modules/model_evaluation.rst
@@ -344,7 +344,7 @@ The `F-measure <http://en.wikipedia.org/wiki/F1_score>`_
harmonic mean of the precision and recall. A
:math:`F_\beta` measure reaches its best value at 1 and worst score at 0.
With :math:`\beta = 1`, the :math:`F_\beta` measure leads to the
:math:`F_1` measure, wheres the recall and the precsion are equally important.
:math:`F_1` measure, wheres the recall and the precision are equally important.

Several functions allow you to analyze the precision, recall and F-measures
score:
2 changes: 1 addition & 1 deletion doc/modules/multiclass.rst
@@ -170,7 +170,7 @@ Example::

.. topic:: References:

.. [1] "Solving multiclass learning problems via error-correcting ouput codes",
.. [1] "Solving multiclass learning problems via error-correcting output codes",
Dietterich T., Bakiri G.,
Journal of Artificial Intelligence Research 2,
1995.
4 changes: 2 additions & 2 deletions doc/modules/outlier_detection.rst
@@ -65,7 +65,7 @@ but regular, observation outside the frontier.

.. topic:: Examples:

* See :ref:`example_svm_plot_oneclass.py` for vizualizing the
* See :ref:`example_svm_plot_oneclass.py` for visualizing the
frontier learned around some data by a
:class:`svm.OneClassSVM` object.

@@ -80,7 +80,7 @@ Outlier Detection

Outlier detection is similar to novelty detection in the sense that
the goal is to separate a core of regular observations from some
polutting ones, called "outliers". Yet, in the case of outlier
polluting ones, called "outliers". Yet, in the case of outlier
detection, we don't have a clean data set representing the population
of regular observations that can be used to train any tool.


0 comments on commit 8861833
