Skip to content

Commit

Permalink
updated docstrings from the scipy documentation editor (thanks to eve…
Browse files Browse the repository at this point in the history
…ryone who contributed!)
  • Loading branch information
jarrodmillman committed Oct 1, 2009
1 parent 8364f28 commit f5b4d40
Show file tree
Hide file tree
Showing 29 changed files with 2,532 additions and 1,116 deletions.
73 changes: 70 additions & 3 deletions scipy/__init__.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,76 @@
"""\
SciPy --- A scientific computing package for Python
===================================================
"""
SciPy: A scientific computing package for Python
================================================
Documentation is available in the docstrings and
online at http://docs.scipy.org.
Contents
--------
SciPy imports all the functions from the NumPy namespace, and in
addition provides:
Subpackages
-----------
::
odr --- Orthogonal Distance Regression [*]
misc --- Various utilities that don't have
another home.
cluster --- Vector Quantization / Kmeans [*]
fftpack --- Discrete Fourier Transform algorithms
[*]
io --- Data input and output [*]
sparse.linalg.eigen.lobpcg --- Locally Optimal Block Preconditioned
Conjugate Gradient Method (LOBPCG) [*]
special --- Airy Functions [*]
lib.blas --- Wrappers to BLAS library [*]
sparse.linalg.eigen --- Sparse Eigenvalue Solvers [*]
stats --- Statistical Functions [*]
lib --- Python wrappers to external libraries
[*]
lib.lapack --- Wrappers to LAPACK library [*]
maxentropy --- Routines for fitting maximum entropy
models [*]
integrate --- Integration routines [*]
ndimage --- n-dimensional image package [*]
linalg --- Linear algebra routines [*]
spatial --- Spatial data structures and algorithms
[*]
interpolate --- Interpolation Tools [*]
sparse.linalg --- Sparse Linear Algebra [*]
sparse.linalg.dsolve.umfpack --- :Interface to the UMFPACK library: [*]
sparse.linalg.dsolve --- Linear Solvers [*]
optimize --- Optimization Tools [*]
sparse.linalg.eigen.arpack --- Eigenvalue solver using iterative
methods. [*]
signal --- Signal Processing Tools [*]
sparse --- Sparse Matrices [*]
[*] - using a package requires explicit import
Global symbols from subpackages
-------------------------------
::
misc --> info, factorial, factorial2, factorialk,
comb, who, lena, central_diff_weights,
derivative, pade, source
fftpack --> fft, fftn, fft2, ifft, ifft2, ifftn,
fftshift, ifftshift, fftfreq
stats --> find_repeats
linalg.dsolve.umfpack --> UmfpackContext
Utility tools
-------------
::
test --- Run scipy unittests
show_config --- Show scipy build configuration
show_numpy_config --- Show numpy build configuration
__version__ --- Scipy version string
__numpy_version__ --- Numpy version string
"""

__all__ = ['pkgload','test']
Expand Down
288 changes: 145 additions & 143 deletions scipy/cluster/hierarchy.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,151 +440,152 @@ def ward(y):


def linkage(y, method='single', metric='euclidean'):
r"""
"""
Performs hierarchical/agglomerative clustering on the
condensed distance matrix y. y must be a :math:`{n \choose 2}` sized
vector where n is the number of original observations paired
in the distance matrix. The behavior of this function is very
similar to the MATLAB(TM) linkage function.
A 4 by :math:`(n-1)` matrix ``Z`` is returned. At the
:math:`i`-th iteration, clusters with indices ``Z[i, 0]`` and
``Z[i, 1]`` are combined to form cluster :math:`n + i`. A
cluster with an index less than :math:`n` corresponds to one of
the :math:`n` original observations. The distance between
clusters ``Z[i, 0]`` and ``Z[i, 1]`` is given by ``Z[i, 2]``. The
fourth value ``Z[i, 3]`` represents the number of original
observations in the newly formed cluster.
The following linkage methods are used to compute the distance
:math:`d(s, t)` between two clusters :math:`s` and
:math:`t`. The algorithm begins with a forest of clusters that
have yet to be used in the hierarchy being formed. When two
clusters :math:`s` and :math:`t` from this forest are combined
into a single cluster :math:`u`, :math:`s` and :math:`t` are
removed from the forest, and :math:`u` is added to the
forest. When only one cluster remains in the forest, the algorithm
stops, and this cluster becomes the root.
A distance matrix is maintained at each iteration. The ``d[i,j]``
entry corresponds to the distance between cluster :math:`i` and
:math:`j` in the original forest.
At each iteration, the algorithm must update the distance matrix
to reflect the distance of the newly formed cluster u with the
remaining clusters in the forest.
Suppose there are :math:`|u|` original observations
:math:`u[0], \ldots, u[|u|-1]` in cluster :math:`u` and
:math:`|v|` original objects :math:`v[0], \ldots, v[|v|-1]` in
cluster :math:`v`. Recall :math:`s` and :math:`t` are
combined to form cluster :math:`u`. Let :math:`v` be any
remaining cluster in the forest that is not :math:`u`.
The following are methods for calculating the distance between the
newly formed cluster :math:`u` and each :math:`v`.
* method='single' assigns
.. math::
d(u,v) = \min(dist(u[i],v[j]))
for all points :math:`i` in cluster :math:`u` and
:math:`j` in cluster :math:`v`. This is also known as the
Nearest Point Algorithm.
* method='complete' assigns
.. math::
d(u, v) = \max(dist(u[i],v[j]))
for all points :math:`i` in cluster u and :math:`j` in
cluster :math:`v`. This is also known by the Farthest Point
Algorithm or Voor Hees Algorithm.
* method='average' assigns
.. math::
d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
{(|u|*|v|)}
for all points :math:`i` and :math:`j` where :math:`|u|`
and :math:`|v|` are the cardinalities of clusters :math:`u`
and :math:`v`, respectively. This is also called the UPGMA
algorithm. This is called UPGMA.
* method='weighted' assigns
.. math::
d(u,v) = (dist(s,v) + dist(t,v))/2
where cluster u was formed with cluster s and t and v
is a remaining cluster in the forest. (also called WPGMA)
* method='centroid' assigns
.. math::
dist(s,t) = ||c_s-c_t||_2
where :math:`c_s` and :math:`c_t` are the centroids of
clusters :math:`s` and :math:`t`, respectively. When two
clusters :math:`s` and :math:`t` are combined into a new
cluster :math:`u`, the new centroid is computed over all the
original objects in clusters :math:`s` and :math:`t`. The
distance then becomes the Euclidean distance between the
centroid of :math:`u` and the centroid of a remaining cluster
:math:`v` in the forest. This is also known as the UPGMC
algorithm.
* method='median' assigns math:`d(s,t)` like the ``centroid``
method. When two clusters :math:`s` and :math:`t` are combined
into a new cluster :math:`u`, the average of centroids s and t
give the new centroid :math:`u`. This is also known as the
WPGMC algorithm.
* method='ward' uses the Ward variance minimization algorithm.
The new entry :math:`d(u,v)` is computed as follows,
.. math::
condensed distance matrix y. y must be a :math:`{n \\choose 2}` sized
vector where n is the number of original observations paired
in the distance matrix. The behavior of this function is very
similar to the MATLAB(TM) linkage function.
A 4 by :math:`(n-1)` matrix ``Z`` is returned. At the
:math:`i`-th iteration, clusters with indices ``Z[i, 0]`` and
``Z[i, 1]`` are combined to form cluster :math:`n + i`. A
cluster with an index less than :math:`n` corresponds to one of
the :math:`n` original observations. The distance between
clusters ``Z[i, 0]`` and ``Z[i, 1]`` is given by ``Z[i, 2]``. The
fourth value ``Z[i, 3]`` represents the number of original
observations in the newly formed cluster.
The following linkage methods are used to compute the distance
:math:`d(s, t)` between two clusters :math:`s` and
:math:`t`. The algorithm begins with a forest of clusters that
have yet to be used in the hierarchy being formed. When two
clusters :math:`s` and :math:`t` from this forest are combined
into a single cluster :math:`u`, :math:`s` and :math:`t` are
removed from the forest, and :math:`u` is added to the
forest. When only one cluster remains in the forest, the algorithm
stops, and this cluster becomes the root.
A distance matrix is maintained at each iteration. The ``d[i,j]``
entry corresponds to the distance between cluster :math:`i` and
:math:`j` in the original forest.
At each iteration, the algorithm must update the distance matrix
to reflect the distance of the newly formed cluster u with the
remaining clusters in the forest.
Suppose there are :math:`|u|` original observations
:math:`u[0], \\ldots, u[|u|-1]` in cluster :math:`u` and
:math:`|v|` original objects :math:`v[0], \\ldots, v[|v|-1]` in
cluster :math:`v`. Recall :math:`s` and :math:`t` are
combined to form cluster :math:`u`. Let :math:`v` be any
remaining cluster in the forest that is not :math:`u`.
The following are methods for calculating the distance between the
newly formed cluster :math:`u` and each :math:`v`.
* method='single' assigns
.. math::
d(u,v) = \\min(dist(u[i],v[j]))
for all points :math:`i` in cluster :math:`u` and
:math:`j` in cluster :math:`v`. This is also known as the
Nearest Point Algorithm.
* method='complete' assigns
.. math::
d(u, v) = \\max(dist(u[i],v[j]))
for all points :math:`i` in cluster u and :math:`j` in
cluster :math:`v`. This is also known by the Farthest Point
Algorithm or Voor Hees Algorithm.
* method='average' assigns
.. math::
d(u,v) = \\sum_{ij} \\frac{d(u[i], v[j])}
{(|u|*|v|)}
for all points :math:`i` and :math:`j` where :math:`|u|`
and :math:`|v|` are the cardinalities of clusters :math:`u`
and :math:`v`, respectively. This is also called the UPGMA
algorithm. This is called UPGMA.
* method='weighted' assigns
.. math::
d(u,v) = (dist(s,v) + dist(t,v))/2
where cluster u was formed with cluster s and t and v
is a remaining cluster in the forest. (also called WPGMA)
* method='centroid' assigns
.. math::
dist(s,t) = ||c_s-c_t||_2
where :math:`c_s` and :math:`c_t` are the centroids of
clusters :math:`s` and :math:`t`, respectively. When two
clusters :math:`s` and :math:`t` are combined into a new
cluster :math:`u`, the new centroid is computed over all the
original objects in clusters :math:`s` and :math:`t`. The
distance then becomes the Euclidean distance between the
centroid of :math:`u` and the centroid of a remaining cluster
:math:`v` in the forest. This is also known as the UPGMC
algorithm.
* method='median' assigns math:`d(s,t)` like the ``centroid``
method. When two clusters :math:`s` and :math:`t` are combined
into a new cluster :math:`u`, the average of centroids s and t
give the new centroid :math:`u`. This is also known as the
WPGMC algorithm.
* method='ward' uses the Ward variance minimization algorithm.
The new entry :math:`d(u,v)` is computed as follows,
.. math::
d(u,v) = \\sqrt{\\frac{|v|+|s|}
{T}d(v,s)^2
+ \\frac{|v|+|t|}
{T}d(v,t)^2
+ \\frac{|v|}
{T}d(s,t)^2}
where :math:`u` is the newly joined cluster consisting of
clusters :math:`s` and :math:`t`, :math:`v` is an unused
cluster in the forest, :math:`T=|v|+|s|+|t|`, and
:math:`|*|` is the cardinality of its argument. This is also
known as the incremental algorithm.
Warning: When the minimum distance pair in the forest is chosen, there may
be two or more pairs with the same minimum distance. This
implementation may chose a different minimum than the MATLAB(TM)
version.
d(u,v) = \sqrt{\frac{|v|+|s|}
{T}d(v,s)^2
+ \frac{|v|+|t|}
{T}d(v,t)^2
+ \frac{|v|}
{T}d(s,t)^2}
where :math:`u` is the newly joined cluster consisting of
clusters :math:`s` and :math:`t`, :math:`v` is an unused
cluster in the forest, :math:`T=|v|+|s|+|t|`, and
:math:`|*|` is the cardinality of its argument. This is also
known as the incremental algorithm.
:Parameters:
- y : ndarray
A condensed or redundant distance matrix. A condensed
distance matrix is a flat array containing the upper
triangular of the distance matrix. This is the form that
``pdist`` returns. Alternatively, a collection of
:math:`m` observation vectors in n dimensions may be passed as
an :math:`m` by :math:`n` array.
- method : string
The linkage algorithm to use. See the ``Linkage Methods``
section below for full descriptions.
- metric : string
The distance metric to use. See the ``distance.pdist``
function for a list of valid distance metrics.
Warning: When the minimum distance pair in the forest is chosen, there may
be two or more pairs with the same minimum distance. This
implementation may chose a different minimum than the MATLAB(TM)
version.
:Returns:
:Parameters:
- Q : ndarray
A condensed or redundant distance matrix. A condensed
distance matrix is a flat array containing the upper
triangular of the distance matrix. This is the form that
``pdist`` returns. Alternatively, a collection of
:math:`m` observation vectors in n dimensions may be passed as
a :math:`m` by :math:`n` array.
- method : string
The linkage algorithm to use. See the ``Linkage Methods``
section below for full descriptions.
- metric : string
The distance metric to use. See the ``distance.pdist``
function for a list of valid distance metrics.
:Returns:
- Z : ndarray
The hierarchical clustering encoded as a linkage matrix.
- Z : ndarray
The hierarchical clustering encoded as a linkage matrix.
"""
"""
if not isinstance(method, str):
raise TypeError("Argument 'method' must be a string.")

Expand Down Expand Up @@ -1457,9 +1458,9 @@ def fclusterdata(X, t, criterion='inconsistent', \
:Arguments:
- Z : ndarray
The hierarchical clustering encoded with the matrix returned
by the ``linkage`` function.
- X : ndarray
``n`` by ``m`` data matrix with ``n`` observations in ``m``
dimensions.
- t : double
The threshold to apply when forming flat clusters.
Expand Down Expand Up @@ -1502,6 +1503,7 @@ def fclusterdata(X, t, criterion='inconsistent', \
-----
This function is similar to MATLAB(TM) clusterdata function.
"""
X = np.asarray(X, order='c', dtype=np.double)

Expand Down
Loading

0 comments on commit f5b4d40

Please sign in to comment.