Remove duplicate features
In addition to the equality between the formulas for Sum Variance and Cluster Tendency (AIM-Harvard#300), the following features also have identical formulas:

GLDM - Gray Level Non-Uniformity Normalized = First Order - Uniformity
GLCM - Dissimilarity = GLCM - Difference average
GLCM - Homogeneity1 = GLCM - ID (Exactly identical formula, only difference in name)
GLCM - Homogeneity2 = GLCM - IDM (Exactly identical formula, only difference in name)

Remove these features from the feature classes and update the documentation accordingly.
Remove features from example parameter files where applicable.
Add mathematical proof for these equalities to the documentation.

Simplify formulas in documentation for GLSZM, GLRLM and GLDM (use a predefined variable Nz for the sum of the matrix).
JoostJM committed Oct 26, 2017
1 parent 4700502 commit b6a0c4a
Showing 12 changed files with 426 additions and 399 deletions.
4 changes: 2 additions & 2 deletions docs/features.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,11 @@ subdivided into the following classes:

* :ref:`radiomics-firstorder-label` (19 features)
* :ref:`radiomics-shape-label` (16 features)
* :ref:`radiomics-glcm-label` (26 features)
* :ref:`radiomics-glcm-label` (23 features)
* :ref:`radiomics-glszm-label` (16 features)
* :ref:`radiomics-glrlm-label` (16 features)
* :ref:`radiomics-ngtdm-label` (5 features)
* :ref:`radiomics-gldm-label` (15 features)
* :ref:`radiomics-gldm-label` (14 features)

All feature classes, with the exception of shape, can be calculated on the original image and/or a derived image,
obtained by applying one of several filters. The shape descriptors are independent of gray value, and are extracted from
Expand Down
124 changes: 121 additions & 3 deletions docs/removedfeatures.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@ For included features and class definition, see :ref:`radiomics-glcm-label`.

.. _radiomics-excluded-sumvariance-label:

1. Sum Variance
###############
Sum Variance
############

.. math::

  \textit{sum variance} = \displaystyle\sum^{2N_g}_{k=2}{(k-SA)^2p_{x+y}(k)}
Expand Down Expand Up @@ -80,9 +80,127 @@ The mathematical proof is as follows:
(7) Combining (3.) and (6.) yields the following formula:

.. math::

  \text{Cluster Tendency} =
  \displaystyle\sum^{2N_g}_{k=2}{\Big[\big(k-SA\big)^2p_{x+y}(k)\Big]} =
  \textit{ sum variance}

Q.E.D.
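
The equality can also be checked numerically. The following is a minimal sketch, not PyRadiomics code: the function names are hypothetical, the GLCM is a random symmetric matrix, and Cluster Tendency is assumed to be defined as the sum of :math:`(i+j-\mu_x-\mu_y)^2p(i,j)`, which is not shown in this excerpt.

```python
import numpy as np


def _index_grids(n_g):
    # 1-based gray level indices i and j, matching the documentation's notation.
    levels = np.arange(1, n_g + 1)
    return np.meshgrid(levels, levels, indexing="ij")


def cluster_tendency(p):
    # Assumed definition: sum over (i + j - mu_x - mu_y)^2 * p(i, j).
    i, j = _index_grids(p.shape[0])
    mu_x = np.sum(i * p)
    mu_y = np.sum(j * p)
    return np.sum((i + j - mu_x - mu_y) ** 2 * p)


def sum_variance(p):
    # p_{x+y}(k) collects all p(i, j) with i + j = k, for k = 2 .. 2 * N_g.
    n_g = p.shape[0]
    i, j = _index_grids(n_g)
    k_values = np.arange(2, 2 * n_g + 1)
    p_sum = np.array([p[i + j == k].sum() for k in k_values])
    sum_average = np.sum(k_values * p_sum)  # SA
    return np.sum((k_values - sum_average) ** 2 * p_sum)


def random_glcm(n_g, seed=0):
    # Hypothetical test input: a symmetric, normalized co-occurrence matrix.
    rng = np.random.default_rng(seed)
    counts = rng.integers(1, 10, size=(n_g, n_g)).astype(float)
    counts = counts + counts.T
    return counts / counts.sum()


p = random_glcm(5)
print(cluster_tendency(p), sum_variance(p))
```

Both printed values should agree up to floating point rounding, because the sum average equals :math:`\mu_x + \mu_y`.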

Dissimilarity
#############

.. math::

  \textit{dissimilarity} = \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|}

Dissimilarity is a measure of local intensity variation defined as the mean absolute difference between the
neighboring pairs. A larger value correlates with a greater disparity in intensity values
among neighboring voxels.

This feature has been removed, as it is mathematically identical to Difference Average (see
:py:func:`~radiomics.glcm.RadiomicsGLCM.getDifferenceAverageFeatureValue()`).

The mathematical proof is as follows:

(1) As defined in GLCM, :math:`p_{x-y}(k) = \sum^{N_g}_{i=1}\sum^{N_g}_{j=1}{p(i,j)},\text{ where }|i-j|=k`

(2) Starting with Dissimilarity as defined in GLCM:

.. math::

  \textit{dissimilarity} = \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|}

  = \displaystyle\sum^{N_g-1}_{k=0}{\Big[
  \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|} \text{, where }|i-j|=k\Big]}

  = \displaystyle\sum^{N_g-1}_{k=0}{\Big[
  \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)k} \text{, where }|i-j|=k\Big]}

(3) Using (1.) and (2.)

.. math::

  \textit{dissimilarity} = \displaystyle\sum^{N_g-1}_{k=0}{p_{x-y}(k)k} = \textit{difference average}

Q.E.D.
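
As a quick numerical illustration of this proof, the sketch below computes both quantities from the same normalized matrix. The function names are hypothetical and the input is a random matrix standing in for a GLCM; it is not taken from the PyRadiomics implementation.

```python
import numpy as np


def _index_grids(n_g):
    # 1-based gray level indices i and j.
    levels = np.arange(1, n_g + 1)
    return np.meshgrid(levels, levels, indexing="ij")


def dissimilarity(p):
    # Direct double sum over p(i, j) * |i - j|.
    i, j = _index_grids(p.shape[0])
    return np.sum(p * np.abs(i - j))


def difference_average(p):
    # First build p_{x-y}(k) for k = 0 .. N_g - 1, then take sum of k * p_{x-y}(k).
    n_g = p.shape[0]
    i, j = _index_grids(n_g)
    k_values = np.arange(0, n_g)
    p_diff = np.array([p[np.abs(i - j) == k].sum() for k in k_values])
    return np.sum(k_values * p_diff)


# Hypothetical input: random positive counts, normalized to sum to 1.
rng = np.random.default_rng(1)
counts = rng.integers(1, 10, size=(6, 6)).astype(float)
p = counts / counts.sum()
print(dissimilarity(p), difference_average(p))
```

Grouping the terms of the double sum by :math:`k = |i-j|` only reorders the summation, so the two results differ at most by rounding error.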

.. _radiomics-excluded-gldm-label:

Excluded GLDM Features
----------------------

For included features and class definition, see :ref:`radiomics-gldm-label`.

.. _radiomics-excluded-gldm-glnn-label:

Dependence Count percentage
###########################

.. math::

  \textit{dependence percentage} = \frac{N_z}{N_p}

Dependence percentage is the ratio between voxels with a dependence zone and the total number of voxels in the image.
Because PyRadiomics allows for incomplete dependence zones, all voxels have a dependence zone and :math:`N_z = N_p`.
Therefore, this feature would always compute to 1.

Gray Level Non-Uniformity Normalized
####################################

.. math::

  \textit{GLNN} = \frac{\sum^{N_g}_{i=1}\left(\sum^{N_d}_{j=1}{\textbf{P}(i,j)}\right)^2}{N_z^2}

Measures the similarity of gray-level intensity values in the image, where a lower GLNN value
correlates with a greater similarity in intensity values. This is the normalized version of the GLN formula.

This feature has been removed because, due to the definition of the GLDM matrix (allowing incomplete zones), it is
equal to first order Uniformity (see :py:func:`~radiomics.firstorder.RadiomicsFirstOrder.getUniformityFeatureValue()`).

The mathematical proof is as follows:

(1) Starting with Gray Level Non-Uniformity Normalized as defined in GLDM,

.. math::

  \textit{GLNN} = \frac{\sum^{N_g}_{i=1}\left(\sum^{N_d}_{j=1}{\textbf{P}(i,j)}\right)^2}{N_z^2}

  = \displaystyle\sum^{N_g}_{i=1}{
  \frac{ \left( \sum^{N_d}_{j=1}{ \textbf{P}(i,j) } \right)^2 }{ N_z^2 }
  }

  = \displaystyle\sum^{N_g}_{i=1}{ \left(
  \frac{ \sum^{N_d}_{j=1}{ \textbf{P}(i,j) } }{ N_z }
  \right)^2}

  = \displaystyle\sum^{N_g}_{i=1}{ \left(
  \sum^{N_d}_{j=1}{ \frac{ \textbf{P}(i,j) } { N_z } }
  \right)^2}

(2) As defined in GLDM, :math:`p(i,j) = \frac{\textbf{P}(i,j)}{N_z}`

(3) Using (1.) and (2.)

.. math::

  \textit{GLNN} = \displaystyle\sum^{N_g}_{i=1}{ \left(
  \sum^{N_d}_{j=1}{ p(i,j) }
  \right)^2}

(4) Because in the PyRadiomics definition incomplete dependence zones are allowed, every voxel in the ROI has a
dependence zone. Therefore, :math:`N_z = N_p` and :math:`\sum^{N_d}_{j=1}{\textbf{P}(i,j)}` equals the number of voxels
with gray level :math:`i` and is equal to :math:`\textbf{P}(i)`, the first order histogram with :math:`N_g` discrete
gray levels, as defined in first order.

(5) As defined in first order, :math:`p(i) = \frac{\textbf{P}(i)}{N_p}`

(6) Using (2.), (4.) and (5.)

.. math::

  \displaystyle\sum^{N_d}_{j=1}{\textbf{P}(i,j)} = \textbf{P}(i)

  \frac{\sum^{N_d}_{j=1}{\textbf{P}(i,j)}}{N_z} = \frac{\textbf{P}(i)}{N_p}

  \displaystyle\sum^{N_d}_{j=1}{\frac{\textbf{P}(i,j)}{N_z}} = \frac{\textbf{P}(i)}{N_p}

  \displaystyle\sum^{N_d}_{j=1}{p(i,j)} = p(i)

(7) Combining (3.) and (6.) yields:

.. math::

  \textit{GLNN} = \displaystyle\sum^{N_g}_{i=1}{p(i)^2} = Uniformity

Q.E.D.
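
The argument can be illustrated with a small numerical sketch. It is an assumption-laden simplification, not the PyRadiomics implementation: the function names are hypothetical, the image is 2D with an 8-connected neighborhood, and column :math:`j` of the matrix simply holds voxels with :math:`j` dependent neighbors. What it preserves is the key property used in step (4): every voxel contributes exactly one entry, so :math:`N_z = N_p`.

```python
import numpy as np


def dependence_matrix(image, n_g, alpha=0):
    # P[i - 1, j]: number of voxels with gray level i and j dependent neighbors.
    # Voxels at the border have fewer neighbors (incomplete zones) but still count.
    rows, cols = image.shape
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    P = np.zeros((n_g, len(offsets) + 1))
    for r in range(rows):
        for c in range(cols):
            dependents = 0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and abs(int(image[rr, cc]) - int(image[r, c])) <= alpha:
                    dependents += 1
            P[image[r, c] - 1, dependents] += 1
    return P


def glnn(P):
    # Gray Level Non-Uniformity Normalized: squared row sums divided by N_z^2.
    n_z = P.sum()
    return np.sum(P.sum(axis=1) ** 2) / n_z ** 2


def uniformity(image, n_g):
    # First order Uniformity from the normalized histogram p(i) = P(i) / N_p.
    hist = np.bincount(image.ravel(), minlength=n_g + 1)[1:n_g + 1]
    p = hist / image.size
    return np.sum(p ** 2)


# Hypothetical input: a random 6 x 6 image with gray levels 1 .. 4.
rng = np.random.default_rng(0)
image = rng.integers(1, 5, size=(6, 6))
P = dependence_matrix(image, n_g=4)
print(P.sum() == image.size, glnn(P), uniformity(image, n_g=4))
```

Because the row sums of the matrix reproduce the first order histogram, the two feature values coincide regardless of the chosen ``alpha``.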
3 changes: 0 additions & 3 deletions examples/exampleSettings/Params.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -58,11 +58,8 @@ featureClass:
- 'DifferenceAverage'
- 'DifferenceEntropy'
- 'DifferenceVariance'
- 'Dissimilarity'
- 'JointEnergy'
- 'JointEntropy'
- 'Homogeneity1'
- 'Homogeneity2'
- 'Imc1'
- 'Imc2'
- 'Idm'
Expand Down
3 changes: 0 additions & 3 deletions examples/exampleSettings/exampleCT.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,11 +41,8 @@ featureClass:
- 'DifferenceAverage'
- 'DifferenceEntropy'
- 'DifferenceVariance'
- 'Dissimilarity'
- 'JointEnergy'
- 'JointEntropy'
- 'Homogeneity1'
- 'Homogeneity2'
- 'Imc1'
- 'Imc2'
- 'Idm'
Expand Down
3 changes: 0 additions & 3 deletions examples/exampleSettings/exampleMR_3mm.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -43,11 +43,8 @@ featureClass:
- 'DifferenceAverage'
- 'DifferenceEntropy'
- 'DifferenceVariance'
- 'Dissimilarity'
- 'JointEnergy'
- 'JointEntropy'
- 'Homogeneity1'
- 'Homogeneity2'
- 'Imc1'
- 'Imc2'
- 'Idm'
Expand Down
3 changes: 0 additions & 3 deletions examples/exampleSettings/exampleMR_5mm.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,11 +42,8 @@ featureClass:
- 'DifferenceAverage'
- 'DifferenceEntropy'
- 'DifferenceVariance'
- 'Dissimilarity'
- 'JointEnergy'
- 'JointEntropy'
- 'Homogeneity1'
- 'Homogeneity2'
- 'Imc1'
- 'Imc2'
- 'Idm'
Expand Down
3 changes: 0 additions & 3 deletions examples/exampleSettings/exampleMR_NoResampling.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -46,11 +46,8 @@ featureClass:
- 'DifferenceAverage'
- 'DifferenceEntropy'
- 'DifferenceVariance'
- 'Dissimilarity'
- 'JointEnergy'
- 'JointEntropy'
- 'Homogeneity1'
- 'Homogeneity2'
- 'Imc1'
- 'Imc2'
- 'Idm'
Expand Down
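
For existing user parameter files, the same clean-up as in the example files above could be applied programmatically. This is a hedged sketch only: ``drop_removed_glcm_features`` is a hypothetical helper, not part of PyRadiomics, and it assumes the GLCM feature names are listed under a ``glcm`` key of the ``featureClass`` mapping after the file has been loaded into a plain dictionary.

```python
# Feature names removed from the example parameter files in this commit.
REMOVED_GLCM_FEATURES = {"Dissimilarity", "Homogeneity1", "Homogeneity2"}


def drop_removed_glcm_features(feature_class):
    """Return a copy of a featureClass mapping without the removed GLCM names.

    `feature_class` is assumed to map class names to a list of feature names;
    empty or missing entries are left untouched.
    """
    cleaned = dict(feature_class)
    glcm = cleaned.get("glcm")
    if glcm:
        cleaned["glcm"] = [name for name in glcm if name not in REMOVED_GLCM_FEATURES]
    return cleaned


# Hypothetical excerpt of a loaded parameter file.
feature_class = {
    "glcm": [
        "DifferenceAverage",
        "Dissimilarity",
        "JointEnergy",
        "Homogeneity1",
        "Homogeneity2",
        "Idm",
    ]
}
print(drop_removed_glcm_features(feature_class)["glcm"])
```

No information is lost by dropping these names, since the commit message states that each removed feature has an identical counterpart that remains available.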
34 changes: 17 additions & 17 deletions radiomics/firstorder.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,11 @@ class RadiomicsFirstOrder(base.RadiomicsFeaturesBase):
Let:
- :math:`\textbf{X}` be a set of :math:`N` voxels included in the ROI
- :math:`\textbf{P}(i)` be the first order histogram with :math:`N_l` discrete intensity levels,
where :math:`N_l` is the number of non-zero bins, equally spaced from 0 with a width defined in the ``binWidth``
- :math:`\textbf{X}` be a set of :math:`N_p` voxels included in the ROI
- :math:`\textbf{P}(i)` be the first order histogram with :math:`N_g` discrete intensity levels,
where :math:`N_g` is the number of non-zero bins, equally spaced from 0 with a width defined in the ``binWidth``
parameter.
- :math:`p(i)` be the normalized first order histogram and equal to :math:`\frac{\textbf{P}(i)}{\sum{\textbf{P}(i)}}`
- :math:`p(i)` be the normalized first order histogram and equal to :math:`\frac{\textbf{P}(i)}{N_p}`
Following additional settings are possible:
Expand Down Expand Up @@ -59,7 +59,7 @@ def getEnergyFeatureValue(self):
**1. Energy**
.. math::
\textit{energy} = \displaystyle\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}
\textit{energy} = \displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}
Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to Energy,
Expand All @@ -81,7 +81,7 @@ def getTotalEnergyFeatureValue(self):
**2. Total Energy**
.. math::
\textit{total energy} = V_{voxel}\displaystyle\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}
\textit{total energy} = V_{voxel}\displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}
Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to Energy,
Expand All @@ -106,7 +106,7 @@ def getEntropyFeatureValue(self):
**3. Entropy**
.. math::
\textit{entropy} = -\displaystyle\sum^{N_l}_{i=1}{p(i)\log_2\big(p(i)+\epsilon\big)}
\textit{entropy} = -\displaystyle\sum^{N_g}_{i=1}{p(i)\log_2\big(p(i)+\epsilon\big)}
Here, :math:`\epsilon` is an arbitrarily small positive number (:math:`\approx 2.2\times10^{-16}`).
Expand Down Expand Up @@ -174,7 +174,7 @@ def getMeanFeatureValue(self):
**8. Mean**
.. math::
\textit{mean} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{\textbf{X}(i)}
\textit{mean} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{\textbf{X}(i)}
The average gray level intensity within the ROI.
"""
Expand Down Expand Up @@ -220,7 +220,7 @@ def getMeanAbsoluteDeviationFeatureValue(self):
**12. Mean Absolute Deviation (MAD)**
.. math::
\textit{MAD} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{|\textbf{X}(i)-\bar{X}|}
\textit{MAD} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{|\textbf{X}(i)-\bar{X}|}
Mean Absolute Deviation is the mean distance of all intensity values from the Mean Value of the image array.
"""
Expand Down Expand Up @@ -251,7 +251,7 @@ def getRootMeanSquaredFeatureValue(self):
**14. Root Mean Squared (RMS)**
.. math::
\textit{RMS} = \sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}}
\textit{RMS} = \sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}}
Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to RMS,
Expand All @@ -274,7 +274,7 @@ def getStandardDeviationFeatureValue(self):
**15. Standard Deviation**
.. math::
\textit{standard deviation} = \sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}
\textit{standard deviation} = \sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}
Standard Deviation measures the amount of variation or dispersion from the Mean Value. By definition,
:math:`\textit{standard deviation} = \sqrt{\textit{variance}}`
Expand All @@ -291,8 +291,8 @@ def getSkewnessFeatureValue(self, axis=0):
.. math::
\textit{skewness} = \displaystyle\frac{\mu_3}{\sigma^3} =
\frac{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^3}}
{\left(\sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}\right)^3}
\frac{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^3}}
{\left(\sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}\right)^3}
Where :math:`\mu_3` is the 3\ :sup:`rd` central moment.
Expand Down Expand Up @@ -322,8 +322,8 @@ def getKurtosisFeatureValue(self, axis=0):
.. math::
\textit{kurtosis} = \displaystyle\frac{\mu_4}{\sigma^4} =
\frac{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^4}}
{\left(\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X}})^2\right)^2}
\frac{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^4}}
{\left(\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}\right)^2}
Where :math:`\mu_4` is the 4\ :sup:`th` central moment.
Expand Down Expand Up @@ -357,7 +357,7 @@ def getVarianceFeatureValue(self):
**18. Variance**
.. math::
\textit{variance} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}
\textit{variance} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}
Variance is the mean of the squared distances of each intensity value from the Mean value. This is a measure of
the spread of the distribution about the mean. By definition, :math:`\textit{variance} = \sigma^2`
Expand All @@ -370,7 +370,7 @@ def getUniformityFeatureValue(self):
**19. Uniformity**
.. math::
\textit{uniformity} = \displaystyle\sum^{N_l}_{i=1}{p(i)^2}
\textit{uniformity} = \displaystyle\sum^{N_g}_{i=1}{p(i)^2}
Uniformity is a measure of the sum of the squares of each intensity value. This is a measure of the homogeneity of
the image array, where a greater uniformity implies a greater homogeneity or a smaller range of discrete intensity
Expand Down
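
The histogram-based definitions in this docstring can be sketched as follows. This is an illustration under stated assumptions rather than the PyRadiomics code: the function names are hypothetical, and the binning is a simplified fixed-width scheme with edges aligned at 0 that keeps only non-empty bins.

```python
import numpy as np

# Small positive constant used inside the logarithm (about 2.2e-16).
EPSILON = np.spacing(1)


def normalized_histogram(values, bin_width):
    # p(i) = P(i) / N_p over the N_g non-empty bins of width `bin_width`.
    values = np.asarray(values, dtype=float)
    bin_index = np.floor(values / bin_width).astype(int)
    counts = np.bincount(bin_index - bin_index.min())
    counts = counts[counts > 0]
    return counts / values.size


def entropy(p):
    # -sum of p(i) * log2(p(i) + epsilon)
    return -np.sum(p * np.log2(p + EPSILON))


def uniformity(p):
    # sum of p(i)^2
    return np.sum(p ** 2)


# Hypothetical ROI intensities spread evenly over four bins.
roi = np.array([0, 0, 25, 25, 50, 50, 75, 75], dtype=float)
p = normalized_histogram(roi, bin_width=25)
print(p, entropy(p), uniformity(p))
```

With all voxels in a single bin the uniformity reaches its maximum of 1, and it decreases as the intensities spread over more bins.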