Remove duplicate features

In addition to the equality between the formulas for Sum Variance and Cluster Tendency (AIM-Harvard#300), the following features also have identical formulas:

* GLDM - Gray Level Non-Uniformity Normalized = First Order - Uniformity
* GLCM - Dissimilarity = GLCM - Difference Average
* GLCM - Homogeneity1 = GLCM - ID (exactly identical formula, only the name differs)
* GLCM - Homogeneity2 = GLCM - IDM (exactly identical formula, only the name differs)

Remove these features from the feature classes and update the documentation accordingly.
Remove the features from the example parameter files where applicable.
Add mathematical proofs for these equalities to the documentation.
Simplify the formulas in the documentation for GLSZM, GLRLM and GLDM (use a predefined variable Nz for the sum of the matrix).
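
The GLCM equality claimed above can be checked numerically. The following is a minimal sketch that uses only NumPy and a randomly generated, normalized co-occurrence matrix; it does not call PyRadiomics, and the names `p`, `p_xminusy` and `Ng` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
Ng = 6  # number of gray levels (arbitrary value for this illustration)

# Random symmetric co-occurrence counts, normalized so that p sums to 1
P = rng.integers(0, 20, size=(Ng, Ng)).astype(float)
P = P + P.T
p = P / P.sum()

# Gray level indices i and j, running from 1 to Ng as in the formulas
i, j = np.indices((Ng, Ng)) + 1
abs_diff = np.abs(i - j)

# Dissimilarity: sum over i, j of p(i,j) * |i - j|
dissimilarity = np.sum(p * abs_diff)

# p_{x-y}(k): total probability of all pairs with |i - j| = k, k = 0..Ng-1
p_xminusy = np.array([p[abs_diff == k].sum() for k in range(Ng)])

# Difference Average: sum over k of k * p_{x-y}(k)
difference_average = np.sum(np.arange(Ng) * p_xminusy)

assert np.isclose(dissimilarity, difference_average)
```

Grouping the double sum by the value of `|i - j|` is the step the proof relies on; the list comprehension that builds `p_xminusy` performs exactly that grouping.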
jcd-gh · Oct 26, 2017 · b6a0c4a
1 parent 4700502
Showing 12 changed files with 426 additions and 399 deletions.
diff --git a/docs/features.rst b/docs/features.rst
@@ -9,11 +9,11 @@ subdivided into the following classes:
 
 * :ref:`radiomics-firstorder-label` (19 features)
 * :ref:`radiomics-shape-label` (16 features)
-* :ref:`radiomics-glcm-label` (26 features)
+* :ref:`radiomics-glcm-label` (23 features)
 * :ref:`radiomics-glszm-label` (16 features)
 * :ref:`radiomics-glrlm-label` (16 features)
 * :ref:`radiomics-ngtdm-label` (5 features)
-* :ref:`radiomics-gldm-label` (15 features)
+* :ref:`radiomics-gldm-label` (14 features)
 
 All feature classes, with the exception of shape can be calculated on either the original image and/or a derived image,
 obtained by applying one of several filters. The shape descriptors are independent of gray value, and are extracted from

diff --git a/docs/removedfeatures.rst b/docs/removedfeatures.rst
@@ -14,8 +14,8 @@ For included features and class definition, see :ref:`radiomics-glcm-label`.
 
 .. _radiomics-excluded-sumvariance-label:
 
-1. Sum Variance
-###############
+Sum Variance
+############
 
 .. math::
     \textit{sum variance} = \displaystyle\sum^{2N_g}_{k=2}{(k-SA)^2p_{x+y}(k)}
@@ -80,9 +80,127 @@ The mathematical proof is as follows:
 (7) Combining (3.) and (6.) yields the following formula:
 
 .. math::
-
     \text{Cluster Tendency} =
     \displaystyle\sum^{2N_g}_{k=2}{\Big[\big(k-SA\big)^2p_{x+y}(k)\Big]} =
     \textit{ sum variance}
 
 Q.E.D
+
+Dissimilarity
+#############
+
+.. math::
+    \textit{dissimilarity} = \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|}
+
+Dissimilarity is a measure of local intensity variation defined as the mean absolute difference between the
+neighbouring pairs. A larger value correlates with a greater disparity in intensity values
+among neighboring voxels.
+
+This feature has been removed, as it is mathematically identical to Difference Average (see
+:py:func:`~radiomics.glcm.RadiomicsGLCM.getDifferenceAverageFeatureValue()`).
+
+The mathematical proof is as follows:
+
+(1) As defined in GLCM, :math:`p_{x-y}(k) = \sum^{N_g}_{i=1}\sum^{N_g}_{j=1}{p(i,j)},\text{ where }|i-j|=k`
+
+(2) Starting with Dissimilarity as defined in GLCM:
+
+.. math::
+    \textit{dissimilarity} = \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|}
+
+    = \displaystyle\sum^{N_g-1}_{k=0}{\Big[
+    \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)|i-j|} \text{, where }|i-j|=k\Big]}
+
+    = \displaystyle\sum^{N_g-1}_{k=0}{\Big[
+    \displaystyle\sum^{N_g}_{i=1}\displaystyle\sum^{N_g}_{j=1}{p(i,j)k} \text{, where }|i-j|=k\Big]}
+
+(3) Using (1.) and (2.)
+
+.. math::
+    \textit{dissimilarity} = \displaystyle\sum^{N_g-1}_{k=0}{p_{x-y}(k)k} = \textit{difference average}
+
+Q.E.D.
+
+.. _radiomics-excluded-gldm-label:
+
+Excluded GLDM Features
+----------------------
+
+For included features and class definition, see :ref:`radiomics-gldm-label`.
+
+.. _radiomics-excluded-gldm-glnn-label:
+
+Dependence Count percentage
+###########################
+
+.. math::
+    \textit{dependence percentage} = \frac{N_z}{N_p}
+
+Dependence percentage is the ratio between voxels with a dependence zone and the total number of voxels in the image.
+Because PyRadiomics allows for incomplete dependence zones, all voxels have a dependence zone and :math:`N_z = N_p`.
+Therefore, this feature would always compute to 1.
+
+Gray Level Non-Uniformity Normalized
+####################################
+
+.. math::
+    \textit{GLNN} = \frac{\sum^{N_g}_{i=1}\left(\sum^{N_d}_{j=1}{\textbf{P}(i,j)}\right)^2}{N_z^2}
+
+Measures the similarity of gray-level intensity values in the image, where a lower GLNN value
+correlates with a greater similarity in intensity values. This is the normalized version of the GLN formula.
+
+This feature has been removed because, given the definition of the GLDM matrix (allowing incomplete zones), it is
+equal to first order Uniformity (see :py:func:`~radiomics.firstorder.RadiomicsFirstOrder.getUniformityFeatureValue()`).
+
+The mathematical proof is as follows:
+
+(1) Starting with Gray Level Non-Uniformity Normalized as defined in GLDM,
+
+.. math::
+    \textit{GLNN} = \frac{\sum^{N_g}_{i=1}\left(\sum^{N_d}_{j=1}{\textbf{P}(i,j)}\right)^2}{N_z^2}
+
+    = \displaystyle\sum^{N_g}_{i=1}{
+        \frac{ \left( \sum^{N_d}_{j=1}{ \textbf{P}(i,j) } \right)^2 }{ N_z^2 }
+    }
+
+    = \displaystyle\sum^{N_g}_{i=1}{ \left(
+        \frac{ \sum^{N_d}_{j=1}{ \textbf{P}(i,j) } }{ N_z }
+    \right)^2}
+
+    = \displaystyle\sum^{N_g}_{i=1}{ \left(
+        \sum^{N_d}_{j=1}{ \frac{ \textbf{P}(i,j) } { N_z } }
+    \right)^2}
+
+(2) As defined in GLDM, :math:`p(i,j) = \frac{\textbf{P}(i,j)}{N_z}`
+
+(3) Using (1.) and (2.)
+
+.. math::
+    \textit{GLNN} = \displaystyle\sum^{N_g}_{i=1}{ \left(
+        \sum^{N_d}_{j=1}{ p(i,j) }
+    \right)^2}
+
+(4) Because in the PyRadiomics definition incomplete dependence zones are allowed, every voxel in the ROI has a
+    dependence zone. Therefore, :math:`N_z = N_p` and :math:`\sum^{N_d}_{j=1}{\textbf{P}(i,j)}` equals the number of voxels
+    with gray level :math:`i` and is equal to :math:`\textbf{P}(i)`, the first order histogram with :math:`N_g` discrete
+    gray levels, as defined in first order.
+
+(5) As defined in first order, :math:`p(i) = \frac{\textbf{P}(i)}{N_p}`
+
+(6) Using (2.), (4.) and (5.)
+
+.. math::
+    \displaystyle\sum^{N_d}_{j=1}{\textbf{P}(i,j)} = \textbf{P}(i)
+
+    \frac{\sum^{N_d}_{j=1}{\textbf{P}(i,j)}}{N_z} = \frac{\textbf{P}(i)}{N_p}
+
+    \displaystyle\sum^{N_d}_{j=1}{\frac{\textbf{P}(i,j)}{N_z}} = \frac{\textbf{P}(i)}{N_p}
+
+    \displaystyle\sum^{N_d}_{j=1}{p(i,j)} = p(i)
+
+(7) Combining (3.) and (6.) yields:
+
+.. math::
+    \textit{GLNN} = \displaystyle\sum^{N_g}_{i=1}{p(i)^2} = \textit{uniformity}
+
+Q.E.D.
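
The GLNN proof above can also be checked numerically. The sketch below is a toy example under stated assumptions: the dependence matrix is built from randomly drawn gray levels and dependence sizes rather than from an image, and it does not use PyRadiomics. The one property it reproduces is the one the proof depends on, namely that every voxel contributes exactly one entry to the matrix, so that :math:`N_z = N_p`. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
Ng, Nd, Np = 5, 7, 200  # gray levels, dependence sizes, voxels (arbitrary)

# Each voxel has one gray level index and one dependence size index
gray = rng.integers(0, Ng, size=Np)
dep = rng.integers(0, Nd, size=Np)

# Toy dependence matrix P(i, j): one count per voxel, so its sum equals Np
P_gldm = np.zeros((Ng, Nd))
np.add.at(P_gldm, (gray, dep), 1)
Nz = P_gldm.sum()

# GLNN as defined for GLDM: sum_i (sum_j P(i,j))^2 / Nz^2
glnn = np.sum(P_gldm.sum(axis=1) ** 2) / Nz ** 2

# First order Uniformity: sum_i p(i)^2, with p(i) = P(i) / Np
hist = np.bincount(gray, minlength=Ng)
uniformity = np.sum((hist / Np) ** 2)

assert Nz == Np
assert np.isclose(glnn, uniformity)
```

If incomplete dependence zones were not allowed, some voxels would be missing from the matrix, `Nz` would be smaller than `Np`, and the two values would in general differ.
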
diff --git a/examples/exampleSettings/Params.yaml b/examples/exampleSettings/Params.yaml
@@ -58,11 +58,8 @@ featureClass:
     - 'DifferenceAverage'
     - 'DifferenceEntropy'
     - 'DifferenceVariance'
-    - 'Dissimilarity'
     - 'JointEnergy'
     - 'JointEntropy'
-    - 'Homogeneity1'
-    - 'Homogeneity2'
     - 'Imc1'
     - 'Imc2'
     - 'Idm'

diff --git a/examples/exampleSettings/exampleCT.yaml b/examples/exampleSettings/exampleCT.yaml
@@ -41,11 +41,8 @@ featureClass:
     - 'DifferenceAverage'
     - 'DifferenceEntropy'
     - 'DifferenceVariance'
-    - 'Dissimilarity'
     - 'JointEnergy'
     - 'JointEntropy'
-    - 'Homogeneity1'
-    - 'Homogeneity2'
     - 'Imc1'
     - 'Imc2'
     - 'Idm'

diff --git a/examples/exampleSettings/exampleMR_3mm.yaml b/examples/exampleSettings/exampleMR_3mm.yaml
@@ -43,11 +43,8 @@ featureClass:
     - 'DifferenceAverage'
     - 'DifferenceEntropy'
     - 'DifferenceVariance'
-    - 'Dissimilarity'
     - 'JointEnergy'
     - 'JointEntropy'
-    - 'Homogeneity1'
-    - 'Homogeneity2'
     - 'Imc1'
     - 'Imc2'
     - 'Idm'

diff --git a/examples/exampleSettings/exampleMR_5mm.yaml b/examples/exampleSettings/exampleMR_5mm.yaml
@@ -42,11 +42,8 @@ featureClass:
     - 'DifferenceAverage'
     - 'DifferenceEntropy'
     - 'DifferenceVariance'
-    - 'Dissimilarity'
     - 'JointEnergy'
     - 'JointEntropy'
-    - 'Homogeneity1'
-    - 'Homogeneity2'
     - 'Imc1'
     - 'Imc2'
     - 'Idm'

diff --git a/examples/exampleSettings/exampleMR_NoResampling.yaml b/examples/exampleSettings/exampleMR_NoResampling.yaml
@@ -46,11 +46,8 @@ featureClass:
     - 'DifferenceAverage'
     - 'DifferenceEntropy'
     - 'DifferenceVariance'
-    - 'Dissimilarity'
     - 'JointEnergy'
     - 'JointEntropy'
-    - 'Homogeneity1'
-    - 'Homogeneity2'
     - 'Imc1'
     - 'Imc2'
     - 'Idm'

diff --git a/radiomics/firstorder.py b/radiomics/firstorder.py
@@ -10,11 +10,11 @@ class RadiomicsFirstOrder(base.RadiomicsFeaturesBase):
 
   Let:
 
-  - :math:`\textbf{X}` be a set of :math:`N` voxels included in the ROI
-  - :math:`\textbf{P}(i)` be the first order histogram with :math:`N_l` discrete intensity levels,
-    where :math:`N_l` is the number of non-zero bins, equally spaced from 0 with a width defined in the ``binWidth``
+  - :math:`\textbf{X}` be a set of :math:`N_p` voxels included in the ROI
+  - :math:`\textbf{P}(i)` be the first order histogram with :math:`N_g` discrete intensity levels,
+    where :math:`N_g` is the number of non-zero bins, equally spaced from 0 with a width defined in the ``binWidth``
     parameter.
-  - :math:`p(i)` be the normalized first order histogram and equal to :math:`\frac{\textbf{P}(i)}{\sum{\textbf{P}(i)}}`
+  - :math:`p(i)` be the normalized first order histogram and equal to :math:`\frac{\textbf{P}(i)}{N_p}`
 
   Following additional settings are possible:
 
@@ -59,7 +59,7 @@ def getEnergyFeatureValue(self):
     **1. Energy**
 
     .. math::
-      \textit{energy} = \displaystyle\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}
+      \textit{energy} = \displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}
 
     Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
     values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to Energy,
@@ -81,7 +81,7 @@ def getTotalEnergyFeatureValue(self):
     **2. Total Energy**
 
     .. math::
-      \textit{total energy} = V_{voxel}\displaystyle\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}
+      \textit{total energy} = V_{voxel}\displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}
 
     Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
     values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to Energy,
@@ -106,7 +106,7 @@ def getEntropyFeatureValue(self):
     **3. Entropy**
 
     .. math::
-      \textit{entropy} = -\displaystyle\sum^{N_l}_{i=1}{p(i)\log_2\big(p(i)+\epsilon\big)}
+      \textit{entropy} = -\displaystyle\sum^{N_g}_{i=1}{p(i)\log_2\big(p(i)+\epsilon\big)}
 
     Here, :math:`\epsilon` is an arbitrarily small positive number (:math:`\approx 2.2\times10^{-16}`).
 
@@ -174,7 +174,7 @@ def getMeanFeatureValue(self):
     **8. Mean**
 
     .. math::
-      \textit{mean} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{\textbf{X}(i)}
+      \textit{mean} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{\textbf{X}(i)}
 
     The average gray level intensity within the ROI.
     """
@@ -220,7 +220,7 @@ def getMeanAbsoluteDeviationFeatureValue(self):
     **12. Mean Absolute Deviation (MAD)**
 
     .. math::
-      \textit{MAD} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{|\textbf{X}(i)-\bar{X}|}
+      \textit{MAD} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{|\textbf{X}(i)-\bar{X}|}
 
     Mean Absolute Deviation is the mean distance of all intensity values from the Mean Value of the image array.
     """
@@ -251,7 +251,7 @@ def getRootMeanSquaredFeatureValue(self):
     **14. Root Mean Squared (RMS)**
 
     .. math::
-      \textit{RMS} = \sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i) + c)^2}}
+      \textit{RMS} = \sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i) + c)^2}}
 
     Here, :math:`c` is an optional value, defined by ``voxelArrayShift``, which shifts the intensities to prevent negative
     values in :math:`\textbf{X}`. This ensures that voxels with the lowest gray values contribute the least to RMS,
@@ -274,7 +274,7 @@ def getStandardDeviationFeatureValue(self):
     **15. Standard Deviation**
 
     .. math::
-      \textit{standard deviation} = \sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}
+      \textit{standard deviation} = \sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}
 
     Standard Deviation measures the amount of variation or dispersion from the Mean Value. By definition,
     :math:`\textit{standard deviation} = \sqrt{\textit{variance}}`
@@ -291,8 +291,8 @@ def getSkewnessFeatureValue(self, axis=0):
 
     .. math::
       \textit{skewness} = \displaystyle\frac{\mu_3}{\sigma^3} =
-      \frac{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^3}}
-      {\left(\sqrt{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}\right)^3}
+      \frac{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^3}}
+      {\left(\sqrt{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}}\right)^3}
 
     Where :math:`\mu_3` is the 3\ :sup:`rd` central moment.
 
@@ -322,8 +322,8 @@ def getKurtosisFeatureValue(self, axis=0):
 
     .. math::
       \textit{kurtosis} = \displaystyle\frac{\mu_4}{\sigma^4} =
-      \frac{\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^4}}
-      {\left(\frac{1}{N}\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X}})^2\right)^2}
+      \frac{\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^4}}
+      {\left(\frac{1}{N_p}\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X}})^2\right)^2}
 
     Where :math:`\mu_4` is the 4\ :sup:`th` central moment.
 
@@ -357,7 +357,7 @@ def getVarianceFeatureValue(self):
     **18. Variance**
 
     .. math::
-      \textit{variance} = \frac{1}{N}\displaystyle\sum^{N}_{i=1}{(\textbf{X}(i)-\bar{X})^2}
+      \textit{variance} = \frac{1}{N_p}\displaystyle\sum^{N_p}_{i=1}{(\textbf{X}(i)-\bar{X})^2}
 
     Variance is the mean of the squared distances of each intensity value from the Mean value. This is a measure of
     the spread of the distribution about the mean. By definition, :math:`\textit{variance} = \sigma^2`
@@ -370,7 +370,7 @@ def getUniformityFeatureValue(self):
     **19. Uniformity**
 
     .. math::
-      \textit{uniformity} = \displaystyle\sum^{N_l}_{i=1}{p(i)^2}
+      \textit{uniformity} = \displaystyle\sum^{N_g}_{i=1}{p(i)^2}
 
     Uniformity is a measure of the sum of the squares of each intensity value. This is a measure of the homogeneity of
     the image array, where a greater uniformity implies a greater homogeneity or a smaller range of discrete intensity