This repository has been archived by the owner on Nov 11, 2024. It is now read-only.

Commit

Merge pull request #7 from TimOliverMaier/dev
Refactor Synthetics
TimOliverMaier authored Dec 22, 2022
2 parents d5b599f + 09ffb07 commit ea4e3e4
Showing 10 changed files with 532 additions and 572 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -30,4 +30,4 @@ docs_src/
.pytest_cache/

# output of dvc plots
dvc_plots/
dvc_plots/
61 changes: 52 additions & 9 deletions examples/synthetics.ipynb
@@ -31,7 +31,7 @@
"source": [
"## Isotopic Distributions in Mass Spectrometry\n",
"\n",
"Peptide isotopic patterns can be modeled via an gaussian mixture distribution. The positions of the components are dependent on the charge and the mass of the peptide while the weights can be calculated by models such as the averagine-like model presented by Breen *et al.*.\n",
"Peptide isotopic patterns can be modeled via an gaussian mixture distribution. The positions of the components are depending on the charge and the mass of the peptide while the weights can be calculated by models such as the averagine-like model presented by Breen *et al.* [1].\n",
"\n",
"In PystoMS a `scipy.stats.rv_continuous` subclass is implemented to allow sampling from isotopic distributions. "
]
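A minimal NumPy sketch of the Gaussian-mixture idea described above (not the PystoMS implementation; the weights below are placeholder values rather than averagine weights):

import numpy as np
from scipy.stats import norm

def isotopic_mixture_pdf(x, loc, charge, weights, sigma):
    """Weighted sum of Gaussians; component k sits at loc + k/charge (roughly 1 Da isotope spacing divided by charge)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize component weights
    centers = loc + np.arange(len(weights)) / charge
    return sum(w * norm.pdf(x, mu, sigma) for w, mu in zip(weights, centers))

x = np.linspace(300.8, 304.5, 2000)            # m/z grid (arbitrary window)
pdf = isotopic_mixture_pdf(x, loc=301.2, charge=2, weights=[0.5, 0.3, 0.15, 0.05], sigma=0.05)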
@@ -72,7 +72,8 @@
"metadata": {},
"source": [
"To calculate the pdf along an axis, the parameters of the isotopic distribution must be passed to the pdf method. \n",
"The parameter `mass` is only considered in calculation of the component weights, to shift the distribution from 0 to the monoisotopic peak use `loc`. Both `mass` and `loc` should store the mass and the mass to charge ratio of the monoisotopic peak, respectively."
"The parameter `mass` is only considered in calculation of the component weights, to shift the distribution from 0 to the monoisotopic peak use `loc`. The parameters `mass` and `loc` should store the mass and the mass to charge ratio of the monoisotopic peak, respectively. However, the usage of the scipy `scale` parameter is not supported. For the shape of the distribution use `sigma` and `charge`. This implementation allows to have an individual `sigma`\n",
"for each gauss bell."
]
},
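The collapsed code cell that follows presumably evaluates the pdf along an m/z axis; a sketch along these lines, assuming `iso` is the isotopic-distribution instance created earlier in the notebook and that numpy, matplotlib and the grid limits come from the setup cells:

x = np.linspace(300.8, 304.5, 1000)  # m/z grid around the assumed monoisotopic peak
y = iso.pdf(x, loc=301.2, mass=301.2, charge=2, sigma=0.05, num_peaks=6)
plt.plot(x, y)
plt.show()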
{
@@ -101,8 +102,8 @@
"metadata": {},
"outputs": [],
"source": [
"y = iso.rvs( loc=301.2, mass = 301.2,charge = 2,sigma=0.05,num_peaks=6,size=5000,random_state=rng)\n",
"plt.hist(y,bins=100)\n",
"samples = iso.rvs( loc=301.2, mass = 301.2,charge = 2,sigma=0.05,num_peaks=6,size=5000,random_state=rng)\n",
"plt.hist(samples,bins=100)\n",
"plt.show()"
]
},
@@ -111,8 +112,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Both `.pdf()` and `.rvs()` internally call the overwritten methods `._pdf()` and `._rvs()`, respectively. The `._rvs()` method first samples a component a sample is generated by according to the component weights. Then the sample's deviation from its component's mean is drawn from a normal distribution N(0,σ).\n",
"To simulate (m/z,intensities) form the distribution use the `.rvs_to_intensities()` method. The method draws `size` samples from\n",
"Both `.pdf()` and `.rvs()` were overwritten and internally call the overwritten methods `._pdf()` and `._rvs()`, respectively. The `._rvs()` method first samples a component and then the deviation of the sample from the component`s mean.\n",
"To simulate (m/z,intensities) from the distribution use the `.rvs_to_intensities()` method. The method draws `size` samples from\n",
"`.rvs()` and bins theses samples based on `bin_width`. Each sample is considered a single molecule that contributes `signal_per_molecule` arbitrary units\n",
"to intensity of it's bin."
]
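A hedged reconstruction of the collapsed call that produces `bins_pos` and `intensities` used in the next cell; the argument values are assumptions modeled on the vector-sigma call shown later in this notebook:

bins_pos, intensities = iso.rvs_to_intensities(
    loc=301.2, mass=301.2, charge=2, sigma=0.05, num_peaks=6,
    size=50000, random_state=rng, bin_width=0.001,
)  # signal_per_molecule is left at its (assumed) default here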
@@ -128,11 +129,53 @@
"plt.scatter(bins_pos,intensities)\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to pass a `sigma` vector of different sigmas (with size corresponding to the number of modeled peaks `num_peaks`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y2 = iso.pdf(x,loc=301.2,mass=301.2,charge=2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4)\n",
"plt.plot(x,y2)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"samples2 = iso.rvs(loc=301.2,mass=301.2,charge=2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4,size=50000)\n",
"plt.hist(samples2,bins=100)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bins_pos2,intensities2 = iso.rvs_to_intensities( loc=301.2, mass = 301.2,charge = 2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4,size=50000,random_state=rng,bin_width=0.001)\n",
"plt.plot(bins_pos2,intensities2,alpha=0.5,ls=\":\")\n",
"plt.scatter(bins_pos2,intensities2)\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pymc_dev",
"display_name": "master_dev_310",
"language": "python",
"name": "python3"
},
@@ -146,12 +189,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e72a8d8ce8364635d68c11ed264ecd033ec926576ac60b4957a81f7389674ed5"
"hash": "a0f3cdbb03b3455b057494528c47f11ae9506033764e57bdc42e3f76c3be3b29"
}
}
},
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,4 +1,4 @@
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"] # PEP 508 specifications.
# from https://peps.python.org/pep-0518/
# from https://peps.python.org/pep-0518/
59 changes: 38 additions & 21 deletions src/pystoms/models_3d/abstract_3d.py
@@ -18,6 +18,7 @@
import os
from numpy.typing import ArrayLike
from time import time

# typing
NDArrayFloat = npt.NDArray[np.float64]
NDArrayInt = npt.NDArray[np.int64]
@@ -452,7 +453,15 @@ def arviz_plots(
else:
plt.show()

az.plot_trace(idata_sliced, var_names, legend=True, chain_prop={"linestyle": ("solid", "dotted", "dashed", "dashdot"),"color":("black","blue","green","red")})
az.plot_trace(
idata_sliced,
var_names,
legend=True,
chain_prop={
"linestyle": ("solid", "dotted", "dashed", "dashdot"),
"color": ("black", "blue", "green", "red"),
},
)
if save_fig:
plt.tight_layout()
plt.savefig(feature_path + "trace.png")
@@ -484,41 +493,43 @@ def arviz_plots(

# posterior predictive lm
idata_feature_sliced = self.idata.isel(feature=idx)
az.plot_lm(y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.posterior_predictive.obs,
num_samples=500)
az.plot_lm(
y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.posterior_predictive.obs,
num_samples=500,
)
if save_fig:
plt.savefig(feature_path + "posterior_predictive_lm.png")
plt.close()
else:
plt.show()

# prior predictive lm
fig,ax = plt.subplots(1,1)
az.plot_lm(y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.prior_predictive.obs,
num_samples=500,
legend=False,
axes=ax)
h,l = ax.get_legend_handles_labels()
for idx,lab in enumerate(l):
fig, ax = plt.subplots(1, 1)
az.plot_lm(
y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.prior_predictive.obs,
num_samples=500,
legend=False,
axes=ax,
)
h, l = ax.get_legend_handles_labels()
for idx, lab in enumerate(l):
if lab == "Posterior predictive samples":
l[idx] = "Prior predictive samples"
ax.legend(h,l)
ax.legend(h, l)
if save_fig:
fig.savefig(feature_path + "prior_predictive_lm.png")
plt.close()
else:
plt.show()

az.plot_parallel(idata_sliced,norm_method="minmax")
az.plot_parallel(idata_sliced, norm_method="minmax")
if save_fig:
plt.savefig(feature_path + "plot_parallel.png")
plt.close()
else:
plt.show()



def evaluation(
self,
Expand All @@ -531,7 +542,7 @@ def evaluation(
progressbar: bool = True,
pred_name_list: Optional[List[str]] = None,
**plot_kwargs,
) -> Tuple[az.InferenceData,float]:
) -> Tuple[az.InferenceData, float]:
"""Evaluate precursor feature model.
This function wraps several steps
@@ -571,7 +582,7 @@
end_sampling = time()

if pred_name_list is None:
pred_name_list = ["obs","mu"]
pred_name_list = ["obs", "mu"]

if plots is None:
# make 'in' operator available for plots
@@ -588,7 +599,9 @@

if prior_pred_out:
# prior predictive analysis out-of-sample
self._sample_predictive(is_prior=True, is_grid_predictive=True, var_names=pred_name_list)
self._sample_predictive(
is_prior=True, is_grid_predictive=True, var_names=pred_name_list
)
if "prior_pred_out" in plots:
for pred_name in pred_name_list:
plot_kwargs["pred_name"] = pred_name
@@ -606,10 +619,14 @@

if posterior_pred_out:
# posterior predictive analysis out-of-sample
self._sample_predictive(is_grid_predictive=True, progressbar=progressbar, var_names=pred_name_list)
self._sample_predictive(
is_grid_predictive=True,
progressbar=progressbar,
var_names=pred_name_list,
)
if "posterior_pred_out" in plots:
for pred_name in pred_name_list:
plot_kwargs["pred_name"] = pred_name
self.visualize_predictions_scatter(in_sample=False, **plot_kwargs)

return (self.idata.copy(),end_sampling-start_sampling)
return (self.idata.copy(), end_sampling - start_sampling)
25 changes: 18 additions & 7 deletions src/pystoms/models_3d/model_3d_m1.py
@@ -33,7 +33,7 @@ class ModelM1(AbstractModel):
Args:
features (AlignedFeatureData): Feature's data wrapped in
pystoms.AlignedFeatureData container.
model_parameters (Dictionary, optional): parameters for
model_parameters (Dictionary, optional): parameters for
priors and hyperpriors. Defaults to None.
likelihood (str, optional): Likelihood distribution. Currently
supported: 'Normal', 'StudentT'. Defaults to 'Normal'.
@@ -62,7 +62,9 @@ def __init__(
if model_parameters is None:
model_parameters = {}
self.model_parameters = model_parameters.copy()
super().__init__(feature_ids, batch_size, random_number_generator, name, coords=coords)
super().__init__(
feature_ids, batch_size, random_number_generator, name, coords=coords
)
self.setup_mutable_data(features)
# TODO is there a nicer way to share this between init and setup_mutable_data()
dims_2d = ["data_point", "feature"]
@@ -158,12 +160,21 @@ def setup_mutable_data(
axis=0,
weights=intensity,
).reshape((1, num_features, 1))
self.model_parameters.setdefault("mz_sigma",10)
mz_sigma = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_sigma"]
self.model_parameters.setdefault("mz_sigma", 10)
mz_sigma = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_sigma"]
)
self.model_parameters.setdefault("alpha_lam", 1 / intensity.max(axis=0))
alpha_lam = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_lam"]
self.model_parameters.setdefault("me_sigma",10)
me_sigma = np.ones((1, num_features), dtype="float") * self.model_parameters["me_sigma"]
alpha_lam = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_lam"]
)
self.model_parameters.setdefault("me_sigma", 10)
me_sigma = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["me_sigma"]
)
z = np.array(charge).reshape((1, num_features, 1))
mz_tile = np.tile(mz, (1, 1, num_isotopic_peaks))
peaks = np.arange(num_isotopic_peaks)
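For context, a usage sketch of the prior hyperparameters read by `setup_mutable_data()` above. The dictionary keys come from the `setdefault()` calls in this diff; the values, the import path, and the keyword-style constructor call are assumptions based on the class docstring.

from pystoms.models_3d.model_3d_m1 import ModelM1  # import path assumed from the file layout

# `features` is assumed to be a pystoms AlignedFeatureData instance prepared elsewhere.
custom_priors = {
    "mz_sigma": 5,      # default is 10
    "alpha_lam": 1e-4,  # default is 1 / intensity.max(axis=0)
    "me_sigma": 5,      # default is 10
}
model_m1 = ModelM1(features, model_parameters=custom_priors, likelihood="StudentT")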
50 changes: 35 additions & 15 deletions src/pystoms/models_3d/model_3d_m2.py
@@ -2,7 +2,7 @@
model M2
"""

from typing import Optional,Dict
from typing import Optional, Dict
import pymc as pm
import pymc.math as pmath
import numpy as np
@@ -33,7 +33,7 @@ class ModelM2(AbstractModel):
Args:
features (AlignedFeatureData): Feature's data wrapped in
pystoms.AlignedFeatureData container.
model_parameters (Dictionary, optional): parameters for
model_parameters (Dictionary, optional): parameters for
priors and hyperpriors. Defaults to None.
likelihood (str, optional): Likelihood distribution. Currently
supported: 'Normal', 'StudentT'. Defaults to 'Normal'.
@@ -62,7 +62,9 @@ def __init__(
if model_parameters is None:
model_parameters = {}
self.model_parameters = model_parameters.copy()
super().__init__(feature_ids, batch_size, random_number_generator, name, coords=coords)
super().__init__(
feature_ids, batch_size, random_number_generator, name, coords=coords
)
self.setup_mutable_data(features)
# TODO is there a nicer way to share this between init and setup_mutable_data()
dims_2d = ["data_point", "feature"]
@@ -164,18 +166,36 @@ def setup_mutable_data(
axis=0,
weights=intensity,
).reshape((1, num_features, 1))
self.model_parameters.setdefault("mz_sigma",10)
mz_sigma = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_sigma"]
self.model_parameters.setdefault("mz_s_scale",0.05)
ms_s_scale = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_s_scale"]
self.model_parameters.setdefault("ms_s_alpha",2)
ms_s_alpha = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["ms_s_alpha"]
self.model_parameters.setdefault("alpha_scale",intensity.max(axis=0))
alpha_scale = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_scale"]
self.model_parameters.setdefault("alpha_alpha",2)
alpha_alpha = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_alpha"]
self.model_parameters.setdefault("me_sigma",10)
me_sigma = np.ones((1, num_features), dtype="float") * self.model_parameters["me_sigma"]
self.model_parameters.setdefault("mz_sigma", 10)
mz_sigma = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_sigma"]
)
self.model_parameters.setdefault("mz_s_scale", 0.05)
ms_s_scale = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_s_scale"]
)
self.model_parameters.setdefault("ms_s_alpha", 2)
ms_s_alpha = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["ms_s_alpha"]
)
self.model_parameters.setdefault("alpha_scale", intensity.max(axis=0))
alpha_scale = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_scale"]
)
self.model_parameters.setdefault("alpha_alpha", 2)
alpha_alpha = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_alpha"]
)
self.model_parameters.setdefault("me_sigma", 10)
me_sigma = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["me_sigma"]
)
z = np.array(charge).reshape((1, num_features, 1))
mz_tile = np.tile(mz, (1, 1, num_isotopic_peaks))
peaks = np.arange(num_isotopic_peaks)
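The same pattern would apply to ModelM2, whose `setup_mutable_data()` reads the keys below with the defaults shown in this diff; the values and the constructor call are again assumptions.

from pystoms.models_3d.model_3d_m2 import ModelM2  # import path assumed from the file layout

custom_priors_m2 = {
    "mz_sigma": 5,       # default is 10
    "mz_s_scale": 0.1,   # default is 0.05
    "ms_s_alpha": 3,     # default is 2
    "alpha_scale": 1e5,  # default is intensity.max(axis=0)
    "alpha_alpha": 3,    # default is 2
    "me_sigma": 5,       # default is 10
}
model_m2 = ModelM2(features, model_parameters=custom_priors_m2, likelihood="Normal")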