This repository has been archived by the owner on Nov 11, 2024. It is now read-only.

Commit

Merge pull request #7 from TimOliverMaier/dev
Refactor Synthetics
TimOliverMaier authored Dec 22, 2022
2 parents d5b599f + 09ffb07 commit ea4e3e4
Showing 10 changed files with 532 additions and 572 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -30,4 +30,4 @@ docs_src/
.pytest_cache/

# output of dvc plots
dvc_plots/
dvc_plots/
61 changes: 52 additions & 9 deletions examples/synthetics.ipynb
@@ -31,7 +31,7 @@
"source": [
"## Isotopic Distributions in Mass Spectrometry\n",
"\n",
"Peptide isotopic patterns can be modeled via an gaussian mixture distribution. The positions of the components are dependent on the charge and the mass of the peptide while the weights can be calculated by models such as the averagine-like model presented by Breen *et al.*.\n",
"Peptide isotopic patterns can be modeled via an gaussian mixture distribution. The positions of the components are depending on the charge and the mass of the peptide while the weights can be calculated by models such as the averagine-like model presented by Breen *et al.* [1].\n",
"\n",
"In PystoMS a `scipy.stats.rv_continuous` subclass is implemented to allow sampling from isotopic distributions. "
]
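A minimal NumPy sketch of the Gaussian-mixture idea described above (not the PystoMS implementation; the weights below are placeholder values rather than averagine weights):

import numpy as np
from scipy.stats import norm

def isotopic_mixture_pdf(x, loc, charge, weights, sigma):
    """Weighted sum of Gaussians; component k sits at loc + k/charge (roughly 1 Da isotope spacing divided by charge)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize component weights
    centers = loc + np.arange(len(weights)) / charge
    return sum(w * norm.pdf(x, mu, sigma) for w, mu in zip(weights, centers))

x = np.linspace(300.8, 304.5, 2000)            # m/z grid (arbitrary window)
pdf = isotopic_mixture_pdf(x, loc=301.2, charge=2, weights=[0.5, 0.3, 0.15, 0.05], sigma=0.05)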
@@ -72,7 +72,8 @@
"metadata": {},
"source": [
"To calculate the pdf along an axis, the parameters of the isotopic distribution must be passed to the pdf method. \n",
"The parameter `mass` is only considered in calculation of the component weights, to shift the distribution from 0 to the monoisotopic peak use `loc`. Both `mass` and `loc` should store the mass and the mass to charge ratio of the monoisotopic peak, respectively."
"The parameter `mass` is only considered in calculation of the component weights, to shift the distribution from 0 to the monoisotopic peak use `loc`. The parameters `mass` and `loc` should store the mass and the mass to charge ratio of the monoisotopic peak, respectively. However, the usage of the scipy `scale` parameter is not supported. For the shape of the distribution use `sigma` and `charge`. This implementation allows to have an individual `sigma`\n",
"for each gauss bell."
]
},
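The collapsed code cell that follows presumably evaluates the pdf along an m/z axis; a sketch along these lines, assuming `iso` is the isotopic-distribution instance created earlier in the notebook and that numpy, matplotlib and the grid limits come from the setup cells:

x = np.linspace(300.8, 304.5, 1000)  # m/z grid around the assumed monoisotopic peak
y = iso.pdf(x, loc=301.2, mass=301.2, charge=2, sigma=0.05, num_peaks=6)
plt.plot(x, y)
plt.show()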
{
@@ -101,8 +102,8 @@
"metadata": {},
"outputs": [],
"source": [
"y = iso.rvs( loc=301.2, mass = 301.2,charge = 2,sigma=0.05,num_peaks=6,size=5000,random_state=rng)\n",
"plt.hist(y,bins=100)\n",
"samples = iso.rvs( loc=301.2, mass = 301.2,charge = 2,sigma=0.05,num_peaks=6,size=5000,random_state=rng)\n",
"plt.hist(samples,bins=100)\n",
"plt.show()"
]
},
@@ -111,8 +112,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Both `.pdf()` and `.rvs()` internally call the overwritten methods `._pdf()` and `._rvs()`, respectively. The `._rvs()` method first samples a component a sample is generated by according to the component weights. Then the sample's deviation from its component's mean is drawn from a normal distribution N(0,σ).\n",
"To simulate (m/z,intensities) form the distribution use the `.rvs_to_intensities()` method. The method draws `size` samples from\n",
"Both `.pdf()` and `.rvs()` were overwritten and internally call the overwritten methods `._pdf()` and `._rvs()`, respectively. The `._rvs()` method first samples a component and then the deviation of the sample from the component`s mean.\n",
"To simulate (m/z,intensities) from the distribution use the `.rvs_to_intensities()` method. The method draws `size` samples from\n",
"`.rvs()` and bins theses samples based on `bin_width`. Each sample is considered a single molecule that contributes `signal_per_molecule` arbitrary units\n",
"to intensity of it's bin."
]
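A hedged reconstruction of the collapsed call that produces `bins_pos` and `intensities` used in the next cell; the argument values are assumptions modeled on the vector-sigma call shown later in this notebook:

bins_pos, intensities = iso.rvs_to_intensities(
    loc=301.2, mass=301.2, charge=2, sigma=0.05, num_peaks=6,
    size=50000, random_state=rng, bin_width=0.001,
)  # signal_per_molecule is left at its (assumed) default here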
@@ -128,11 +129,53 @@
"plt.scatter(bins_pos,intensities)\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to pass a `sigma` vector of different sigmas (with size corresponding to the number of modeled peaks `num_peaks`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y2 = iso.pdf(x,loc=301.2,mass=301.2,charge=2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4)\n",
"plt.plot(x,y2)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"samples2 = iso.rvs(loc=301.2,mass=301.2,charge=2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4,size=50000)\n",
"plt.hist(samples2,bins=100)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bins_pos2,intensities2 = iso.rvs_to_intensities( loc=301.2, mass = 301.2,charge = 2,sigma=np.array([0.1,0.2,0.01,0.01]),num_peaks=4,size=50000,random_state=rng,bin_width=0.001)\n",
"plt.plot(bins_pos2,intensities2,alpha=0.5,ls=\":\")\n",
"plt.scatter(bins_pos2,intensities2)\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "pymc_dev",
"display_name": "master_dev_310",
"language": "python",
"name": "python3"
},
@@ -146,12 +189,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "e72a8d8ce8364635d68c11ed264ecd033ec926576ac60b4957a81f7389674ed5"
"hash": "a0f3cdbb03b3455b057494528c47f11ae9506033764e57bdc42e3f76c3be3b29"
}
}
},
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,4 +1,4 @@
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"] # PEP 508 specifications.
# from https://peps.python.org/pep-0518/
# from https://peps.python.org/pep-0518/
59 changes: 38 additions & 21 deletions src/pystoms/models_3d/abstract_3d.py
@@ -18,6 +18,7 @@
import os
from numpy.typing import ArrayLike
from time import time

# typing
NDArrayFloat = npt.NDArray[np.float64]
NDArrayInt = npt.NDArray[np.int64]
@@ -452,7 +453,15 @@ def arviz_plots(
else:
plt.show()

az.plot_trace(idata_sliced, var_names, legend=True, chain_prop={"linestyle": ("solid", "dotted", "dashed", "dashdot"),"color":("black","blue","green","red")})
az.plot_trace(
idata_sliced,
var_names,
legend=True,
chain_prop={
"linestyle": ("solid", "dotted", "dashed", "dashdot"),
"color": ("black", "blue", "green", "red"),
},
)
if save_fig:
plt.tight_layout()
plt.savefig(feature_path + "trace.png")
@@ -484,41 +493,43 @@ def arviz_plots(

# posterior predictive lm
idata_feature_sliced = self.idata.isel(feature=idx)
az.plot_lm(y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.posterior_predictive.obs,
num_samples=500)
az.plot_lm(
y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.posterior_predictive.obs,
num_samples=500,
)
if save_fig:
plt.savefig(feature_path + "posterior_predictive_lm.png")
plt.close()
else:
plt.show()

# prior predictive lm
fig,ax = plt.subplots(1,1)
az.plot_lm(y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.prior_predictive.obs,
num_samples=500,
legend=False,
axes=ax)
h,l = ax.get_legend_handles_labels()
for idx,lab in enumerate(l):
fig, ax = plt.subplots(1, 1)
az.plot_lm(
y=idata_feature_sliced.observed_data.obs,
y_hat=idata_feature_sliced.prior_predictive.obs,
num_samples=500,
legend=False,
axes=ax,
)
h, l = ax.get_legend_handles_labels()
for idx, lab in enumerate(l):
if lab == "Posterior predictive samples":
l[idx] = "Prior predictive samples"
ax.legend(h,l)
ax.legend(h, l)
if save_fig:
fig.savefig(feature_path + "prior_predictive_lm.png")
plt.close()
else:
plt.show()

az.plot_parallel(idata_sliced,norm_method="minmax")
az.plot_parallel(idata_sliced, norm_method="minmax")
if save_fig:
plt.savefig(feature_path + "plot_parallel.png")
plt.close()
else:
plt.show()



def evaluation(
self,
Expand All @@ -531,7 +542,7 @@ def evaluation(
progressbar: bool = True,
pred_name_list: Optional[List[str]] = None,
**plot_kwargs,
) -> Tuple[az.InferenceData,float]:
) -> Tuple[az.InferenceData, float]:
"""Evaluate precursor feature model.
This function wraps several steps
@@ -571,7 +582,7 @@
end_sampling = time()

if pred_name_list is None:
pred_name_list = ["obs","mu"]
pred_name_list = ["obs", "mu"]

if plots is None:
# make 'in' operator available for plots
@@ -588,7 +599,9 @@

if prior_pred_out:
# prior predictive analysis out-of-sample
self._sample_predictive(is_prior=True, is_grid_predictive=True, var_names=pred_name_list)
self._sample_predictive(
is_prior=True, is_grid_predictive=True, var_names=pred_name_list
)
if "prior_pred_out" in plots:
for pred_name in pred_name_list:
plot_kwargs["pred_name"] = pred_name
@@ -606,10 +619,14 @@

if posterior_pred_out:
# posterior predictive analysis out-of-sample
self._sample_predictive(is_grid_predictive=True, progressbar=progressbar, var_names=pred_name_list)
self._sample_predictive(
is_grid_predictive=True,
progressbar=progressbar,
var_names=pred_name_list,
)
if "posterior_pred_out" in plots:
for pred_name in pred_name_list:
plot_kwargs["pred_name"] = pred_name
self.visualize_predictions_scatter(in_sample=False, **plot_kwargs)

return (self.idata.copy(),end_sampling-start_sampling)
return (self.idata.copy(), end_sampling - start_sampling)
25 changes: 18 additions & 7 deletions src/pystoms/models_3d/model_3d_m1.py
@@ -33,7 +33,7 @@ class ModelM1(AbstractModel):
Args:
features (AlignedFeatureData): Feature's data wrapped in
pystoms.AlignedFeatureData container.
model_parameters (Dictionary, optional): parameters for
model_parameters (Dictionary, optional): parameters for
priors and hyperpriors. Defaults to None.
likelihood (str, optional): Likelihood distribution. Currently
supported: 'Normal', 'StudentT'. Defaults to 'Normal'.
@@ -62,7 +62,9 @@ def __init__(
if model_parameters is None:
model_parameters = {}
self.model_parameters = model_parameters.copy()
super().__init__(feature_ids, batch_size, random_number_generator, name, coords=coords)
super().__init__(
feature_ids, batch_size, random_number_generator, name, coords=coords
)
self.setup_mutable_data(features)
# TODO is there a nicer way to share this between init and setup_mutable_data()
dims_2d = ["data_point", "feature"]
@@ -158,12 +160,21 @@ def setup_mutable_data(
axis=0,
weights=intensity,
).reshape((1, num_features, 1))
self.model_parameters.setdefault("mz_sigma",10)
mz_sigma = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_sigma"]
self.model_parameters.setdefault("mz_sigma", 10)
mz_sigma = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_sigma"]
)
self.model_parameters.setdefault("alpha_lam", 1 / intensity.max(axis=0))
alpha_lam = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_lam"]
self.model_parameters.setdefault("me_sigma",10)
me_sigma = np.ones((1, num_features), dtype="float") * self.model_parameters["me_sigma"]
alpha_lam = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_lam"]
)
self.model_parameters.setdefault("me_sigma", 10)
me_sigma = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["me_sigma"]
)
z = np.array(charge).reshape((1, num_features, 1))
mz_tile = np.tile(mz, (1, 1, num_isotopic_peaks))
peaks = np.arange(num_isotopic_peaks)
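For context, a usage sketch of the prior hyperparameters read by `setup_mutable_data()` above. The dictionary keys come from the `setdefault()` calls in this diff; the values, the import path, and the keyword-style constructor call are assumptions based on the class docstring.

from pystoms.models_3d.model_3d_m1 import ModelM1  # import path assumed from the file layout

# `features` is assumed to be a pystoms AlignedFeatureData instance prepared elsewhere.
custom_priors = {
    "mz_sigma": 5,      # default is 10
    "alpha_lam": 1e-4,  # default is 1 / intensity.max(axis=0)
    "me_sigma": 5,      # default is 10
}
model_m1 = ModelM1(features, model_parameters=custom_priors, likelihood="StudentT")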
50 changes: 35 additions & 15 deletions src/pystoms/models_3d/model_3d_m2.py
@@ -2,7 +2,7 @@
model M2
"""

from typing import Optional,Dict
from typing import Optional, Dict
import pymc as pm
import pymc.math as pmath
import numpy as np
@@ -33,7 +33,7 @@ class ModelM2(AbstractModel):
Args:
features (AlignedFeatureData): Feature's data wrapped in
pystoms.AlignedFeatureData container.
model_parameters (Dictionary, optional): parameters for
model_parameters (Dictionary, optional): parameters for
priors and hyperpriors. Defaults to None.
likelihood (str, optional): Likelihood distribution. Currently
supported: 'Normal', 'StudentT'. Defaults to 'Normal'.
@@ -62,7 +62,9 @@ def __init__(
if model_parameters is None:
model_parameters = {}
self.model_parameters = model_parameters.copy()
super().__init__(feature_ids, batch_size, random_number_generator, name, coords=coords)
super().__init__(
feature_ids, batch_size, random_number_generator, name, coords=coords
)
self.setup_mutable_data(features)
# TODO is there a nicer way to share this between init and setup_mutable_data()
dims_2d = ["data_point", "feature"]
@@ -164,18 +166,36 @@ def setup_mutable_data(
axis=0,
weights=intensity,
).reshape((1, num_features, 1))
self.model_parameters.setdefault("mz_sigma",10)
mz_sigma = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_sigma"]
self.model_parameters.setdefault("mz_s_scale",0.05)
ms_s_scale = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["mz_s_scale"]
self.model_parameters.setdefault("ms_s_alpha",2)
ms_s_alpha = np.ones((1, num_features, 1), dtype="float") * self.model_parameters["ms_s_alpha"]
self.model_parameters.setdefault("alpha_scale",intensity.max(axis=0))
alpha_scale = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_scale"]
self.model_parameters.setdefault("alpha_alpha",2)
alpha_alpha = np.ones((1, num_features), dtype="float") * self.model_parameters["alpha_alpha"]
self.model_parameters.setdefault("me_sigma",10)
me_sigma = np.ones((1, num_features), dtype="float") * self.model_parameters["me_sigma"]
self.model_parameters.setdefault("mz_sigma", 10)
mz_sigma = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_sigma"]
)
self.model_parameters.setdefault("mz_s_scale", 0.05)
ms_s_scale = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["mz_s_scale"]
)
self.model_parameters.setdefault("ms_s_alpha", 2)
ms_s_alpha = (
np.ones((1, num_features, 1), dtype="float")
* self.model_parameters["ms_s_alpha"]
)
self.model_parameters.setdefault("alpha_scale", intensity.max(axis=0))
alpha_scale = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_scale"]
)
self.model_parameters.setdefault("alpha_alpha", 2)
alpha_alpha = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["alpha_alpha"]
)
self.model_parameters.setdefault("me_sigma", 10)
me_sigma = (
np.ones((1, num_features), dtype="float")
* self.model_parameters["me_sigma"]
)
z = np.array(charge).reshape((1, num_features, 1))
mz_tile = np.tile(mz, (1, 1, num_isotopic_peaks))
peaks = np.arange(num_isotopic_peaks)
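The same pattern would apply to ModelM2, whose `setup_mutable_data()` reads the keys below with the defaults shown in this diff; the values and the constructor call are again assumptions.

from pystoms.models_3d.model_3d_m2 import ModelM2  # import path assumed from the file layout

custom_priors_m2 = {
    "mz_sigma": 5,       # default is 10
    "mz_s_scale": 0.1,   # default is 0.05
    "ms_s_alpha": 3,     # default is 2
    "alpha_scale": 1e5,  # default is intensity.max(axis=0)
    "alpha_alpha": 3,    # default is 2
    "me_sigma": 5,       # default is 10
}
model_m2 = ModelM2(features, model_parameters=custom_priors_m2, likelihood="Normal")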